From fadfde5bcd25a52058ed309bc02bdc11adc117c9 Mon Sep 17 00:00:00 2001
From: Alexander Schlemmer <alexander@mail-schlemmer.de>
Date: Fri, 27 Jan 2023 12:32:21 +0100
Subject: [PATCH] DOC: new document describing the typical crawler workflow

---
 src/doc/workflow.rst | 60 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 src/doc/workflow.rst

diff --git a/src/doc/workflow.rst b/src/doc/workflow.rst
new file mode 100644
index 00000000..0ffd50ec
--- /dev/null
+++ b/src/doc/workflow.rst
@@ -0,0 +1,60 @@
+Crawler Workflow
+================
+
+The CaosDB crawler aims to provide a flexible framework for synchronizing
+data on file systems (or potentially other sources of information) with a
+running CaosDB instance. The workflow used in a scientific environment
+should be chosen according to the users' needs. It is also possible to
+combine multiple workflows or to use them in parallel.
+
+In this document we will describe several workflows for crawler operation.
+
+Local Crawler Operation
+-----------------------
+
+A very simple setup, which can also be used reliably for testing (e.g. in local
+docker containers), runs the crawler on a local computer. The files that
+are being crawled need to be visible to both the local computer and the
+machine running CaosDB.
+
+Prerequisites
++++++++++++++
+
+- Make sure that CaosDB is running, that your computer has a network connection to CaosDB and
+  that your pycaosdb.ini is pointing to the correct instance of CaosDB. Please refer to the
+  pylib manual for questions related to the configuration in pycaosdb.ini
+  (https://docs.indiscale.com/caosdb-pylib/README_SETUP.html).
+- Make sure that caosdb-crawler and caosdb-advanced-user-tools are installed (e.g. using pip).
+- Make sure that you have created:
+
+  - The data model needed by the crawler.
+  - A file ``identifiables.yml`` describing the identifiables.
+  - A cfood file, e.g. ``cfood.yml``.
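For orientation, an ``identifiables.yml`` could be created as in the following sketch. The record types ``Experiment`` and ``Project`` and their properties are purely hypothetical examples; each top-level key names a record type, and the list below it names the properties that together identify a record of that type:

```shell
# Create a minimal, hypothetical identifiables.yml.  All record type and
# property names here are examples only and must match your data model.
cat > identifiables.yml <<'EOF'
Experiment:
  - date
  - identifier
Project:
  - name
EOF
```
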
+
+Running the crawler
++++++++++++++++++++
+
+Running the crawler currently involves two steps:
+
+- Inserting the files
+- Running the crawler program
+
+Inserting the files
+)))))))))))))))))))
+
+This can be done using the module ``loadFiles`` from ``caosadvancedtools``
+(see https://docs.indiscale.com/caosdb-advanced-user-tools/ for installation).
+
+The generic syntax is::
+
+   python3 -m caosadvancedtools.loadFiles -p <prefix-in-caosdb-file-system> <path-to-crawled-folder>
+
+Important: The ``<path-to-crawled-folder>`` is the location of the files **as
+seen by CaosDB**. For a CaosDB instance running in a docker container (see
+https://gitlab.com/caosdb/caosdb-docker) the command line could look like::
+
+   python3 -m caosadvancedtools.loadFiles -p / /opt/caosdb/mnt/extroot/ExperimentalData
+
+This command loads the folder ``ExperimentalData``, contained in the extroot
+folder within the docker container, under the CaosDB prefix ``/``, which is the
+root prefix.
+
+Running the crawler program
+)))))))))))))))))))))))))))
+
+The following command line assumes that the extroot folder visible in the
+CaosDB docker container is located at ``../extroot``::
+
+   caosdb-crawler -i identifiables.yml --prefix /extroot --debug --provenance=provenance.yml -s update cfood.yml ../extroot/ExperimentalData/
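The two steps above can also be combined into a small wrapper script. The following is only a sketch reusing the example paths from this document; the paths and the prefix must be adapted to the actual setup:

```shell
# Sketch of a wrapper combining both steps (paths are the examples used
# above; adapt them to your installation).  The script is written to a
# file so its syntax can be checked without a running CaosDB instance.
cat > run_crawler.sh <<'EOF'
#!/bin/sh
set -e
# Step 1: make the files known to CaosDB (path as seen by the server).
python3 -m caosadvancedtools.loadFiles -p / /opt/caosdb/mnt/extroot/ExperimentalData
# Step 2: run the crawler itself (path as seen from the local machine).
caosdb-crawler -i identifiables.yml --prefix /extroot --debug \
    --provenance=provenance.yml -s update cfood.yml ../extroot/ExperimentalData/
EOF
sh -n run_crawler.sh && echo "syntax OK"
```
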
-- 
GitLab