From fadfde5bcd25a52058ed309bc02bdc11adc117c9 Mon Sep 17 00:00:00 2001
From: Alexander Schlemmer <alexander@mail-schlemmer.de>
Date: Fri, 27 Jan 2023 12:32:21 +0100
Subject: [PATCH 1/3] DOC: new document describing the typical crawler workflow

---
 src/doc/workflow.rst | 60 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 src/doc/workflow.rst

diff --git a/src/doc/workflow.rst b/src/doc/workflow.rst
new file mode 100644
index 00000000..0ffd50ec
--- /dev/null
+++ b/src/doc/workflow.rst
@@ -0,0 +1,60 @@
+Crawler Workflow
+================
+
+The CaosDB crawler aims to provide a very flexible framework for synchronizing
+data on file systems (or potentially other sources of information) with a
+running CaosDB instance. The workflow used in a scientific environment
+should be chosen according to the users' needs. It is also possible to combine multiple workflows or use them in parallel.
+
+In this document we will describe several workflows for crawler operation.
+
+Local Crawler Operation
+-----------------------
+
+A very simple setup that can also reliably be used for testing (e.g. in local
+docker containers) sets up the crawler on a local computer. The files that
+are being crawled need to be visible to both the local computer and the
+machine running CaosDB.
+
+Prerequisites
++++++++++++++
+
+- Make sure that CaosDB is running, that your computer has a network connection to CaosDB, and
+  that your pycaosdb.ini points to the correct instance of CaosDB (see the example below this
+  list). Please refer to the pylib manual for questions related to the configuration in
+  pycaosdb.ini (https://docs.indiscale.com/caosdb-pylib/README_SETUP.html).
+- Make sure that caosdb-crawler and caosdb-advanced-user-tools are installed (e.g. using pip).
+- Make sure that you have created (sketches of the two YAML files are shown below this list):
+
+  - The data model needed by the crawler.
+  - A file "identifiables.yml" describing the identifiables.
+  - A cfood file, e.g. cfood.yml.
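+
+A minimal pycaosdb.ini could look like the following sketch. All values are
+placeholders that have to be adapted to your instance; see the pylib manual
+linked above for the full set of options::
+
+   [Connection]
+   url=https://localhost:10443/
+   username=admin
+   password_method=plain
+   password=caosdb
+
+The file "identifiables.yml" lists, for each record type, the properties that
+identify a record uniquely. The record type "Experiment" and its properties in
+this sketch are hypothetical and only illustrate the format::
+
+   Experiment:
+     - date
+     - identifier
+
+A cfood file declares how the crawled directory tree is mapped to records. The
+following skeleton only sketches the structure; the record type "Experiment",
+its property and the match patterns are assumptions that have to be replaced
+according to your data model (see the CFoods chapter of this documentation for
+the full syntax)::
+
+   ExperimentalData:                # converter matching the top-level directory
+     type: Directory
+     match: ExperimentalData
+     subtree:
+       DataFile:                    # converter matching data files inside it
+         type: SimpleFile
+         match: (?P<identifier>.*)\.dat$
+         records:
+           Experiment:              # record created/updated for each matched file
+             identifier: $identifier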
+
+Running the crawler
++++++++++++++++++++
+
+Running the crawler currently involves two steps:
+
+- Inserting the files
+- Running the crawler program
+
+Inserting the files
+)))))))))))))))))))
+
+This can be done using the module "loadFiles" from caosadvancedtools.
+(See https://docs.indiscale.com/caosdb-advanced-user-tools/ for installation.)
+
+The generic syntax is::
+
+   python3 -m caosadvancedtools.loadFiles -p <prefix-in-caosdb-file-system> <path-to-crawled-folder>
+
+Important: The <path-to-crawled-folder> is the location of the files **as seen by CaosDB**. For a CaosDB instance running in a docker container (e.g. see: https://gitlab.com/caosdb/caosdb-docker), the command line could look like::
+
+   python3 -m caosadvancedtools.loadFiles -p / /opt/caosdb/mnt/extroot/ExperimentalData
+
+This command line loads the folder "ExperimentalData", contained in the extroot folder within the docker container, under the CaosDB prefix "/", which is the root prefix.
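+
+To illustrate the role of the prefix: assuming that loadFiles places the crawled
+folder under <prefix>/<folder name>, loading the same folder with the prefix
+"/data" instead would make the files visible under "/data/ExperimentalData" in
+the CaosDB file system::
+
+   python3 -m caosadvancedtools.loadFiles -p /data /opt/caosdb/mnt/extroot/ExperimentalData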
+
+Running the crawler program
+)))))))))))))))))))))))))))
+
+The following command line assumes that the extroot folder visible in the CaosDB docker container is located in "../extroot"::
+
+   caosdb-crawler -i identifiables.yml --prefix /extroot --debug --provenance=provenance.yml -s update cfood.yml ../extroot/ExperimentalData/
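+
+If you want to check what the crawler would do before letting it modify the
+server, it can first be run with a more restrictive security mode. The following
+sketch assumes that the "-s" option accepts the value "retrieve", which
+suppresses all inserts and updates (check "caosdb-crawler --help" for the
+options available in your version)::
+
+   caosdb-crawler -i identifiables.yml --prefix /extroot --debug --provenance=provenance.yml -s retrieve cfood.yml ../extroot/ExperimentalData/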
-- 
GitLab


From 650087410670d43994bf3b8fc2b3ca1b4d576770 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henrik=20tom=20W=C3=B6rden?= <h.tomwoerden@indiscale.com>
Date: Fri, 21 Mar 2025 09:18:22 +0100
Subject: [PATCH 2/3] DOC: include workflow documentation

---
 src/doc/index.rst    |  1 +
 src/doc/workflow.rst | 16 ++++++++--------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/doc/index.rst b/src/doc/index.rst
index fdb99d4d..4cf6fd8c 100644
--- a/src/doc/index.rst
+++ b/src/doc/index.rst
@@ -9,6 +9,7 @@ CaosDB-Crawler Documentation
 
    Getting started<getting_started/index>
    Tutorials<tutorials/index>
+   Workflow<workflow>
    Concepts<concepts>
    Converters<converters/index>
    CFoods (Crawler Definitions)<cfood>
diff --git a/src/doc/workflow.rst b/src/doc/workflow.rst
index 0ffd50ec..9116f85e 100644
--- a/src/doc/workflow.rst
+++ b/src/doc/workflow.rst
@@ -1,9 +1,9 @@
 Crawler Workflow
 ================
 
-The CaosDB crawler aims to provide a very flexible framework for synchronizing
+The LinkAhead crawler aims to provide a very flexible framework for synchronizing
 data on file systems (or potentially other sources of information) with a
-running CaosDB instance. The workflow used in a scientific environment
+running LinkAhead instance. The workflow used in a scientific environment
 should be chosen according to the users' needs. It is also possible to combine multiple workflows or use them in parallel.
 
 In this document we will describe several workflows for crawler operation.
@@ -14,13 +14,13 @@ Local Crawler Operation
 A very simple setup that can also reliably be used for testing (e.g. in local
 docker containers) sets up the crawler on a local computer. The files that
 are being crawled need to be visible to both the local computer and the
-machine running CaosDB.
+machine running LinkAhead.
 
 Prerequisites
 +++++++++++++
 
-- Make sure that CaosDB is running, that your computer has a network connection to CaosDB, and
-  that your pycaosdb.ini points to the correct instance of CaosDB (see the example below this
+- Make sure that LinkAhead is running, that your computer has a network connection to LinkAhead, and
+  that your pycaosdb.ini points to the correct instance of LinkAhead (see the example below this
   list). Please refer to the pylib manual for questions related to the configuration in
   pycaosdb.ini (https://docs.indiscale.com/caosdb-pylib/README_SETUP.html).
 - Make sure that caosdb-crawler and caosdb-advanced-user-tools are installed (e.g. using pip).
@@ -46,15 +46,15 @@ The generic syntax is::
 
    python3 -m caosadvancedtools.loadFiles -p <prefix-in-caosdb-file-system> <path-to-crawled-folder>
 
-Important: The <path-to-crawled-folder> is the location of the files **as seen by CaosDB**. For a CaosDB instance running in a docker container (e.g. see: https://gitlab.com/caosdb/caosdb-docker), the command line could look like::
+Important: The <path-to-crawled-folder> is the location of the files **as seen by LinkAhead**. For a LinkAhead instance running in a docker container (e.g. see: https://gitlab.com/caosdb/caosdb-docker), the command line could look like::
 
    python3 -m caosadvancedtools.loadFiles -p / /opt/caosdb/mnt/extroot/ExperimentalData
 
-This command line loads the folder "ExperimentalData", contained in the extroot folder within the docker container, under the CaosDB prefix "/", which is the root prefix.
+This command line loads the folder "ExperimentalData", contained in the extroot folder within the docker container, under the LinkAhead prefix "/", which is the root prefix.
 
 Running the crawler program
 )))))))))))))))))))))))))))
 
-The following command line assumes that the extroot folder visible in the CaosDB docker container is located in "../extroot"::
+The following command line assumes that the extroot folder visible in the LinkAhead docker container is located in "../extroot"::
 
    caosdb-crawler -i identifiables.yml --prefix /extroot --debug --provenance=provenance.yml -s update cfood.yml ../extroot/ExperimentalData/
-- 
GitLab


From b3669164a7ca8b55220bc83c5335284022223d2e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Henrik=20tom=20W=C3=B6rden?= <h.tomwoerden@indiscale.com>
Date: Fri, 21 Mar 2025 09:22:53 +0100
Subject: [PATCH 3/3] DOC: minor rephrasing

---
 src/doc/workflow.rst | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/doc/workflow.rst b/src/doc/workflow.rst
index 9116f85e..b8d48f1a 100644
--- a/src/doc/workflow.rst
+++ b/src/doc/workflow.rst
@@ -4,17 +4,18 @@ Crawler Workflow
 The LinkAhead crawler aims to provide a very flexible framework for synchronizing
 data on file systems (or potentially other sources of information) with a
 running LinkAhead instance. The workflow used in a scientific environment
-should be chosen according to the users' needs. It is also possible to combine multiple workflows or use them in parallel.
+should be chosen according to the users' needs. It is also possible to combine
+multiple workflows or use them in parallel.
 
 In this document we will describe several workflows for crawler operation.
 
 Local Crawler Operation
 -----------------------
 
-A very simple setup that can also reliably be used for testing (e.g. in local
-docker containers) sets up the crawler on a local computer. The files that
-are being crawled need to be visible to both the local computer and the
-machine running LinkAhead.
+A very simple setup that can also reliably be used for testing
+sets up the crawler on a local computer. The files that
+are being crawled need to be visible to both the locally running crawler and
+the LinkAhead server.
 
 Prerequisites
 +++++++++++++
@@ -58,3 +59,7 @@ Running the crawler program
 The following command line assumes that the extroot folder visible in the LinkAhead docker container is located in "../extroot"::
 
    caosdb-crawler -i identifiables.yml --prefix /extroot --debug --provenance=provenance.yml -s update cfood.yml ../extroot/ExperimentalData/
+
+Server Side Crawler Operation
+-----------------------------
+
+To be filled.
-- 
GitLab