Commit ae46260b authored by Henrik tom Wörden

Merge branch 'f-doc-workflow' into 'dev'

F doc workflow

See merge request !220
parents 177fe1db b3669164
@@ -9,6 +9,7 @@ CaosDB-Crawler Documentation
 Getting started<getting_started/index>
 Tutorials<tutorials/index>
+Workflow<workflow>
 Concepts<concepts>
 Converters<converters/index>
 CFoods (Crawler Definitions)<cfood>
Crawler Workflow
================
The LinkAhead crawler aims to provide a very flexible framework for synchronizing
data on file systems (or potentially other sources of information) with a
running LinkAhead instance. The workflow that is used in a scientific environment
should be chosen according to the users' needs, and it is also possible to combine
multiple workflows or to use them in parallel.

In this document we describe several workflows for crawler operation.

Local Crawler Operation
-----------------------
A very simple setup, which can also reliably be used for testing,
runs the crawler on a local computer. The files that
are being crawled need to be visible to both the locally running crawler and
the LinkAhead server.

Prerequisites
+++++++++++++
- Make sure that LinkAhead is running, that your computer has a network connection to LinkAhead, and
  that your pycaosdb.ini points to the correct instance of LinkAhead. Please refer to the
  pylib manual for questions related to the configuration in pycaosdb.ini
  (https://docs.indiscale.com/caosdb-pylib/README_SETUP.html); a minimal example is sketched below.
- Make sure that caosdb-crawler and caosdb-advanced-user-tools are installed (e.g. using pip).
- Make sure that you have created:

  - The data model needed for the crawler.
  - A file "identifiables.yml" describing the identifiables (a sketch follows this list).
  - A cfood file, e.g. cfood.yml (a sketch follows this list).
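
For orientation, a pycaosdb.ini could look like the following minimal sketch. All values
(URL, user name, password method, certificate path) are placeholders that depend on your
installation; the pylib manual linked above is the authoritative reference::

   [Connection]
   url=https://localhost:10443/
   username=admin
   password_method=plain
   password=caosdb
   cacert=/path/to/caosdb.cert.pem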
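
The file "identifiables.yml" maps each record type to the list of properties that identify
its records. A minimal sketch, assuming a data model with an "Experiment" record type that
is identified by its "date" and "identifier" properties (both names are illustrative and
must match your data model)::

   Experiment:
     - date
     - identifier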
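
The cfood file defines how the crawled directory tree is translated into records; the full
syntax is described in the CFoods chapter of this documentation. A minimal sketch, assuming
experiment folders named like "2023-05-17_my_experiment" inside "ExperimentalData"::

   ExperimentalData:                  # matches the top-level data directory
     type: Directory
     match: ExperimentalData
     subtree:
       ExperimentDir:                 # one sub-directory per experiment
         type: Directory
         match: (?P<date>\d{4}-\d{2}-\d{2})_(?P<identifier>.*)
         records:
           Experiment:                # creates or updates an Experiment record
             date: $date
             identifier: $identifier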
Running the crawler
+++++++++++++++++++
Running the crawler currently involves two steps:

- Inserting the files
- Running the crawler program
Inserting the files
)))))))))))))))))))
This can be done using the module "loadFiles" from caosadvancedtools
(see https://docs.indiscale.com/caosdb-advanced-user-tools/ for installation).
The generic syntax is::

   python3 -m caosadvancedtools.loadFiles -p <prefix-in-caosdb-file-system> <path-to-crawled-folder>
Important: The <path-to-crawled-folder> is the location of the files **as seen by LinkAhead**.
For a LinkAhead instance running in a docker container (see e.g.
https://gitlab.com/caosdb/caosdb-docker), the command line could look like::

   python3 -m caosadvancedtools.loadFiles -p / /opt/caosdb/mnt/extroot/ExperimentalData

This command loads the folder "ExperimentalData", contained in the extroot folder within the
docker container, under the LinkAhead prefix "/", which is the root prefix.
Running the crawler
)))))))))))))))))))
The following command line assumes that the extroot folder visible in the LinkAhead docker
container is located at "../extroot"::

   caosdb-crawler -i identifiables.yml --prefix /extroot --debug --provenance=provenance.yml -s update cfood.yml ../extroot/ExperimentalData/
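
Here, ``-i identifiables.yml`` supplies the identifiable definitions, ``--prefix /extroot``
maps the local path onto the corresponding path in LinkAhead's file system,
``--provenance=provenance.yml`` stores provenance information about the crawler run, and
``-s update`` sets the security mode such that the crawler is allowed to insert and update
entities. As a non-destructive first check you can run with a more restrictive security mode
(a sketch; consult ``caosdb-crawler --help`` for the modes supported by your version)::

   caosdb-crawler -i identifiables.yml --prefix /extroot -s retrieve cfood.yml ../extroot/ExperimentalData/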
Server Side Crawler Operation
-----------------------------
To be filled.