diff --git a/src/doc/index.rst b/src/doc/index.rst
index fdb99d4d9a6cb8bf6972d7ee22489f362436bb90..4cf6fd8c6e68874e2b4bb1a604c1c07b2cac2659 100644
--- a/src/doc/index.rst
+++ b/src/doc/index.rst
@@ -9,6 +9,7 @@ CaosDB-Crawler Documentation
 
    Getting started<getting_started/index>
    Tutorials<tutorials/index>
+   Workflow<workflow>
    Concepts<concepts>
    Converters<converters/index>
    CFoods (Crawler Definitions)<cfood>
diff --git a/src/doc/workflow.rst b/src/doc/workflow.rst
new file mode 100644
index 0000000000000000000000000000000000000000..b8d48f1ae299431e6aeaf8a173a9e9ffbc0388f2
--- /dev/null
+++ b/src/doc/workflow.rst
@@ -0,0 +1,120 @@
+Crawler Workflow
+================
+
+The LinkAhead crawler provides a flexible framework for synchronizing data on
+file systems (or potentially other sources of information) with a running
+LinkAhead instance. The workflow used in a given scientific environment should
+be chosen according to the users' needs; it is also possible to combine
+multiple workflows or use them in parallel.
+
+This document describes several workflows for operating the crawler.
+
+Local Crawler Operation
+-----------------------
+
+A very simple setup, which can also reliably be used for testing, runs the
+crawler on a local computer. The files being crawled need to be visible to
+both the locally running crawler and the LinkAhead server.
+
+Prerequisites
++++++++++++++
+
+- Make sure that LinkAhead is running, that your computer has a network
+  connection to LinkAhead, and that your pycaosdb.ini points to the correct
+  LinkAhead instance. Please refer to the pylib manual
+  (https://docs.indiscale.com/caosdb-pylib/README_SETUP.html) for questions
+  related to the configuration in pycaosdb.ini.
+- Make sure that caosdb-crawler and caosdb-advanced-user-tools are installed
+  (e.g. using pip).
+- Make sure that you have created:
+
+  - the data model needed by the crawler,
+  - a file "identifiables.yml" describing the identifiables,
+  - a cfood file, e.g. cfood.yml.
+
+  Example sketches of these files are given at the end of this document.
+
+Running the crawler
++++++++++++++++++++
+
+Running the crawler currently involves two steps:
+
+- inserting the files,
+- running the crawler program.
+
+Inserting the files
+)))))))))))))))))))
+
+This can be done using the module "loadFiles" from caosadvancedtools (see
+https://docs.indiscale.com/caosdb-advanced-user-tools/ for installation).
+
+The generic syntax is::
+
+   python3 -m caosadvancedtools.loadFiles -p <prefix-in-caosdb-file-system> <path-to-crawled-folder>
+
+Important: The <path-to-crawled-folder> is the location of the files
+**as seen by LinkAhead**. For a LinkAhead instance running in a docker
+container (see e.g. https://gitlab.com/caosdb/caosdb-docker) the command line
+could look like::
+
+   python3 -m caosadvancedtools.loadFiles -p / /opt/caosdb/mnt/extroot/ExperimentalData
+
+This command loads the folder "ExperimentalData", contained in the extroot
+folder within the docker container, under the LinkAhead prefix "/", which is
+the root prefix.
+
+Running the crawler
+)))))))))))))))))))
+
+The following command line assumes that the extroot folder visible in the
+LinkAhead docker container is located at "../extroot"::
+
+   caosdb-crawler -i identifiables.yml --prefix /extroot --debug --provenance=provenance.yml -s update cfood.yml ../extroot/ExperimentalData/
+
+Server Side Crawler Operation
+-----------------------------
+
+To be filled.
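+
+Example configuration files
+---------------------------
+
+The snippets below are minimal, hypothetical sketches of the three files
+mentioned in the prerequisites; all names, paths and credentials are
+placeholders and must be adapted to your own setup.
+
+A pycaosdb.ini pointing to a local LinkAhead instance could look like this::
+
+   [Connection]
+   url=https://localhost:10443
+   cacert=/path/to/caosdb.cert.pem
+   username=admin
+   password_method=plain
+   password=caosdb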
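+
+The file identifiables.yml maps each record type to the list of properties
+that uniquely identify a record of that type. Assuming a hypothetical data
+model in which an Experiment is identified by its date and its project, it
+could look like::
+
+   Experiment:
+     - date
+     - project
+   Project:
+     - name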
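+
+The cfood file defines how the crawled directory structure is mapped to
+records; see the Converters and CFoods chapters for the authoritative syntax.
+A sketch that matches the hypothetical ExperimentalData folder above and
+creates one Measurement record per contained ".dat" file could look like::
+
+   ExperimentalData:
+     type: Directory
+     match: ^ExperimentalData$
+     subtree:
+       datafile:
+         type: SimpleFile
+         match: ^(?P<filename>.*)\.dat$
+         records:
+           Measurement:
+             name: $filename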