Commit ae46260b authored by Henrik tom Wörden

Merge branch 'f-doc-workflow' into 'dev'

F doc workflow

See merge request !220
parents 177fe1db b3669164
@@ -9,6 +9,7 @@ CaosDB-Crawler Documentation
 Getting started<getting_started/index>
 Tutorials<tutorials/index>
+Workflow<workflow>
 Concepts<concepts>
 Converters<converters/index>
 CFoods (Crawler Definitions)<cfood>
Crawler Workflow
================
The LinkAhead crawler aims to provide a very flexible framework for synchronizing
data on file systems (or potentially other sources of information) with a
running LinkAhead instance. The workflow that is used in a scientific environment
should be chosen according to the users' needs, and it is also possible to combine
multiple workflows or to use them in parallel.

In this document we describe several workflows for crawler operation.

Local Crawler Operation
-----------------------
A very simple setup, which can also reliably be used for testing,
runs the crawler on a local computer. The files that
are being crawled need to be visible to both the locally running crawler and
the LinkAhead server.

Prerequisites
+++++++++++++
- Make sure that LinkAhead is running, that your computer has a network connection to LinkAhead, and
  that your pycaosdb.ini points to the correct instance of LinkAhead. Please refer to the
  pylib manual for questions related to the configuration in pycaosdb.ini
  (https://docs.indiscale.com/caosdb-pylib/README_SETUP.html); a minimal example is sketched below.
- Make sure that caosdb-crawler and caosdb-advanced-user-tools are installed (e.g. using pip).
- Make sure that you have created:

  - The data model needed for the crawler.
  - A file "identifiables.yml" describing the identifiables (a sketch follows this list).
  - A cfood file, e.g. cfood.yml (a sketch follows this list).
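
For orientation, a pycaosdb.ini could look like the following minimal sketch. All values
(URL, user name, password method, certificate path) are placeholders that depend on your
installation; the pylib manual linked above is the authoritative reference::

   [Connection]
   url=https://localhost:10443/
   username=admin
   password_method=plain
   password=caosdb
   cacert=/path/to/caosdb.cert.pem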
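
The file "identifiables.yml" maps each record type to the list of properties that identify
its records. A minimal sketch, assuming a data model with an "Experiment" record type that
is identified by its "date" and "identifier" properties (both names are illustrative and
must match your data model)::

   Experiment:
     - date
     - identifier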
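
The cfood file defines how the crawled directory tree is translated into records; the full
syntax is described in the CFoods chapter of this documentation. A minimal sketch, assuming
experiment folders named like "2023-05-17_my_experiment" inside "ExperimentalData"::

   ExperimentalData:                  # matches the top-level data directory
     type: Directory
     match: ExperimentalData
     subtree:
       ExperimentDir:                 # one sub-directory per experiment
         type: Directory
         match: (?P<date>\d{4}-\d{2}-\d{2})_(?P<identifier>.*)
         records:
           Experiment:                # creates or updates an Experiment record
             date: $date
             identifier: $identifier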
Running the crawler
+++++++++++++++++++
Running the crawler currently involves two steps:

- Inserting the files
- Running the crawler program
Inserting the files
)))))))))))))))))))
This can be done using the module "loadFiles" from caosadvancedtools
(see https://docs.indiscale.com/caosdb-advanced-user-tools/ for installation).
The generic syntax is::

   python3 -m caosadvancedtools.loadFiles -p <prefix-in-caosdb-file-system> <path-to-crawled-folder>
Important: The <path-to-crawled-folder> is the location of the files **as seen by LinkAhead**.
For a LinkAhead instance running in a docker container (see e.g.
https://gitlab.com/caosdb/caosdb-docker), the command line could look like::

   python3 -m caosadvancedtools.loadFiles -p / /opt/caosdb/mnt/extroot/ExperimentalData

This command loads the folder "ExperimentalData", contained in the extroot folder within the
docker container, under the LinkAhead prefix "/", which is the root prefix.
Running the crawler
)))))))))))))))))))
The following command line assumes that the extroot folder visible in the LinkAhead docker
container is located at "../extroot"::

   caosdb-crawler -i identifiables.yml --prefix /extroot --debug --provenance=provenance.yml -s update cfood.yml ../extroot/ExperimentalData/
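
Here, ``-i identifiables.yml`` supplies the identifiable definitions, ``--prefix /extroot``
maps the local path onto the corresponding path in LinkAhead's file system,
``--provenance=provenance.yml`` stores provenance information about the crawler run, and
``-s update`` sets the security mode such that the crawler is allowed to insert and update
entities. As a non-destructive first check you can run with a more restrictive security mode
(a sketch; consult ``caosdb-crawler --help`` for the modes supported by your version)::

   caosdb-crawler -i identifiables.yml --prefix /extroot -s retrieve cfood.yml ../extroot/ExperimentalData/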
Server Side Crawler Operation
-----------------------------
To be filled.