Commit d48ca0f9 authored by Alexander Schlemmer

DOC: added a more comprehensive example cfood for the docs

parent 9ed646f4
2 merge requests: !91 (Release 0.3), !90 (Example cfood in the tutorials section)
Pipeline #33093 passed
Example CFood
=============
Let's walk through an example cfood that makes use of a simple directory structure. We assume
the structure to have the following form, starting from the file crawler root:

.. code-block::

   ExperimentalData/
     2022_TestData/
       2022-02-17_TestDataset/
         file1.dat
         file2.dat
         ...
       ...
     2023_AnotherDataFolder/
       ...
     ...

This file structure conforms to the one described in our article "Guidelines for a Standardized
Filesystem Layout for Scientific Data" (https://doi.org/10.3390/data5020043). As a simplified
example, we want to write a crawler that creates "Project" and "Measurement" records in CaosDB
and sets some reasonable properties derived from the file and directory names. Furthermore, we
want to link the fictitious dat files to the Measurement records.

Let's first clarify the terms we are using:

.. code-block::

   ExperimentalData/                <--- Category level (level 0)
     2022_TestData/                 <--- Project level (level 1)
       2022-02-17_TestDataset/      <--- Activity / Measurement level (level 2)
         file1.dat                  <--- Files on level 3
         file2.dat
         ...
       ...
     2023_AnotherDataFolder/        <--- Project level (level 1)
       ...
     ...

As we can see, the three-level folder structure described in the article is replicated here.
We use the term "Activity level" instead of the terms used in the article, because
it applies more generally.
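
The level numbering above can be checked with a few lines of Python. This is only an illustrative sketch and not part of the crawler: it recreates the example layout in a temporary directory and computes each entry's level as its depth below the crawler root. The names ``build_example_tree``, ``compute_levels`` and ``LEVEL_NAMES`` are hypothetical helpers introduced for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from depth below the crawler root to the terms used above.
LEVEL_NAMES = {0: "Category", 1: "Project", 2: "Activity / Measurement", 3: "File"}


def build_example_tree(root: Path) -> None:
    """Create the example directory layout below ``root``."""
    measurement = root / "ExperimentalData" / "2022_TestData" / "2022-02-17_TestDataset"
    measurement.mkdir(parents=True)
    for name in ("file1.dat", "file2.dat"):
        (measurement / name).write_text("dummy content\n")
    (root / "ExperimentalData" / "2023_AnotherDataFolder").mkdir()


def compute_levels(root: Path) -> dict:
    """Map each path (relative to ``root``) to its level, i.e. its depth minus one."""
    result = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        result[rel.as_posix()] = len(rel.parts) - 1
    return result


with tempfile.TemporaryDirectory() as tmp:
    crawler_root = Path(tmp)
    build_example_tree(crawler_root)
    tree_levels = compute_levels(crawler_root)

for rel_path, level in tree_levels.items():
    print(f"level {level} ({LEVEL_NAMES[level]}): {rel_path}")
```

Counting path components relative to the crawler root places ``ExperimentalData`` on level 0 and the dat files on level 3, which is the nesting depth the converters in a cfood have to mirror.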
The following yaml cfood is able to match and insert / update the records accordingly:

.. code-block:: yaml

   ExperimentalData:  # Converter for the category level
     type: Directory
     match: ^ExperimentalData$  # The name of the matched folder is given here!
     subtree:
       project_dir:  # Converter for the project level
         type: Directory
         match: (?P<date>.*?)_(?P<identifier>.*)
         records:
           Project:
             parents:
             - Project
             date: $date
             identifier: $identifier
         subtree:
           measurement:  # Converter for the activity / measurement level
             type: Directory
             match: (?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?
             records:
               Measurement:
                 date: $date
                 identifier: $identifier
                 project: $Project
             subtree:
               datFile:  # Converter for the files
                 type: SimpleFile
                 match: ^(.*)\.dat$  # The file extension is matched using a regular expression.
                 records:
                   datFileRecord:
                     role: File
                     path: $datFile
                     file: $datFile
                   Measurement:
                     output: +$datFileRecord

Here, we provide a detailed explanation of the specific parts of the YAML definition:

.. image:: example_crawler.svg
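
To see which values the ``match`` expressions of the cfood would extract, the patterns can be tried out directly with Python's ``re`` module. The following is a minimal sketch, not the crawler's implementation: it assumes that a converter applies its pattern to the bare file or directory name with ``re.match`` semantics, and the helper ``extract`` is a hypothetical name introduced for this example.

```python
import re

# Patterns copied verbatim from the ``match`` entries of the cfood.
PROJECT_PATTERN = r"(?P<date>.*?)_(?P<identifier>.*)"
MEASUREMENT_PATTERN = r"(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?"
DAT_FILE_PATTERN = r"^(.*)\.dat$"


def extract(pattern, name):
    """Return the named groups captured from ``name``, or None if there is no match.

    Assumption: matching behaves like ``re.match`` (anchored at the start of the name).
    """
    m = re.match(pattern, name)
    return m.groupdict() if m is not None else None


project = extract(PROJECT_PATTERN, "2022_TestData")
measurement = extract(MEASUREMENT_PATTERN, "2022-02-17_TestDataset")
no_identifier = extract(MEASUREMENT_PATTERN, "2022-02-17")
dat_file = extract(DAT_FILE_PATTERN, "file1.dat")
not_a_dat_file = extract(DAT_FILE_PATTERN, "notes.txt")

print(project)         # {'date': '2022', 'identifier': 'TestData'}
print(measurement)     # {'date': '2022-02-17', 'identifier': 'TestDataset'}
print(no_identifier)   # {'date': '2022-02-17', 'identifier': None}
print(dat_file)        # {} (the pattern has no named groups)
print(not_a_dat_file)  # None
```

The named groups ``date`` and ``identifier`` correspond to the ``$date`` and ``$identifier`` variables used in the ``records`` sections. Because the identifier part of the measurement pattern is optional, a folder named only by its date still matches, with ``identifier`` left unset.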
Tutorials
+++++++++

.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :hidden:

   Example CFood<example>