Verified Commit 3b8c55c0 authored by Daniel Hornung

DOC: Some fixes.

parent 96461c8d
2 merge requests: !123 REL: Release v0.6.0, !112 F docs
......@@ -426,11 +426,11 @@ class CaosDBIdentifiableAdapter(IdentifiableAdapter):
# TODO: don't store registered identifiables locally
def __init__(self):
self._registered_identifiables = dict()
self._registered_identifiables = {}
def load_from_yaml_definition(self, path: str):
"""Load identifiables defined in a yaml file"""
with open(path, 'r') as yaml_f:
with open(path, 'r', encoding="utf-8") as yaml_f:
identifiable_data = yaml.safe_load(yaml_f)
for key, value in identifiable_data.items():
......
# Prerequisites
The CaosDB Crawler is a utility to create CaosDB Records from some data
structure, e.g. files, and synchronize this Records with a CaosDB server.
structure, e.g. files, and synchronize these Records with a CaosDB server.
Thus two prerequisites to use the CaosDB Crawler are clear:
1. You need access to a running CaosDB instance. See [documentation](https://docs.indiscale.com/caosdb-deploy/index.html)
1. You need access to a running CaosDB instance. See the [documentation](https://docs.indiscale.com/caosdb-deploy/index.html).
2. You need access to the data that you want to insert, i.e. the files or
the table from which you want to create Records.
Make sure that you have configured your Python client to speak
to the correct CaosDB instance (see [configuration docs](https://docs.indiscale.com/caosdb-pylib/configuration.html)).
to the correct CaosDB instance (see [configuration docs](https://docs.indiscale.com/caosdb-pylib/configuration.html)).
We would like to make another prerequisite explicit that is related to the first
point above: You need a data model. Typically, if you want to insert data into
an actively used CaosDB instance, there is already a data model. However, if
there is not yet a data model you can define one using the
[edit mode](https://docs.indiscale.com/caosdb-webui/tutorials/edit_mode.html)
an actively used CaosDB instance, there is a data model already. However, if
there is no data model yet, you can define one using the
[edit mode](https://docs.indiscale.com/caosdb-webui/tutorials/edit_mode.html)
or the [YAML format](https://docs.indiscale.com/caosdb-advanced-user-tools/yaml_interface.html).
We will provide small data models for the examples to come.
Also it is recommended and for the following chapters necessary, that you have
some experience with the CaosDB Python client.
Also it is recommended, and necessary for the following chapters, that you have
some experience with the CaosDB Python client.
If you don't, you can start with
the [tutorials](https://docs.indiscale.com/caosdb-pylib/tutorials/index.html).
If you want to use the
possibility to write CaosDB Crawler configuration files (so called CFoods) it
helps if you know regular expressions. If you don't, don't worry we keep it
simple in this tutorial.
If you want to write CaosDB Crawler configuration files (so-called CFoods), it helps if you know
regular expressions. If regular expressions are new to you, don't worry: we keep it simple in this
tutorial.
Tutorial: Parameter File
========================
In the “HelloWorld” Example, the Record, that was synchronized with the
server, was created “manually” using the Python client. Now, we want to
Our data
--------
In the "HelloWorld" Example, the Record that was synchronized with the
server was created "manually" using the Python client. Now, we want to
have a look at how the Crawler can be told to do this for us.
The Crawler needs some instructions on what kind of Records it should
create given the data that we provide. This is done using so called
CFood YAML files.
The Crawler needs instructions on what kind of Records it should
create given the data that it sees. This is done using so-called
"CFood" YAML files.
Let’s start again with something simple. A common scenario is that we
want to insert the contents of some parameter file. Suppose, the
Let’s once again start with something simple. A common scenario is that we
want to insert the contents of a parameter file. Suppose the
parameter file is named ``params_2022-02-02.json`` and looks like the
following:
.. code:: json
.. code-block:: json
:caption: params_2022-02-02.json
{
"frequency": 0.5,
......@@ -26,7 +30,8 @@ Suppose these are two Properties of an Experiment and the date in the file name
is the date of the Experiment. Thus, the data model could be described in a
``model.yml`` like this:
.. code:: yaml
.. code-block:: yaml
:caption: model.yml
Experiment:
recommended_properties:
......@@ -37,52 +42,61 @@ is the date of the Experiment. Thus, the data model could be described in a
date:
datatype: DATETIME
We will identify experiments solely using the date, so the ``identifiable.yml`` is:
We will identify Experiments solely using the date. Thus the
``identifiable.yml`` is:
.. code-block:: yaml
:caption: identifiable.yml
.. code:: yaml
Experiment:
- date
Experiment:
- date
Getting started with the CFood
------------------------------
The following section tells the crawler that the key value pair
CFoods (Crawler configurations) can be stored in YAML files:
The following section in a ``cfood.yml`` tells the crawler that the key-value pair
``frequency: 0.5`` shall be used to set the Property "frequency" of an
"Experiment" Record:
.. code:: yaml
frequency: # just the name of this section
...
my_frequency: # just the name of this section
type: FloatElement # it is a float value
match_name: ^frequency$ # regular expression: the key is 'frequency'
match_value: ^(?P<value>.*)$ # regular expression: we match any value
match_name: ^frequency$ # regular expression: Match the 'frequency' key from the data json
match_value: ^(?P<value>.*)$ # regular expression: We match any value of that key
records:
Experiment:
frequency: $value
...
The first part of this section defines, what kind of data element shall be
considered (here: a key value pair with a float value and a key that is
"frequency") and then we use this to set the "frequency" Property.
The first part of this section defines which kind of data element shall be handled (here: a
key-value pair with key "frequency" and a float value) and then we use this to set the "frequency"
Property.
How does it work that we actually assign the value? Let's look at what the
How is the value actually assigned? Let's look at what the
regular expressions do:
- ``^frequency$`` ensures that the key is exactly "frequency". "^" matches the
beginning of the string and "$" the end.
- ``^(?P<value>.*)$`` creates a match group with the name "value" and the
beginning of the string and "$" matches the end.
- ``^(?P<value>.*)$`` creates a *named match group* with the name "value" and the
pattern of this group is ".*". The dot matches any character and the star means
that it can occur zero, one or multiple times. Thus, this regular expression
matches anything and puts it in the group with the name value.
that it can occur zero, one or multiple times. Thus, this regular expression
matches anything and puts it in a group with the name ``value``.
We can use the groups from the regular expressions that are used for matching.
Here, we use the "value" group to assign the "frequency" value to the "Experiment".
In our example, we use the "value" group to assign the "frequency" value to the "Experiment".
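To see what the two regular expressions do on their own, they can be tried out with Python's standard ``re`` module. The following is only a minimal sketch for illustration: the helper ``match_key_value`` is a hypothetical name and is not part of the CaosDB Crawler, and it assumes that the value is converted to a string before matching.

```python
import re

# The two patterns from the CFood section above.
NAME_PATTERN = re.compile(r"^frequency$")
VALUE_PATTERN = re.compile(r"^(?P<value>.*)$")


def match_key_value(key, value):
    """Hypothetical helper: return the named groups if key and value match.

    Returns None if either the key or the value does not match.
    """
    if NAME_PATTERN.match(key) is None:
        return None
    # Assumption for this sketch: the value is matched as a string.
    value_match = VALUE_PATTERN.match(str(value))
    if value_match is None:
        return None
    # groupdict() maps each named group to the text it matched.
    return value_match.groupdict()


print(match_key_value("frequency", 0.5))      # {'value': '0.5'}
print(match_key_value("frequency_max", 0.5))  # None
```

The second call returns ``None`` because "^" and "$" anchor the pattern to the whole key, so "frequency_max" is rejected.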
A fully grown CFood
-------------------
Since we will not pass this key value pair on its own to the crawler, we need
to embed it into its context. The full CFood file for
to embed it into its context. The full CFood file ``cfood.yml`` for
this example might look like the following:
.. code:: yaml
.. code-block:: yaml
:caption: cfood.yml
---
metadata:
......@@ -94,7 +108,7 @@ this example might look like the following:
subtree:
parameterfile: # corresponds to our parameter file
type: JSONFile
match: params_(?P<date>\d+-\d+-\d+)\.json # this is the naming pattern of the parameter file
match: params_(?P<date>\d+-\d+-\d+)\.json # extract the date from the parameter file
records:
Experiment: # one Experiment is associated with the file
date: $date # the date is taken from the file name
......@@ -103,7 +117,7 @@ this example might look like the following:
type: Dict
match: .* # the dictionary does not have a meaningful name
subtree:
frequency: # here we parse the frequency...
my_frequency: # here we parse the frequency...
type: FloatElement
match_name: frequency
match_value: (?P<val>.*)
......@@ -127,8 +141,8 @@ to write this in a more condensed way!
For now, we want to see it running!
The crawler can then be run with the following command (assuming that
the parameter file lies in the current working directory):
The crawler can now be run with the following command (assuming that
the CFood file is in the current working directory):
.. code:: sh
......