Commit 3b9e12d9 authored by Florian Spreckelsen

DOC: Add screenshots of created records
The yaml definition may look like this:

.. code-block:: yaml

    <NodeName>:
        type: <ConverterName>
        match: ".*"
        records:
            Experiment1:
                parents:
                - Experiment
                - Blablabla
                date: $DATUM
                (...)
            Experiment2:
                parents:
                - Experiment
        subtree:
            (...)

The **<NodeName>** is a description of what the current block represents (e.g.
``experiment-folder``) and is used as an identifier.
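Values such as ``$DATUM`` in the ``records`` section refer to variables that were filled earlier in the crawl, e.g. by named groups in a ``match`` expression. A rough sketch of that substitution in plain Python (illustrative only; the names and the pattern here are hypothetical, not the crawler's actual implementation):

```python
import re

# A hypothetical match expression with a named group fills a variable store:
m = re.match(r"(?P<DATUM>[0-9-]+)_experiment", "2024-05-01_experiment")
variables = m.groupdict()

# A value like ``date: $DATUM`` is then resolved against that store:
value = "$DATUM"
resolved = variables[value[1:]] if value.startswith("$") else value
print(resolved)  # 2024-05-01
```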
applied to the respective variables when the converter is executed.

.. code-block:: yaml

    <NodeName>:
        type: <ConverterName>
        match: ".*"
        transform:
            <TransformNodeName>:
                in: $<in_var_name>
                out: $<out_var_name>
                functions:
                - <func_name>:                        # name of the function to be applied
                    <func_arg1>: <func_arg1_value>    # key value pairs that are passed as parameters
                    <func_arg2>: <func_arg2_value>
                    # ...

An example that splits the variable ``a`` and puts the generated list in ``b`` is the following:

.. code-block:: yaml

    Experiment:
        type: Dict
        match: ".*"
        transform:
            param_split:
                in: $a
                out: $b
                functions:
                - split:           # split is a function that is defined by default
                    marker: "|"    # its only parameter is the marker that is used to split the string
        records:
            Report:
                tags: $b

This splits the string in ``$a`` and stores the resulting list in ``$b``. Here, this is used to
add a list-valued property to the ``Report`` Record.
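The effect of the ``split`` function can be mimicked in plain Python (a hypothetical stand-in to illustrate the behavior, not the crawler's actual code):

```python
def split_transform(value: str, marker: str) -> list:
    """Mimic the built-in ``split`` transform function: break a string
    at each occurrence of ``marker`` and return the parts as a list."""
    return value.split(marker)

# What ``$b`` would contain if ``$a`` held "exp|2024|preliminary":
print(split_transform("exp|2024|preliminary", marker="|"))
# ['exp', '2024', 'preliminary']
```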
Example:

.. code-block:: yaml

    type: CSVTableConverter
    match: ^test_table.csv$
    records:
        (...)  # Records edited for the whole table file
    subtree:
        ROW:  # Any name for a data row in the table
            type: DictElement
            match_name: .*
            match_value: .*
            records:
                (...)  # Records edited for each row
            subtree:
                COLUMN:  # Any name for a specific type of column in the table
                    type: FloatElement
                    match_name: measurement  # Name of the column in the table file
                    match_value: (?P<column_value>.*)
                    records:
                        (...)  # Records edited for each cell

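The ``match_value`` entry above captures the cell content via a regular-expression named group. How such a pattern binds a cell value to the variable ``column_value`` can be shown in isolation (illustrative only; the cell value is made up):

```python
import re

# The same named group as in the ``match_value`` pattern above:
pattern = re.compile(r"(?P<column_value>.*)")

# A hypothetical cell from the "measurement" column:
match = pattern.match("3.14")
# The captured text is what becomes available as $column_value in ``records``:
print(match.group("column_value"))  # 3.14
```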
XLSXTableConverter
++++++++++++++++++
The :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` is
a specialization of the
:py:class:`~caoscrawler.converters.DictElementConverter` and offers
all its functionality. It is meant to operate on dictionaries (e.g.,
from reading in a json or a table file), the keys of which correspond
closely to properties in a LinkAhead datamodel. This is especially
handy in cases where properties may be added to the data model and
data sources that were not yet known when the cfood definition was
written.
The converter definition of the
:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has an
additional ``record_from_dict`` section. Let's look at a simple
example. A CFood definition

.. code-block:: yaml

    type: PropertiesFromDictElement
    match: ".*"
    record_from_dict:
        variable_name: MyRec
        parents:
        - MyType1
        - MyType2

applied to a dictionary

.. code-block:: json

    {
        "name": "New name",
        "a": 5,
        "b": ["a", "b", "c"],
        "author": {
            "full_name": "Silvia Scientist"
        }
    }

will create a Record ``New name`` with parents ``MyType1`` and
``MyType2``. It has a scalar property ``a`` with value 5, a list
property ``b`` with values "a", "b" and "c", and an ``author``
property which references an ``author`` with a ``full_name`` property
with value "Silvia Scientist":

.. image:: img/properties-from-dict-records-author.png
   :height: 210

Note how the different dictionary keys are handled differently
depending on their types: scalar and list values are understood
automatically, and a dictionary-valued entry like ``author`` is
translated into a reference to an ``author`` Record automatically.
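The per-key decision can be sketched roughly as follows (a hypothetical helper for illustration, not the actual converter code):

```python
def classify_entry(key, value):
    """Sketch of how dictionary entries map to Record content: nested
    dicts become references, lists become list properties, everything
    else becomes a scalar property."""
    if isinstance(value, dict):
        return "reference"        # e.g. the ``author`` entry above
    if isinstance(value, list):
        return "list property"    # e.g. ``b``
    return "scalar property"      # e.g. ``a``

print(classify_entry("a", 5))                                    # scalar property
print(classify_entry("b", ["a", "b", "c"]))                      # list property
print(classify_entry("author", {"full_name": "Silvia Scientist"}))  # reference
```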
You can further specify how references are treated with an optional
``references`` key in ``record_from_dict``. Let's assume that in the
above example, the ``author`` should instead be a ``Person`` record.
This can be achieved by extending the above example definition by

.. code-block:: yaml

    type: PropertiesFromDictElement
    match: ".*"
    record_from_dict:
        variable_name: MyRec
        parents:
        - MyType1
        - MyType2
        references:
            author:
                parents:
                - Person

so that now, a ``Person`` record with a ``full_name`` property with
value "Silvia Scientist" is created as the value of the ``author``
property:

.. image:: img/properties-from-dict-records-person.png
   :height: 200
Properties can be blacklisted with the ``properties_blacklist``
keyword. Since the
For further customization, the
:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` can be
used as a basis for :ref:`custom converters<Custom Converters>` which
can make use of its ``referenced_record_callback`` argument. The
``referenced_record_callback`` can be a callable object which takes
exactly one Record as an argument and needs to return that Record
after doing whatever custom treatment is needed. Additionally, it is
given the ``RecordStore`` and the ``ValueStore`` in order to be able
to access the records and values that have already been defined from
within ``referenced_record_callback``. It is applied to all Records
that are created from the dictionary and can be used to, e.g.,
transform values of some properties, or add special treatment to all
Records of a specific type. ``referenced_record_callback`` is applied
**after** the properties from the dictionary have been applied as
explained above.
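The shape of such a callback can be sketched as follows. Note that this is illustrative only: ``Record`` here is a minimal stand-in class, not the LinkAhead entity class, and the callback is simplified to take just the Record (as described above, the real callback additionally receives the stores):

```python
# Minimal stand-in for a Record; only the ``name`` attribute is modeled.
class Record:
    def __init__(self, name):
        self.name = name

def normalize_name(record):
    """A referenced_record_callback-style function: it must return the
    (possibly modified) Record it was given."""
    record.name = record.name.strip().title()
    return record

rec = normalize_name(Record("  silvia scientist "))
print(rec.name)  # Silvia Scientist
```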
Further converters
++++++++++++++++++
datamodel like

.. code-block:: yaml

    H5Ndarray:
        obligatory_properties:
            internal_hdf5-path:
                datatype: TEXT

although the names of both property and record type can be configured within the
cfood definition.
First we will create our package and module structure, which might be:

.. code-block::

    tox.ini
    src/
        scifolder/
            __init__.py
            converters/
                __init__.py
                sources.py  # <- the actual file containing
                            #    the converter class
    doc/
    unittests/
that would be given using a yaml definition (see next section below).

.. code-block:: python

    """

    def __init__(self, definition: dict, name: str,
                 converter_registry: dict):
        """
        Initialize a new directory converter.
        """
        super().__init__(definition, name, converter_registry)

    def create_children(self, generalStore: GeneralStore,
                        element: StructureElement):
        # The source resolver does not create children:
        return []

    def create_records(self, values: GeneralStore,
                       records: RecordStore,
                       element: StructureElement,
                       file_path_prefix):
        if not isinstance(element, TextElement):
            raise RuntimeError()

        # This function must return a list containing tuples, each one for a modified
        # property: (name_of_entity, name_of_property)
        keys_modified = []

        # This is the name of the entity where the source is going to be attached:
        attach_to_scientific_activity = self.definition["scientific_activity"]
        rec = records[attach_to_scientific_activity]

        # The "source" is a path to a source project, so it should have the form:
        # /<Category>/<project>/<scientific_activity>/
        # Obtain this information from the structure element:
        val = element.value
        regexp = (r'/(?P<category>(SimulationData)|(ExperimentalData)|(DataAnalysis))'
                  r'/(?P<project_date>.*?)_(?P<project_identifier>.*)'
                  r'/(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?/')

        res = re.match(regexp, val)
        if res is None:
            raise RuntimeError("Source cannot be parsed correctly.")

        # Mapping of categories on the file system to corresponding record types in CaosDB:
        cat_map = {
            "SimulationData": "Simulation",
            "ExperimentalData": "Experiment",
            "DataAnalysis": "DataAnalysis"}
        linkrt = cat_map[res.group("category")]

        keys_modified.extend(create_records(values, records, {
            "Project": {
                "date": res.group("project_date"),
                "identifier": res.group("project_identifier"),
            },
            linkrt: {
                "date": res.group("date"),
                "identifier": res.group("identifier"),
                "project": "$Project"
            },
            attach_to_scientific_activity: {
                "sources": "+$" + linkrt
            }}, file_path_prefix))

        # Process the records section of the yaml definition:
        keys_modified.extend(
            super().create_records(values, records, element, file_path_prefix))

        # The create_records function must return the modified keys to make it
        # compatible with the crawler functions:
        return keys_modified
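The path-parsing regular expression above can be tried out in isolation. For a made-up path following the layout described in the comments (the path itself is hypothetical), the named groups resolve as follows:

```python
import re

# The same expression as in create_records above:
regexp = (r'/(?P<category>(SimulationData)|(ExperimentalData)|(DataAnalysis))'
          r'/(?P<project_date>.*?)_(?P<project_identifier>.*)'
          r'/(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?/')

# Hypothetical example path of the form /<Category>/<project>/<scientific_activity>/:
res = re.match(regexp, "/ExperimentalData/2020_climate-model/2020-01-10_run1/")
print(res.group("category"))            # ExperimentalData
print(res.group("project_date"))        # 2020
print(res.group("project_identifier"))  # climate-model
print(res.group("date"))                # 2020-01-10
print(res.group("identifier"))          # run1
```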
If the recommended (python) package structure is used, the package containing the converter
function signature:

.. code-block:: python

    def create_records(values: GeneralStore,  # <- pass the current variables store here
                       records: RecordStore,  # <- pass the current store of CaosDB records here
                       def_records: dict):    # <- This is the actual definition of new records!
`def_records` is the actual definition of new records according to the yaml cfood specification
Let's have a look at a few examples:

.. code-block:: yaml

    match: (?P<dir_name>.*)
    records:
        Experiment:
            identifier: $dir_name

This block will just create a new record with parent `Experiment` and one property
`identifier` with a value derived from the matching regular expression.
Let's formulate that using `create_records`:

.. code-block:: python

    record_def = {
        # ...
    }

    keys_modified = create_records(values, records,
                                   record_def)

The `dir_name` is set explicitly here; everything else is identical to the yaml statements.
So, a sketch of a typical implementation within a custom converter could look like this:

.. code-block:: python

    def create_records(self, values: GeneralStore,
                       records: RecordStore,
                       element: StructureElement,
                       file_path_prefix: str):

        # Modify some records:
        record_def = {
            # ...
        }

        keys_modified = create_records(values, records,
                                       record_def)

        # You can of course do it multiple times:
        keys_modified.extend(create_records(values, records,
                                            record_def))

        # You can also process the records section of the yaml definition:
        keys_modified.extend(
            super().create_records(values, records, element, file_path_prefix))
        # This essentially allows users of your converter to customize the creation of records
        # by providing a custom "records" section in addition to the modifications provided
        # in this implementation of the Converter.
Let's have a look at a more complex example, defining multiple records:

.. code-block:: yaml

    match: (?P<dir_name>.*)
    records:
        Project:
            identifier: project_name
        Experiment:
            identifier: $dir_name
            Project: $Project
        ProjectGroup:
            projects: +$Project

This block will create two new Records:
Let's formulate that using `create_records` (again, `dir_name` is constant here):

.. code-block:: python

    record_def = {
        # ...
    }

    keys_modified = create_records(values, records,
                                   record_def)
Debugging
=========
output for the match step. The following snippet illustrates this:

.. code-block:: yaml

    debug_match: True
    records:
        Project:
            identifier: project_name

Whenever this Converter tries to match a StructureElement, it logs what was tried to match against