diff --git a/src/doc/converters.rst b/src/doc/converters.rst index 723fdb6687741c8cb30afa1333ce5c10ce1e8bd0..a955007ce43122c141f703c7d35021431db4859e 100644 --- a/src/doc/converters.rst +++ b/src/doc/converters.rst @@ -31,20 +31,20 @@ The yaml definition may look like this: .. code-block:: yaml <NodeName>: - type: <ConverterName> - match: ".*" - records: - Experiment1: - parents: - - Experiment - - Blablabla - date: $DATUM - (...) - Experiment2: - parents: - - Experiment - subtree: - (...) + type: <ConverterName> + match: ".*" + records: + Experiment1: + parents: + - Experiment + - Blablabla + date: $DATUM + (...) + Experiment2: + parents: + - Experiment + subtree: + (...) The **<NodeName>** is a description of what the current block represents (e.g. ``experiment-folder``) and is used as an identifier. @@ -76,35 +76,35 @@ applied to the respective variables when the converter is executed. .. code-block:: yaml <NodeName>: - type: <ConverterName> - match: ".*" - transform: - <TransformNodeName>: - in: $<in_var_name> - out: $<out_var_name> - functions: - - <func_name>: # name of the function to be applied - <func_arg1>: <func_arg1_value> # key value pairs that are passed as parameters - <func_arg2>: <func_arg2_value> - # ... + type: <ConverterName> + match: ".*" + transform: + <TransformNodeName>: + in: $<in_var_name> + out: $<out_var_name> + functions: + - <func_name>: # name of the function to be applied + <func_arg1>: <func_arg1_value> # key value pairs that are passed as parameters + <func_arg2>: <func_arg2_value> + # ... An example that splits the variable ``a`` and puts the generated list in ``b`` is the following: .. 
code-block:: yaml Experiment: - type: Dict - match: ".*" - transform: - param_split: - in: $a - out: $b - functions: - - split: # split is a function that is defined by default - marker: "|" # its only parameter is the marker that is used to split the string - records: - Report: - tags: $b + type: Dict + match: ".*" + transform: + param_split: + in: $a + out: $b + functions: + - split: # split is a function that is defined by default + marker: "|" # its only parameter is the marker that is used to split the string + records: + Report: + tags: $b This splits the string in '$a' and stores the resulting list in '$b'. This is here used to add a list valued property to the Report Record. @@ -218,21 +218,21 @@ Example: type: CSVTableConverter match: ^test_table.csv$ records: - (...) # Records edited for the whole table file + (...) # Records edited for the whole table file subtree: - ROW: # Any name for a data row in the table - type: DictElement - match_name: .* - match_value: .* - records: - (...) # Records edited for each row - subtree: - COLUMN: # Any name for a specific type of column in the table - type: FloatElement - match_name: measurement # Name of the column in the table file - match_value: (?P<column_value).*) - records: - (...) # Records edited for each cell + ROW: # Any name for a data row in the table + type: DictElement + match_name: .* + match_value: .* + records: + (...) # Records edited for each row + subtree: + COLUMN: # Any name for a specific type of column in the table + type: FloatElement + match_name: measurement # Name of the column in the table file + match_value: (?P<column_value>.*) + records: + (...) # Records edited for each cell XLSXTableConverter @@ -252,10 +252,10 @@ The :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` is a specialization of the :py:class:`~caoscrawler.converters.DictElementConverter` and offers all its functionality.
It is meant to operate on dictionaries (e.g., -from reading in a json or a table file) the keys of which correspond -closely to LinkAhead properties. This is especially handy in cases -where properties may be added to the data model and data sources after -the writing of the CFood definition. +from reading in a JSON or a table file), the keys of which correspond +closely to properties in a LinkAhead data model. This is especially +handy in cases where properties may be added to the data model and +data sources that are not yet known when writing the cfood definition. The converter definition of the :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has an @@ -273,12 +273,12 @@ look at a simple example. A CFood definition type: PropertiesFromDictElement match: ".*" record_from_dict: - variable_name: MyRec + variable_name: MyRec parents: - MyType1 - MyType2 -applied on a dictionary +applied to a dictionary .. code-block:: json @@ -295,11 +295,15 @@ will create a Record ``New name`` with parents ``MyType1`` and ``MyType2``. It has a scalar property ``a`` with value 5, a list property ``b`` with values "a", "b" and "c", and an ``author`` property which references an ``author`` with a ``full_name`` property -with value "Silvia Scientist". Note how the different dictionary keys -are handled differently depending on their types: scalar and list -values are understood automatically, and a dictionary-valued entry -like ``author`` is translated into a reference to an ``author`` Record -automatically. +with value "Silvia Scientist": + +.. image:: img/properties-from-dict-records-author.png + :height: 210 + +Note how the dictionary entries are handled differently +depending on the types of their values: scalar and list values are understood +automatically, and a dictionary-valued entry like ``author`` is +translated into a reference to an ``author`` Record.
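The type-dependent handling described above can be illustrated with a minimal, library-free sketch. It does not use the crawler or the ``PropertiesFromDictConverter`` itself: ``classify_value`` and ``sketch_record`` are hypothetical helpers, records are represented by plain dictionaries, and the example input (including the assumption that a ``name`` key supplies the Record name) is reconstructed from the description above rather than copied from the original JSON.

```python
from typing import Any


def classify_value(value: Any) -> str:
    """Return how a dictionary value is treated in this sketch."""
    if isinstance(value, dict):
        return "reference"  # nested dict -> reference to another record
    if isinstance(value, (list, tuple)):
        return "list"  # list -> list-valued property
    return "scalar"  # anything else -> scalar property


def sketch_record(data: dict) -> dict:
    """Build a plain-dict stand-in for a Record (illustration only)."""
    # Assumption: the "name" key, if present, names the record.
    record = {"name": data.get("name"), "properties": {}}
    for key, value in data.items():
        if key == "name":
            continue
        if classify_value(value) == "reference":
            # A nested dictionary becomes its own record stand-in.
            record["properties"][key] = sketch_record(value)
        else:
            record["properties"][key] = value
    return record


# Example input reconstructed from the prose description (an assumption).
example = {
    "name": "New name",
    "a": 5,
    "b": ["a", "b", "c"],
    "author": {"full_name": "Silvia Scientist"},
}

result = sketch_record(example)
kinds = {k: classify_value(v) for k, v in example.items() if k != "name"}
```

Here ``kinds`` maps ``a`` to a scalar, ``b`` to a list, and ``author`` to a reference, mirroring the three cases named in the text. The real converter additionally creates LinkAhead Records with parents, which this sketch deliberately leaves out.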
You can further specify how references are treated with an optional ``references key`` in ``record_from_dict``. Let's assume that in the @@ -314,18 +318,21 @@ extending the above example definition by type: PropertiesFromDictElement match: ".*" record_from_dict: - variable_name: MyRec + variable_name: MyRec parents: - MyType1 - MyType2 references: author: - parents: + parents: - Person so that now, a ``Person`` record with a ``full_name`` property with value "Silvia Scientist" is created as the value of the ``author`` -property. +property: + +.. image:: img/properties-from-dict-records-person.png + :height: 200 Properties can be blacklisted with the ``properties_blacklist`` keyword. Since the @@ -343,13 +350,16 @@ For further customization, the used as a basis for :ref:`custom converters<Custom Converters>` which can make use of its ``referenced_record_callback`` argument. The ``referenced_record_callback`` can be a callable object which takes -exactly one Record as an argument and needs to return that Record -after doing whatever custom treatment is needed. It is applied to all -Records that are created from the dictionary and it can be used to, -e.g., transform values of some properties, or add special treatment to -all Records of a specific type. ``referenced_record_callback`` is -applied **after** the properties from the dictionary have been applied -as explained above. +a Record as an argument and must return that Record after +applying whatever custom treatment is needed. Additionally, it is given +the ``RecordStore`` and the ``ValueStore``, so that the records and +values that have already been defined can be accessed from within +``referenced_record_callback``. It is applied to all Records +that are created from the dictionary and can be used to, e.g., +transform values of some properties, or add special treatment to all +Records of a specific type.
``referenced_record_callback`` is applied +**after** the properties from the dictionary have been applied as +explained above. Further converters ++++++++++++++++++ @@ -399,7 +409,7 @@ datamodel like H5Ndarray: obligatory_properties: internal_hdf5-path: - datatype: TEXT + datatype: TEXT although the names of both property and record type can be configured within the cfood definition. @@ -513,11 +523,11 @@ First we will create our package and module structure, which might be: tox.ini src/ scifolder/ - __init__.py - converters/ - __init__.py - sources.py # <- the actual file containing - # the converter class + __init__.py + converters/ + __init__.py + sources.py # <- the actual file containing + # the converter class doc/ unittests/ @@ -542,74 +552,74 @@ that would be given using a yaml definition (see next section below). """ def __init__(self, definition: dict, name: str, - converter_registry: dict): - """ - Initialize a new directory converter. - """ - super().__init__(definition, name, converter_registry) + converter_registry: dict): + """ + Initialize a new directory converter. 
+ """ + super().__init__(definition, name, converter_registry) def create_children(self, generalStore: GeneralStore, - element: StructureElement): + element: StructureElement): - # The source resolver does not create children: + # The source resolver does not create children: - return [] + return [] def create_records(self, values: GeneralStore, - records: RecordStore, - element: StructureElement, - file_path_prefix): - if not isinstance(element, TextElement): - raise RuntimeError() - - # This function must return a list containing tuples, each one for a modified - # property: (name_of_entity, name_of_property) - keys_modified = [] - - # This is the name of the entity where the source is going to be attached: - attach_to_scientific_activity = self.definition["scientific_activity"] - rec = records[attach_to_scientific_activity] - - # The "source" is a path to a source project, so it should have the form: - # /<Category>/<project>/<scientific_activity>/ - # obtain these information from the structure element: - val = element.value - regexp = (r'/(?P<category>(SimulationData)|(ExperimentalData)|(DataAnalysis))' - '/(?P<project_date>.*?)_(?P<project_identifier>.*)' - '/(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?/') - - res = re.match(regexp, val) - if res is None: - raise RuntimeError("Source cannot be parsed correctly.") - - # Mapping of categories on the file system to corresponding record types in CaosDB: - cat_map = { - "SimulationData": "Simulation", - "ExperimentalData": "Experiment", - "DataAnalysis": "DataAnalysis"} - linkrt = cat_map[res.group("category")] - - keys_modified.extend(create_records(values, records, { - "Project": { - "date": res.group("project_date"), - "identifier": res.group("project_identifier"), - }, - linkrt: { - "date": res.group("date"), - "identifier": res.group("identifier"), - "project": "$Project" - }, - attach_to_scientific_activity: { - "sources": "+$" + linkrt - }}, file_path_prefix)) - - # Process the records 
section of the yaml definition: - keys_modified.extend( - super().create_records(values, records, element, file_path_prefix)) - - # The create_records function must return the modified keys to make it compatible - # to the crawler functions: - return keys_modified + records: RecordStore, + element: StructureElement, + file_path_prefix): + if not isinstance(element, TextElement): + raise RuntimeError() + + # This function must return a list containing tuples, each one for a modified + # property: (name_of_entity, name_of_property) + keys_modified = [] + + # This is the name of the entity where the source is going to be attached: + attach_to_scientific_activity = self.definition["scientific_activity"] + rec = records[attach_to_scientific_activity] + + # The "source" is a path to a source project, so it should have the form: + # /<Category>/<project>/<scientific_activity>/ + # obtain these information from the structure element: + val = element.value + regexp = (r'/(?P<category>(SimulationData)|(ExperimentalData)|(DataAnalysis))' + '/(?P<project_date>.*?)_(?P<project_identifier>.*)' + '/(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?/') + + res = re.match(regexp, val) + if res is None: + raise RuntimeError("Source cannot be parsed correctly.") + + # Mapping of categories on the file system to corresponding record types in CaosDB: + cat_map = { + "SimulationData": "Simulation", + "ExperimentalData": "Experiment", + "DataAnalysis": "DataAnalysis"} + linkrt = cat_map[res.group("category")] + + keys_modified.extend(create_records(values, records, { + "Project": { + "date": res.group("project_date"), + "identifier": res.group("project_identifier"), + }, + linkrt: { + "date": res.group("date"), + "identifier": res.group("identifier"), + "project": "$Project" + }, + attach_to_scientific_activity: { + "sources": "+$" + linkrt + }}, file_path_prefix)) + + # Process the records section of the yaml definition: + keys_modified.extend( + 
super().create_records(values, records, element, file_path_prefix)) + + # The create_records function must return the modified keys to make it compatible + # with the crawler functions: + return keys_modified If the recommended (python) package structure is used, the package containing the converter @@ -636,8 +646,8 @@ function signature: .. code-block:: python def create_records(values: GeneralStore, # <- pass the current variables store here - records: RecordStore, # <- pass the current store of CaosDB records here - def_records: dict): # <- This is the actual definition of new records! + records: RecordStore, # <- pass the current store of CaosDB records here + def_records: dict): # <- This is the actual definition of new records! `def_records` is the actual definition of new records according to the yaml cfood specification @@ -653,7 +663,7 @@ Let's have a look at a few examples: match: (?P<dir_name>.*) records: Experiment: - identifier: $dir_name + identifier: $dir_name This block will just create a new record with parent `Experiment` and one property `identifier` with a value derived from the matching regular expression. @@ -671,7 +681,7 @@ Let's formulate that using `create_records`: } keys_modified = create_records(values, records, - record_def) + record_def) The `dir_name` is set explicitly here, everything else is identical to the yaml statements. @@ -694,9 +704,9 @@ So, a sketch of a typical implementation within a custom converter could look li ..
code-block:: python def create_records(self, values: GeneralStore, - records: RecordStore, - element: StructureElement, - file_path_prefix: str): + records: RecordStore, + element: StructureElement, + file_path_prefix: str): # Modify some records: record_def = { @@ -704,15 +714,15 @@ So, a sketch of a typical implementation within a custom converter could look li } keys_modified = create_records(values, records, - record_def) + record_def) # You can of course do it multiple times: keys_modified.extend(create_records(values, records, - record_def)) + record_def)) # You can also process the records section of the yaml definition: keys_modified.extend( - super().create_records(values, records, element, file_path_prefix)) + super().create_records(values, records, element, file_path_prefix)) # This essentially allows users of your converter to customize the creation of records # by providing a custom "records" section additionally to the modifications provided # in this implementation of the Converter. @@ -733,12 +743,12 @@ Let's have a look at a more complex examples, defining multiple records: match: (?P<dir_name>.*) records: Project: - identifier: project_name + identifier: project_name Experiment: - identifier: $dir_name - Project: $Project + identifier: $dir_name + Project: $Project ProjectGroup: - projects: +$Project + projects: +$Project This block will create two new Records: @@ -771,7 +781,7 @@ Let's formulate that using `create_records` (again, `dir_name` is constant here) } keys_modified = create_records(values, records, - record_def) + record_def) Debugging ========= @@ -787,7 +797,7 @@ output for the match step. 
The following snippet illustrates this: debug_match: True records: Project: - identifier: project_name + identifier: project_name Whenever this Converter tries to match a StructureElement, it logs what was tried to match against diff --git a/src/doc/img/properties-from-dict-records-author.png b/src/doc/img/properties-from-dict-records-author.png new file mode 100644 index 0000000000000000000000000000000000000000..20ee9497ab5ae577c3d515f11da6294c88601fed Binary files /dev/null and b/src/doc/img/properties-from-dict-records-author.png differ diff --git a/src/doc/img/properties-from-dict-records-person.png b/src/doc/img/properties-from-dict-records-person.png new file mode 100644 index 0000000000000000000000000000000000000000..8b026056a42ff3ba203c6077a426640c864b24c1 Binary files /dev/null and b/src/doc/img/properties-from-dict-records-person.png differ