Commit d5abed25 authored by Alexander Schlemmer

Merge branch 'f-improve-custom-converter-doc' into 'dev'

DOC: added explanations on how to use the create_records function

See merge request !52
parents 3f97db11 b7088850
2 merge requests: !71 "REL: RElease v0.2.0", !52 "DOC: added explanations on how to use the create_records function"
Pipeline #29510 passed
@@ -208,7 +208,7 @@ Now we need to create a class called "SourceResolver" in the file "sources.py".

Furthermore we will customize the method :py:meth:`~caoscrawler.converters.Converter.create_records`, which allows us to specify a more complex record generation procedure than the standard implementation provides. One specific limitation of the standard implementation is that only a fixed
number of records can be generated by the yaml definition. So for any application that requires an arbitrary number of records to be created, as is the case here, a customized implementation of :py:meth:`~caoscrawler.converters.Converter.create_records` is recommended.
In this context it is recommended to make use of the function :func:`caoscrawler.converters.create_records`, which creates record objects from python dictionaries of the same structure
that would be given using a yaml definition (see the next section).

.. code-block:: python
@@ -307,3 +307,151 @@ The following yaml block will register the converter in a yaml file:
     SourceResolver:
       package: scifolder.converters.sources
       converter: SourceResolver

Using the `create_records` API function
=======================================

The function :func:`caoscrawler.converters.create_records` was already mentioned above. It is
the recommended way to create new records from custom converters. Let's have a look at the
function signature:

.. code-block:: python

    def create_records(values: GeneralStore,  # <- pass the current variables store here
                       records: RecordStore,  # <- pass the current store of CaosDB records here
                       def_records: dict):    # <- This is the actual definition of new records!

`def_records` is the actual definition of new records according to the yaml cfood specification
(work in progress, in the docs). Essentially, you can do everything here that you could do
in the yaml document, but using python source code.

Let's have a look at a few examples:

.. code-block:: yaml

    DirConverter:
      type: Directory
      match: (?P<dir_name>.*)
      records:
        Experiment:
          identifier: $dir_name

This block will just create a new record with parent `Experiment` and one property
`identifier` with a value derived from the matching regular expression.

Let's formulate that using `create_records`:

.. code-block:: python

    dir_name = "directory name"

    record_def = {
        "Experiment": {
            "identifier": dir_name
        }
    }

    keys_modified = create_records(values, records,
                                   record_def)

The `dir_name` is set explicitly here; everything else is identical to the yaml statements.

The role of `keys_modified`
===========================

You have probably noticed already that :func:`caoscrawler.converters.create_records` returns
`keys_modified`, which is a list of tuples. Each element of `keys_modified` has two elements:

- Element 0 is the name of the record that is modified (as used in the record store `records`).
- Element 1 is the name of the property that is modified.

It is important that the correct list of modified keys is returned by
:py:meth:`~caoscrawler.converters.Converter.create_records` to make the crawler process work.

So, a sketch of a typical implementation within a custom converter could look like this:

.. code-block:: python

    def create_records(self, values: GeneralStore,
                       records: RecordStore,
                       element: StructureElement,
                       file_path_prefix: str):

        # Modify some records:
        record_def = {
            # ...
        }

        keys_modified = create_records(values, records,
                                       record_def)

        # You can of course do it multiple times:
        keys_modified.extend(create_records(values, records,
                                            record_def))

        # You can also process the records section of the yaml definition:
        keys_modified.extend(
            super().create_records(values, records, element, file_path_prefix))
        # This essentially allows users of your converter to customize the creation of records
        # by providing a custom "records" section in addition to the modifications provided
        # in this implementation of the Converter.

        # Important: Return the list of modified keys!
        return keys_modified

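
To illustrate the shape of `keys_modified` described above, here is a minimal sketch in plain Python. It does not call the crawler: the list is written by hand to mimic what a call might return, and the helper `group_modified_keys` is a hypothetical name introduced only for this illustration.

```python
from collections import defaultdict


def group_modified_keys(keys_modified):
    """Map each record name to the list of its modified property names.

    Hypothetical helper for illustration; not part of caoscrawler.
    """
    grouped = defaultdict(list)
    # Each entry is a tuple: (record name, property name)
    for record_name, property_name in keys_modified:
        grouped[record_name].append(property_name)
    return dict(grouped)


# Hand-written stand-in for a returned list of modified keys (assumed values):
keys_modified = [
    ("Experiment", "identifier"),
    ("Experiment", "Project"),
    ("ProjectGroup", "projects"),
]

summary = group_modified_keys(keys_modified)
for record_name, property_names in summary.items():
    print(record_name, property_names)
```

A grouping like this can help when debugging a custom converter, for example to check that every record you changed actually appears in the returned list.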

More complex example
====================

Let's have a look at a more complex example, defining multiple records:

.. code-block:: yaml

    DirConverter:
      type: Directory
      match: (?P<dir_name>.*)
      records:
        Project:
          identifier: project_name
        Experiment:
          identifier: $dir_name
          Project: $Project
        ProjectGroup:
          projects: +$Project

This block will create two new Records:

- A project with a constant identifier
- An experiment with an identifier derived from a regular expression and a reference to the new project.

Furthermore, a Record `ProjectGroup` will be edited (its initial definition is not given in the
yaml block): the project that was just created will be added as a list element to the property
`projects`.
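
To make the `$` and `+$` notation described above more concrete, here is a toy model in plain Python. It is only a sketch of the behaviour stated in the text (substitute a variable, reference a record, append to a list property) and is not how the crawler is implemented. Plain dictionaries stand in for the variable store and the record store, and `apply_definition` is a hypothetical name.

```python
def apply_definition(values, records, record_def):
    """Toy interpreter for a record definition (illustration only).

    values:  plain dict standing in for the variables store
    records: plain dict standing in for the record store
    Returns a list of (record name, property name) tuples.
    """
    keys_modified = []
    for record_name, properties in record_def.items():
        # Create the record if it does not exist yet, otherwise edit it
        record = records.setdefault(record_name, {})
        for prop_name, raw in properties.items():
            # A leading "+" means: append to a list property
            append = isinstance(raw, str) and raw.startswith("+")
            if append:
                raw = raw[1:]
            # A leading "$" means: look the name up instead of using it literally
            if isinstance(raw, str) and raw.startswith("$"):
                name = raw[1:]
                value = records[name] if name in records else values[name]
            else:
                value = raw
            if append:
                record.setdefault(prop_name, []).append(value)
            else:
                record[prop_name] = value
            keys_modified.append((record_name, prop_name))
    return keys_modified


values = {"dir_name": "2020_SpeedOfLight"}       # assumed example value
records = {"ProjectGroup": {"projects": []}}      # record defined elsewhere
record_def = {
    "Project": {"identifier": "project_name"},
    "Experiment": {"identifier": "$dir_name", "Project": "$Project"},
    "ProjectGroup": {"projects": "+$Project"},
}
keys_modified = apply_definition(values, records, record_def)
```

After running this, the toy `ProjectGroup` holds the new project in its `projects` list, and the toy `Experiment` refers to the same project object.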

Let's formulate that using `create_records` (again, `dir_name` is constant here):

.. code-block:: python

    dir_name = "directory name"

    record_def = {
        "Project": {
            "identifier": "project_name",
        },
        "Experiment": {
            "identifier": dir_name,
            "Project": "$Project",
        },
        "ProjectGroup": {
            "projects": "+$Project",
        }
    }

    keys_modified = create_records(values, records,
                                   record_def)

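
Because the definition is an ordinary Python dictionary, it can also be built in a loop, which is what makes an arbitrary number of records possible. The following is a minimal sketch under stated assumptions: the helper `build_source_definitions`, the record type `Source`, and the entry names `Source_0`, `Source_1`, ... are hypothetical, and it assumes that a `parents` key is accepted in the dictionary in the same way as in a yaml cfood.

```python
def build_source_definitions(source_names):
    """Return a definition dict with one entry per name.

    Hypothetical helper for illustration; not part of caoscrawler.
    """
    record_def = {}
    for index, name in enumerate(source_names):
        # Each record needs its own key in the definition dict
        record_def[f"Source_{index}"] = {
            # Assumption: "parents" works here as in a yaml cfood
            "parents": ["Source"],
            "identifier": name,
        }
    return record_def


# The number of entries depends on the input, not on a fixed yaml block:
record_def = build_source_definitions(["paper_a", "dataset_b", "notebook_c"])

# Inside a custom converter the result would then be passed on, e.g.:
# keys_modified = create_records(values, records, record_def)
```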