@@ -208,7 +208,7 @@ Now we need to create a class called "SourceResolver" in the file "sources.py".
Furthermore, we will customize the method :py:meth:`~caoscrawler.converters.Converter.create_records`, which allows us to specify a more complex record generation procedure than the standard implementation provides. One specific limitation of the standard implementation is that the yaml definition can only generate a fixed
number of records. Any application that requires an arbitrary number of records to be created, as is the case here, should therefore use a customized implementation of :py:meth:`~caoscrawler.converters.Converter.create_records`.
In this context, it is recommended to use the function :func:`caoscrawler.converters.create_records`, which creates record objects from Python dictionaries of the same structure
that would be given using a yaml definition (see next section below).
.. code-block:: python
...
...
@@ -307,3 +307,151 @@ The following yaml block will register the converter in a yaml file:
SourceResolver:
package: scifolder.converters.sources
converter: SourceResolver
Using the `create_records` API function
=======================================
The function :func:`caoscrawler.converters.create_records` was already mentioned above and it is
the recommended way to create new records from custom converters. Let's have a look at the
function signature:
.. code-block:: python

    def create_records(values: GeneralStore,  # <- pass the current variables store here
                       records: RecordStore,  # <- pass the current store of CaosDB records here
                       def_records: dict):    # <- this is the actual definition of new records!

`def_records` is the actual definition of new records according to the yaml cfood specification
(work in progress, in the docs). Essentially, you can do everything here that you could also do
in the yaml document, but using Python source code.
Let's have a look at a few examples:

.. code-block:: yaml

    DirConverter:
      type: Directory
      match: (?P<dir_name>.*)
      records:
        Experiment:
          identifier: $dir_name

This block will just create a new record with parent `Experiment` and one property
`identifier` with a value derived from the matching regular expression.

Let's formulate that using `create_records`:
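.. code-block:: python

    from caoscrawler.converters import create_records

    dir_name = "directory name"

    record_def = {
        "Experiment": {
            "identifier": dir_name
        }
    }

    # `values` and `records` are the current variables store and record
    # store (see the function signature above).
    keys_modified = create_records(values, records,
                                   record_def)

The `dir_name` is set explicitly here; everything else is identical to the yaml statements.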
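Because `def_records` is an ordinary Python dictionary, it can also be assembled programmatically, which is how a custom converter can create a number of records that is not known in advance. The following is only a minimal sketch under stated assumptions: the helper `build_experiment_defs` and the record names `Experiment_0`, `Experiment_1`, and so on are hypothetical, and the `parents` key is assumed to behave as it does in a yaml definition. The call to `create_records` is left as a comment because it needs the stores of a running crawler.

```python
def build_experiment_defs(dir_names):
    """Return a record definition dict with one entry per directory name.

    Hypothetical helper for illustration; it is not part of caoscrawler.
    """
    record_def = {}
    for index, dir_name in enumerate(dir_names):
        # Each record gets its own key, so the parent is named explicitly
        # (assumption: "parents" works here as it does in yaml).
        record_def[f"Experiment_{index}"] = {
            "parents": ["Experiment"],
            "identifier": dir_name,
        }
    return record_def


# Example directory names are made up for this sketch.
record_def = build_experiment_defs(["2020_01_run", "2020_02_run"])
print(sorted(record_def))

# Inside a converter, the result would then be passed on:
# keys_modified = create_records(values, records, record_def)
```

Building the dictionary in a separate function keeps the record generation logic testable without a crawler run.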
The role of `keys_modified`
===========================
You have probably noticed already that :func:`caoscrawler.converters.create_records` returns
`keys_modified`, which is a list of tuples. Each tuple in `keys_modified` has two elements:
- Element 0 is the name of the record that is modified (as used in the record store `records`).
- Element 1 is the name of the property that is modified.
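To make this structure concrete, the following sketch uses a hand-written list in place of a real return value. The list contents (including the `date` property and the `Project` record) and the helper `group_by_record` are assumptions for illustration only. For the `Experiment` example above, one would expect a single tuple naming the record `Experiment` and its property `identifier`.

```python
from collections import defaultdict

# Hand-written stand-in for the return value of create_records
# (assumed contents, not produced by a real crawler run).
keys_modified = [
    ("Experiment", "identifier"),
    ("Experiment", "date"),
    ("Project", "name"),
]


def group_by_record(keys):
    """Collect the modified property names for each record name."""
    grouped = defaultdict(list)
    for record_name, property_name in keys:
        grouped[record_name].append(property_name)
    return dict(grouped)


modified = group_by_record(keys_modified)
print(modified)
```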
It is important that :py:meth:`~caoscrawler.converters.Converter.create_records` returns the
correct list of modified keys; otherwise the crawler process will not work.
A sketch of a typical implementation within a custom converter could therefore look like this: