diff --git a/src/doc/converters.rst b/src/doc/converters.rst index 7ec93535ec41dc211e2fa7ee194b2ecbe1a659fb..640a1ddec04986960bcf41110c593bb73431ed75 100644 --- a/src/doc/converters.rst +++ b/src/doc/converters.rst @@ -208,7 +208,7 @@ Now we need to create a class called "SourceResolver" in the file "sources.py". Furthermore we will customize the method :py:meth:`~caoscrawler.converters.Converter.create_records` that allows us to specify a more complex record generation procedure than provided in the standard implementation. One specific limitation of the standard implementation is, that only a fixed number of records can be generated by the yaml definition. So for any applications - like here - that require an arbitrary number of records to be created, a customized implementation of :py:meth:`~caoscrawler.converters.Converter.create_records` is recommended. In this context it is recommended to make use of the function :func:`caoscrawler.converters.create_records` that implements creation of record objects from python dictionaries of the same structure -that would be given using a yaml definition. +that would be given using a yaml definition (see next section below). .. code-block:: python @@ -307,3 +307,151 @@ The following yaml block will register the converter in a yaml file: SourceResolver: package: scifolder.converters.sources converter: SourceResolver + + +Using the `create_records` API function +======================================= + +The function :func:`caoscrawler.converters.create_records` was already mentioned above and it is +the recommended way to create new records from custom converters. Let's have a look at the +function signature: + +.. code-block:: python + + def create_records(values: GeneralStore, # <- pass the current variables store here + records: RecordStore, # <- pass the current store of CaosDB records here + def_records: dict): # <- This is the actual definition of new records! 
+ + +`def_records` is the actual definition of new records according to the yaml cfood specification +(work in progress, in the docs). Essentially, you can do everything here that you could do +in the yaml document as well, but using python source code. + +Let's have a look at a few examples: + +.. code-block:: yaml + + DirConverter: + type: Directory + match: (?P<dir_name>.*) + records: + Experiment: + identifier: $dir_name + +This block will just create a new record with parent `Experiment` and one property +`identifier` with a value derived from the matching regular expression. + +Let's formulate that using `create_records`: + +.. code-block:: python + + dir_name = "directory name" + + record_def = { + "Experiment": { + "identifier": dir_name + } + } + + keys_modified = create_records(values, records, + record_def) + +The `dir_name` is set explicitly here; everything else is identical to the yaml statements. + + +The role of `keys_modified` +=========================== + +You have probably noticed already that :func:`caoscrawler.converters.create_records` returns +`keys_modified`, which is a list of tuples. Each tuple in `keys_modified` has two elements: + +- Element 0 is the name of the record that is modified (as used in the record store `records`). +- Element 1 is the name of the property that is modified. + +It is important that the correct list of modified keys is returned by +:py:meth:`~caoscrawler.converters.Converter.create_records` to make the crawler process work. + +So, a sketch of a typical implementation within a custom converter could look like this: + + +.. code-block:: python + + def create_records(self, values: GeneralStore, + records: RecordStore, + element: StructureElement, + file_path_prefix: str): + + # Modify some records: + record_def = { + # ... 
+ } + + keys_modified = create_records(values, records, + record_def) + + # You can of course do it multiple times: + keys_modified.extend(create_records(values, records, + record_def)) + + # You can also process the records section of the yaml definition: + keys_modified.extend( + super().create_records(values, records, element, file_path_prefix)) + # This essentially allows users of your converter to customize the creation of records + # by providing a custom "records" section in addition to the modifications provided + # in this implementation of the Converter. + + # Important: Return the list of modified keys! + return keys_modified + + +More complex example +==================== + +Let's have a look at a more complex example that defines multiple records: + +.. code-block:: yaml + + DirConverter: + type: Directory + match: (?P<dir_name>.*) + records: + Project: + identifier: project_name + Experiment: + identifier: $dir_name + Project: $Project + ProjectGroup: + projects: +$Project + + +This block will create two new Records: + +- A project with a constant identifier +- An experiment with an identifier derived from a regular expression, and a reference to the new project. + +Furthermore, a Record `ProjectGroup` will be edited (its initial definition is not given in the +yaml block): The project that was just created will be added as a list element to the property +`projects`. + +Let's formulate that using `create_records` (again, `dir_name` is constant here): + +.. code-block:: python + + dir_name = "directory name" + + record_def = { + "Project": { + "identifier": "project_name", + }, + "Experiment": { + "identifier": dir_name, + "Project": "$Project", + }, + "ProjectGroup": { + "projects": "+$Project", + } + + } + + keys_modified = create_records(values, records, + record_def)
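As a plain-Python illustration of the tuple structure described in "The role of `keys_modified`", the sketch below lists the (record name, property name) pairs that a definition dictionary touches. It is not the caoscrawler implementation and does not call the library; the helper name `expected_keys` is hypothetical. It also assumes that `keys_modified` holds one tuple per property set in `def_records`, which the text suggests but does not state explicitly.

```python
def expected_keys(record_def: dict) -> list:
    """Hypothetical helper: list (record name, property name) pairs of a definition.

    This only mirrors the shape of `keys_modified` as described in the text;
    it neither calls nor reimplements caoscrawler.
    """
    return [(record_name, property_name)
            for record_name, properties in record_def.items()
            for property_name in properties]


dir_name = "directory name"

# Same definition as in the more complex example; note the commas
# separating the top-level entries of the dictionary.
record_def = {
    "Project": {
        "identifier": "project_name",
    },
    "Experiment": {
        "identifier": dir_name,
        "Project": "$Project",
    },
    "ProjectGroup": {
        "projects": "+$Project",
    },
}

pairs = expected_keys(record_def)
for record_name, property_name in pairs:
    print(record_name, property_name)
```

Under the assumption stated above, comparing such a list with the value actually returned by `create_records` may serve as a quick sanity check while developing a custom converter.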