Commit 45839173 authored by Florian Spreckelsen

DOC: Explain how to upgrade

parent f1163033
2 merge requests: !181 Release 0.9.0, !180 F converter submodule
Pipeline #54932 passed
......@@ -13,6 +13,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed ###
* Moved the optional `hdf5_converter` to the `converters`
submodule. When updating from 0.8 or below, this means that you have
to adapt the converter package path in your cfood definition from
`caoscrawler.hdf5_converter` to
`caoscrawler.converters.hdf5_converter`.
### Deprecated ###
### Removed ###
......
......@@ -32,13 +32,16 @@ The yaml definition may look like this:
The **<NodeName>** is a description of what the current block represents (e.g.
``experiment-folder``) and is used as an identifier.
**<type>** selects the converter that is going to be matched against the current structure
element. If the structure element matches (this is a combination of a typecheck and a detailed
match, see the :py:class:`~caoscrawler.converters.Converter` source documentation for details), the
converter will:
- generate records (with :py:meth:`~caoscrawler.converters.Converter.create_records`)
- possibly process a subtree (with :py:meth:`caoscrawler.converters.Converter.create_children`)
**<type>** selects the converter that is going to be matched against
the current structure element. If the structure element matches (this
is a combination of a typecheck and a detailed match, see the
:py:class:`~caoscrawler.converters.converters.Converter` source
documentation for details), the converter will:
- generate records (with
:py:meth:`~caoscrawler.converters.converters.Converter.create_records`)
- possibly process a subtree (with
:py:meth:`~caoscrawler.converters.converters.Converter.create_children`)
**match** *TODO*
......
......@@ -27,20 +27,20 @@ Details:
- **<NameOfTheConverterInYamlFile>**: This is the name of the converter as it is going to be used in the present yaml file.
- **<python>.<module>.<name>**: The name of the module where the converter class resides.
- **<PythonClassName>**: Within this specified module there must be a class inheriting from base class :py:class:`caoscrawler.converters.Converter`.
- **<PythonClassName>**: Within this specified module there must be a class inheriting from base class :py:class:`caoscrawler.converters.converters.Converter`.
Implementing a custom converter
===============================
Converters inherit from the :py:class:`~caoscrawler.converters.Converter` class.
Converters inherit from the :py:class:`~caoscrawler.converters.converters.Converter` class.
The following methods are abstract and need to be overwritten by your custom converter to make it work:
:py:meth:`~caoscrawler.converters.Converter.create_children`:
:py:meth:`~caoscrawler.converters.converters.Converter.create_children`:
Return a list of child StructureElement objects.
- :py:meth:`~caoscrawler.converters.Converter.match`
- :py:meth:`~caoscrawler.converters.Converter.typecheck`
- :py:meth:`~caoscrawler.converters.converters.Converter.match`
- :py:meth:`~caoscrawler.converters.converters.Converter.typecheck`
Example
......@@ -71,10 +71,10 @@ First we will create our package and module structure, which might be:
doc/
unittests/
Now we need to create a class called "SourceResolver" in the file "sources.py". In this - more advanced - example, we will not inherit our converter directly from :py:class:`~caoscrawler.converters.Converter`, but use :py:class:`~caoscrawler.converters.TextElementConverter`. The latter already implements :py:meth:`~caoscrawler.converters.Converter.match` and :py:meth:`~caoscrawler.converters.Converter.typecheck`, so only an implementation for :py:meth:`~caoscrawler.converters.Converter.create_children` has to be provided by us.
Furthermore we will customize the method :py:meth:`~caoscrawler.converters.Converter.create_records` that allows us to specify a more complex record generation procedure than provided in the standard implementation. One specific limitation of the standard implementation is, that only a fixed
number of records can be generated by the yaml definition. So for any applications - like here - that require an arbitrary number of records to be created, a customized implementation of :py:meth:`~caoscrawler.converters.Converter.create_records` is recommended.
In this context it is recommended to make use of the function :func:`caoscrawler.converters.create_records` that implements creation of record objects from python dictionaries of the same structure
Now we need to create a class called "SourceResolver" in the file "sources.py". In this - more advanced - example, we will not inherit our converter directly from :py:class:`~caoscrawler.converters.converters.Converter`, but use :py:class:`~caoscrawler.converters.converters.TextElementConverter`. The latter already implements :py:meth:`~caoscrawler.converters.converters.Converter.match` and :py:meth:`~caoscrawler.converters.converters.Converter.typecheck`, so only an implementation for :py:meth:`~caoscrawler.converters.converters.Converter.create_children` has to be provided by us.
Furthermore we will customize the method :py:meth:`~caoscrawler.converters.converters.Converter.create_records` that allows us to specify a more complex record generation procedure than provided in the standard implementation. One specific limitation of the standard implementation is, that only a fixed
number of records can be generated by the yaml definition. So for any applications - like here - that require an arbitrary number of records to be created, a customized implementation of :py:meth:`~caoscrawler.converters.converters.Converter.create_records` is recommended.
In this context it is recommended to make use of the function :func:`caoscrawler.converters.converters.create_records` that implements creation of record objects from python dictionaries of the same structure
that would be given using a yaml definition (see next section below).
.. code-block:: python
......@@ -179,7 +179,7 @@ The following yaml block will register the converter in a yaml file:
Using the `create_records` API function
=======================================
The function :func:`caoscrawler.converters.create_records` was already mentioned above and it is
The function :func:`caoscrawler.converters.converters.create_records` was already mentioned above and it is
the recommended way to create new records from custom converters. Let's have a look at the
function signature:
......@@ -229,14 +229,14 @@ The `dir_name` is set explicitely here, everything else is identical to the yaml
The role of `keys_modified`
===========================
You probably have noticed already, that :func:`caoscrawler.converters.create_records` returns
You probably have noticed already, that :func:`caoscrawler.converters.converters.create_records` returns
`keys_modified` which is a list of tuples. Each element of `keys_modified` has two elements:
- Element 0 is the name of the record that is modified (as used in the record store `records`).
- Element 1 is the name of the property that is modified.
It is important, that the correct list of modified keys is returned by
:py:meth:`~caoscrawler.converters.Converter.create_records` to make the crawler process work.
:py:meth:`~caoscrawler.converters.converters.Converter.create_records` to make the crawler process work.
So, a sketch of a typical implementation within a custom converter could look like this:
......
......@@ -64,26 +64,28 @@ H5FileConverter
---------------
This is an extension of the
:py:class:`~caoscrawler.converters.SimpleFileConverter` class. It opens the HDF5
file and creates children for any contained group or dataset. Additionally, the
root-level attributes of the HDF5 file are accessible as children.
:py:class:`~caoscrawler.converters.converters.SimpleFileConverter`
class. It opens the HDF5 file and creates children for any contained
group or dataset. Additionally, the root-level attributes of the HDF5
file are accessible as children.
H5GroupConverter
----------------
This is an extension of the
:py:class:`~caoscrawler.converters.DictElementConverter` class. Children are
created for all subgroups and datasets in this HDF5 group. Additionally, the
group-level attributes are accessible as children.
:py:class:`~caoscrawler.converters.converters.DictElementConverter`
class. Children are created for all subgroups and datasets in this
HDF5 group. Additionally, the group-level attributes are accessible as
children.
H5DatasetConverter
------------------
This is an extension of the
:py:class:`~caoscrawler.converters.DictElementConverter` class. Most
importantly, it stores the array data in HDF5 dataset into
:py:class:`~caoscrawler.hdf5_converter.H5NdarrayElement` which is added to its
children, as well as the dataset attributes.
:py:class:`~caoscrawler.converters.converters.DictElementConverter`
class. Most importantly, it stores the array data in HDF5 dataset into
:py:class:`~caoscrawler.converters.hdf5_converter.H5NdarrayElement`
which is added to its children, as well as the dataset attributes.
H5NdarrayConverter
------------------
......
......@@ -131,9 +131,9 @@ CSV File → DictElement
PropertiesFromDictConverter
===========================
The :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` is
The :py:class:`~caoscrawler.converters.converters.PropertiesFromDictConverter` is
a specialization of the
:py:class:`~caoscrawler.converters.DictElementConverter` and offers
:py:class:`~caoscrawler.converters.converters.DictElementConverter` and offers
all its functionality. It is meant to operate on dictionaries (e.g.,
from reading in a json or a table file), the keys of which correspond
closely to properties in a LinkAhead datamodel. This is especially
......@@ -141,7 +141,7 @@ handy in cases where properties may be added to the data model and
data sources that are not yet known when writing the cfood definition.
The converter definition of the
:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has an
:py:class:`~caoscrawler.converters.converters.PropertiesFromDictConverter` has an
additional required entry ``record_from_dict`` which specifies the
Record to which the properties extracted from the dict are attached
to. This Record is identified by its ``variable_name`` by which it can
......@@ -228,22 +228,22 @@ via the ``referenced_record_callback`` (see below).
Properties can be blacklisted with the ``properties_blacklist``
keyword, i.e., all keys listed under ``properties_blacklist`` will be
excluded from automated treatment. Since the
:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has
:py:class:`~caoscrawler.converters.converters.PropertiesFromDictConverter` has
all the functionality of the
:py:class:`~caoscrawler.converters.DictElementConverter`, individual
:py:class:`~caoscrawler.converters.converters.DictElementConverter`, individual
properties can still be used in a subtree. Together with
``properties_blacklist`` this can be used to add custom treatment to
specific properties by blacklisting them in ``record_from_dict`` and
then treating them in the subtree the same as you would do it in the
standard
:py:class:`~caoscrawler.converters.DictElementConverter`. Note that
:py:class:`~caoscrawler.converters.converters.DictElementConverter`. Note that
the blacklisted keys are excluded on **all** levels of the dictionary,
i.e., also when they occur in a referenced entity.
For further customization, the
:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` can be
used as a basis for :ref:`custom converters<Custom Converters>` which
can make use of its ``referenced_record_callback`` argument. The
:py:class:`~caoscrawler.converters.converters.PropertiesFromDictConverter`
can be used as a basis for :ref:`custom converters<Custom Converters>`
which can make use of its ``referenced_record_callback`` argument. The
``referenced_record_callback`` can be a callable object which takes
exactly a Record as an argument and needs to return that Record after
doing whatever custom treatment is needed. Additionally, it is given
......
# How to upgrade
## 0.8.x to 0.9.0
If you were using the optional HDF5 converter classes, you need to
adapt the package path in your cfood definition from the **old**
```yaml
Converters:
  H5Dataset:
    converter: H5DatasetConverter
    package: caoscrawler.hdf5_converter
  H5File:
    converter: H5FileConverter
    package: caoscrawler.hdf5_converter
  H5Group:
    converter: H5GroupConverter
    package: caoscrawler.hdf5_converter
  H5Ndarray:
    converter: H5NdarrayConverter
    package: caoscrawler.hdf5_converter
```
to the **new** paths:
```yaml
Converters:
  H5Dataset:
    converter: H5DatasetConverter
    package: caoscrawler.converters.hdf5_converter
  H5File:
    converter: H5FileConverter
    package: caoscrawler.converters.hdf5_converter
  H5Group:
    converter: H5GroupConverter
    package: caoscrawler.converters.hdf5_converter
  H5Ndarray:
    converter: H5NdarrayConverter
    package: caoscrawler.converters.hdf5_converter
```
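
If you maintain many cfood files, the path change can also be applied with a small script. The following is only a sketch and not part of caoscrawler: the helper name `upgrade_package_paths` is hypothetical, and it assumes that the old package path appears literally in the cfood text, as in the example above.

```python
import re

# Old and new package paths, taken from the upgrade note above.
OLD_PACKAGE = "caoscrawler.hdf5_converter"
NEW_PACKAGE = "caoscrawler.converters.hdf5_converter"

# Match the old path only where it starts a dotted name, so that names
# such as "mycaoscrawler.hdf5_converter" are left untouched.
_OLD_PATTERN = re.compile(r"(?<![\w.])" + re.escape(OLD_PACKAGE) + r"(?!\w)")


def upgrade_package_paths(cfood_text: str) -> str:
    """Return cfood_text with the old HDF5 converter package path replaced.

    Hypothetical helper for illustration. It works on plain text, so
    comments and formatting of the yaml file are preserved.
    """
    return _OLD_PATTERN.sub(NEW_PACKAGE, cfood_text)
```

The replacement is idempotent because the new path does not contain the old one as a substring, so running the helper on an already upgraded cfood leaves it unchanged. To apply it to a file, read the text, pass it through the helper and write the result back, after checking the changes with your version control system.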
## 0.6.x to 0.7.0
If you added Parents to Records at multiple places in the CFood, you must now
do this at a single location because this key now overwrites previously set
......