diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2ba6c84749478314882a4131754bf9cc7fc5b184..be582170bc598736a55ca5d38fd06c4477a0eaf0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Changed ###
+* Moved the optional `hdf5_converter` to the `converters`
+  submodule. When updating from 0.8 or below, this means that you have
+  to adapt the converter package path in your cfood definition from
+  `caoscrawler.hdf5_converter` to
+  `caoscrawler.converters.hdf5_converter`.
+
 ### Deprecated ###
 ### Removed ###
diff --git a/src/doc/converters/cfood_definition.rst b/src/doc/converters/cfood_definition.rst
index 13c04fd38df8b00c435192a1c3cf02147f870b4c..ea2f14b23bec04e659aa3166f089c7d274f74811 100644
--- a/src/doc/converters/cfood_definition.rst
+++ b/src/doc/converters/cfood_definition.rst
@@ -32,13 +32,16 @@ The yaml definition may look like this:
 The **<NodeName>** is a description of what the current block represents (e.g. ``experiment-folder``) and is used as an identifier.
-**<type>** selects the converter that is going to be matched against the current structure
-element. If the structure element matches (this is a combination of a typecheck and a detailed
-match, see the :py:class:`~caoscrawler.converters.Converter` source documentation for details), the
-converter will:
-
-- generate records (with :py:meth:`~caoscrawler.converters.Converter.create_records`)
-- possibly process a subtree (with :py:meth:`caoscrawler.converters.Converter.create_children`)
+**<type>** selects the converter that is going to be matched against
+the current structure element.
If the structure element matches (this
+is a combination of a typecheck and a detailed match, see the
+:py:class:`~caoscrawler.converters.converters.Converter` source
+documentation for details), the converter will:
+
+- generate records (with
+  :py:meth:`~caoscrawler.converters.converters.Converter.create_records`)
+- possibly process a subtree (with
+  :py:meth:`~caoscrawler.converters.converters.Converter.create_children`)
 **match** *TODO*
diff --git a/src/doc/converters/custom_converters.rst b/src/doc/converters/custom_converters.rst
index 573d9714488eaacd2c794b1fa497306a8d110a5f..2738d66c483148fdecb9b189edac45e5b9a55a8b 100644
--- a/src/doc/converters/custom_converters.rst
+++ b/src/doc/converters/custom_converters.rst
@@ -27,20 +27,20 @@ Details:
 - **<NameOfTheConverterInYamlFile>**: This is the name of the converter as it is going to be used in the present yaml file.
 - **<python>.<module>.<name>**: The name of the module where the converter class resides.
-- **<PythonClassName>**: Within this specified module there must be a class inheriting from base class :py:class:`caoscrawler.converters.Converter`.
+- **<PythonClassName>**: Within this specified module there must be a class inheriting from base class :py:class:`caoscrawler.converters.converters.Converter`.
 Implementing a custom converter
 ===============================
-Converters inherit from the :py:class:`~caoscrawler.converters.Converter` class.
+Converters inherit from the :py:class:`~caoscrawler.converters.converters.Converter` class.
 The following methods are abstract and need to be overwritten by your custom converter to make it work:
-:py:meth:`~caoscrawler.converters.Converter.create_children`:
+:py:meth:`~caoscrawler.converters.converters.Converter.create_children`:
 Return a list of child StructureElement objects.
-- :py:meth:`~caoscrawler.converters.Converter.match`
-- :py:meth:`~caoscrawler.converters.Converter.typecheck`
+- :py:meth:`~caoscrawler.converters.converters.Converter.match`
+- :py:meth:`~caoscrawler.converters.converters.Converter.typecheck`
 Example
@@ -71,10 +71,10 @@ First we will create our package and module structure, which might be:
 doc/
 unittests/
-Now we need to create a class called "SourceResolver" in the file "sources.py". In this - more advanced - example, we will not inherit our converter directly from :py:class:`~caoscrawler.converters.Converter`, but use :py:class:`~caoscrawler.converters.TextElementConverter`. The latter already implements :py:meth:`~caoscrawler.converters.Converter.match` and :py:meth:`~caoscrawler.converters.Converter.typecheck`, so only an implementation for :py:meth:`~caoscrawler.converters.Converter.create_children` has to be provided by us.
-Furthermore we will customize the method :py:meth:`~caoscrawler.converters.Converter.create_records` that allows us to specify a more complex record generation procedure than provided in the standard implementation. One specific limitation of the standard implementation is, that only a fixed
-number of records can be generated by the yaml definition. So for any applications - like here - that require an arbitrary number of records to be created, a customized implementation of :py:meth:`~caoscrawler.converters.Converter.create_records` is recommended.
-In this context it is recommended to make use of the function :func:`caoscrawler.converters.create_records` that implements creation of record objects from python dictionaries of the same structure
+Now we need to create a class called "SourceResolver" in the file "sources.py". In this - more advanced - example, we will not inherit our converter directly from :py:class:`~caoscrawler.converters.converters.Converter`, but use :py:class:`~caoscrawler.converters.converters.TextElementConverter`.
The latter already implements :py:meth:`~caoscrawler.converters.converters.Converter.match` and :py:meth:`~caoscrawler.converters.converters.Converter.typecheck`, so only an implementation for :py:meth:`~caoscrawler.converters.converters.Converter.create_children` has to be provided by us.
+Furthermore we will customize the method :py:meth:`~caoscrawler.converters.converters.Converter.create_records` that allows us to specify a more complex record generation procedure than provided in the standard implementation. One specific limitation of the standard implementation is, that only a fixed
+number of records can be generated by the yaml definition. So for any applications - like here - that require an arbitrary number of records to be created, a customized implementation of :py:meth:`~caoscrawler.converters.converters.Converter.create_records` is recommended.
+In this context it is recommended to make use of the function :func:`caoscrawler.converters.converters.create_records` that implements creation of record objects from python dictionaries of the same structure
 that would be given using a yaml definition (see next section below).
 .. code-block:: python
@@ -179,7 +179,7 @@ The following yaml block will register the converter in a yaml file:
 Using the `create_records` API function
 =======================================
-The function :func:`caoscrawler.converters.create_records` was already mentioned above and it is
+The function :func:`caoscrawler.converters.converters.create_records` was already mentioned above and it is
 the recommended way to create new records from custom converters.
Let's have a look at the function signature:
@@ -229,14 +229,14 @@ The `dir_name` is set explicitely here, everything else is identical to the yaml
 The role of `keys_modified`
 ===========================
-You probably have noticed already, that :func:`caoscrawler.converters.create_records` returns
+You probably have noticed already, that :func:`caoscrawler.converters.converters.create_records` returns
 `keys_modified` which is a list of tuples. Each element of `keys_modified` has two elements:
 - Element 0 is the name of the record that is modified (as used in the record store `records`).
 - Element 1 is the name of the property that is modified.
 It is important, that the correct list of modified keys is returned by
-:py:meth:`~caoscrawler.converters.Converter.create_records` to make the crawler process work.
+:py:meth:`~caoscrawler.converters.converters.Converter.create_records` to make the crawler process work.
 So, a sketch of a typical implementation within a custom converter could look like this:
diff --git a/src/doc/converters/further_converters.rst b/src/doc/converters/further_converters.rst
index 539c5159eb1de01765a78e3c04e10fb3f0be9be5..a334c8778f440e108fd141b0fc53ec06765deb8c 100644
--- a/src/doc/converters/further_converters.rst
+++ b/src/doc/converters/further_converters.rst
@@ -64,26 +64,28 @@ H5FileConverter
 ---------------
 This is an extension of the
-:py:class:`~caoscrawler.converters.SimpleFileConverter` class. It opens the HDF5
-file and creates children for any contained group or dataset. Additionally, the
-root-level attributes of the HDF5 file are accessible as children.
+:py:class:`~caoscrawler.converters.converters.SimpleFileConverter`
+class. It opens the HDF5 file and creates children for any contained
+group or dataset. Additionally, the root-level attributes of the HDF5
+file are accessible as children.
 H5GroupConverter
 ----------------
 This is an extension of the
-:py:class:`~caoscrawler.converters.DictElementConverter` class.
Children are
-created for all subgroups and datasets in this HDF5 group. Additionally, the
-group-level attributes are accessible as children.
+:py:class:`~caoscrawler.converters.converters.DictElementConverter`
+class. Children are created for all subgroups and datasets in this
+HDF5 group. Additionally, the group-level attributes are accessible as
+children.
 H5DatasetConverter
 ------------------
 This is an extension of the
-:py:class:`~caoscrawler.converters.DictElementConverter` class. Most
-importantly, it stores the array data in HDF5 dataset into
-:py:class:`~caoscrawler.hdf5_converter.H5NdarrayElement` which is added to its
-children, as well as the dataset attributes.
+:py:class:`~caoscrawler.converters.converters.DictElementConverter`
+class. Most importantly, it stores the array data in HDF5 dataset into
+:py:class:`~caoscrawler.converters.hdf5_converter.H5NdarrayElement`
+which is added to its children, as well as the dataset attributes.
 H5NdarrayConverter
 ------------------
diff --git a/src/doc/converters/standard_converters.rst b/src/doc/converters/standard_converters.rst
index 3dc3c882e76e10706d030ba0695d498631bf7b28..586b84b48be78f1307298a11ad61a2448c3c3cd7 100644
--- a/src/doc/converters/standard_converters.rst
+++ b/src/doc/converters/standard_converters.rst
@@ -131,9 +131,9 @@ CSV File → DictElement
 PropertiesFromDictConverter
 ===========================
-The :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` is
+The :py:class:`~caoscrawler.converters.converters.PropertiesFromDictConverter` is
 a specialization of the
-:py:class:`~caoscrawler.converters.DictElementConverter` and offers
+:py:class:`~caoscrawler.converters.converters.DictElementConverter` and offers
 all its functionality. It is meant to operate on dictionaries (e.g., from reading in a json or a table file), the keys of which correspond closely to properties in a LinkAhead datamodel.
This is especially
@@ -141,7 +141,7 @@ handy in cases where properties may be added to the data model and
 data sources that are not yet known when writing the cfood definition.
 The converter definition of the
-:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has an
+:py:class:`~caoscrawler.converters.converters.PropertiesFromDictConverter` has an
 additional required entry ``record_from_dict`` which specifies the Record to which the properties extracted from the dict are attached to. This Record is identified by its ``variable_name`` by which it can
@@ -183,7 +183,7 @@ with value "Silvia Scientist":
 .. image:: ../img/properties-from-dict-records-author.png
    :height: 210
    :alt: A Record "New Name" and an author Record with full_name
- "Silvia Scientist" are generated and filled automatically.
+ "Silvia Scientist" are generated and filled automatically.
 Note how the different dictionary keys are handled differently depending on their types: scalar and list values are understood
@@ -219,7 +219,7 @@ property:
 .. image:: ../img/properties-from-dict-records-person.png
    :height: 200
    :alt: A new Person Record is created which is referenced as an
- author.
+ author.
 For the time being, only the parents of the referenced record can be set via this option. More complicated treatments can be implemented
@@ -228,22 +228,22 @@ via the ``referenced_record_callback`` (see below).
 Properties can be blacklisted with the ``properties_blacklist`` keyword, i.e., all keys listed under ``properties_blacklist`` will be excluded from automated treatment. Since the
-:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has
+:py:class:`~caoscrawler.converters.converters.PropertiesFromDictConverter` has
 all the functionality of the
-:py:class:`~caoscrawler.converters.DictElementConverter`, individual
+:py:class:`~caoscrawler.converters.converters.DictElementConverter`, individual
 properties can still be used in a subtree.
Together with ``properties_blacklist`` this can be used to add custom treatment to specific properties by blacklisting them in ``record_from_dict`` and then treating them in the subtree the same as you would do it in the standard
-:py:class:`~caoscrawler.converters.DictElementConverter`. Note that
+:py:class:`~caoscrawler.converters.converters.DictElementConverter`. Note that
 the blacklisted keys are excluded on **all** levels of the dictionary, i.e., also when they occur in a referenced entity.
 For further customization, the
-:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` can be
-used as a basis for :ref:`custom converters<Custom Converters>` which
-can make use of its ``referenced_record_callback`` argument. The
+:py:class:`~caoscrawler.converters.converters.PropertiesFromDictConverter`
+can be used as a basis for :ref:`custom converters<Custom Converters>`
+which can make use of its ``referenced_record_callback`` argument. The
 ``referenced_record_callback`` can be a callable object which takes exactly a Record as an argument and needs to return that Record after doing whatever custom treatment is needed.
Additionally, it is given
diff --git a/src/doc/how-to-upgrade.md b/src/doc/how-to-upgrade.md
index 30d23f8f3a4ad88f6b3f4fca18013e26fbcb1dc1..8af805ea30cc85cdde88d789ee3538b2bbaef7e3 100644
--- a/src/doc/how-to-upgrade.md
+++ b/src/doc/how-to-upgrade.md
@@ -1,5 +1,45 @@
 # How to upgrade
+
+## 0.8.x to 0.9.0
+
+If you were using the optional HDF5 converter classes, you need to
+adapt the package path in your cfood definition from the **old**
+
+```yaml
+Converters:
+  H5Dataset:
+    converter: H5DatasetConverter
+    package: caoscrawler.hdf5_converter
+  H5File:
+    converter: H5FileConverter
+    package: caoscrawler.hdf5_converter
+  H5Group:
+    converter: H5GroupConverter
+    package: caoscrawler.hdf5_converter
+  H5Ndarray:
+    converter: H5NdarrayConverter
+    package: caoscrawler.hdf5_converter
+```
+
+to the **new** paths:
+
+```yaml
+Converters:
+  H5Dataset:
+    converter: H5DatasetConverter
+    package: caoscrawler.converters.hdf5_converter
+  H5File:
+    converter: H5FileConverter
+    package: caoscrawler.converters.hdf5_converter
+  H5Group:
+    converter: H5GroupConverter
+    package: caoscrawler.converters.hdf5_converter
+  H5Ndarray:
+    converter: H5NdarrayConverter
+    package: caoscrawler.converters.hdf5_converter
+```
+
 ## 0.6.x to 0.7.0
 If you added Parents to Records at multiple places in the CFood, you must now do this at a single location because this key now overwrites previously set