diff --git a/src/doc/converters.rst b/src/doc/converters.rst
index 8cda5f2c5ec17db3d585c5eeaad23505ac0205d5..cf0d26bf79719ae87c824f6ba148b55e14e96c26 100644
--- a/src/doc/converters.rst
+++ b/src/doc/converters.rst
@@ -243,7 +243,34 @@ arrays that are in turn treated by the :ref:`H5GroupConverter`, the need to install the LinkAhead crawler with its optional ``h5crawler`` dependency for using these converters.
+The basic idea when crawling HDF5 files is to treat them very similar to
+:ref:`dictionaries <DictElement Converter>` in which the attributes on root,
+group, or dataset level are essentially treated like ``BooleanElement``,
+``TextElement``, ``FloatElement``, and ``IntegerElement`` in a dictionary: They
+are appended as children and can be accessed via the ``subtree``. The file
+itself and the groups within may contain further groups and datasets, which can
+have their own attributes, subgroups, and datasets, very much like
+``DictElements`` within a dictionary. The main difference to any other
+dictionary type is the presence of multi-dimensional arrays within HDF5
+datasets. Since LinkAhead doesn't have any datatype corresponding to these, and
+since it isn't desirable to store these arrays directly within LinkAhead for
+reasons of performance and of searchability, we wrap them within a specific
+Record as explained :ref:`below <H5NdarrayConverter>`, together with more
+metadata and their internal path within the HDF5 file. Users can thus query for
+datasets and their arrays according to their metadata within LinkAhead and then
+use the internal path information to access the dataset within the file
+directly. The type of this record and the property for storing the internal path
+need to be reflected in the datamodel. Using the default names, you would need a datamodel like
+.. code-block:: yaml
+
+  H5Ndarray:
+    obligatory_properties:
+      internal_hdf5-path:
+        datatype: TEXT
+
+although the names of both property and record type can be configured within the
+cfood definition.

 H5FileConverter
 ---------------
@@ -267,7 +294,7 @@ H5DatasetConverter
 This is an extension of the
 :py:class:`~caoscrawler.converters.DictElementConverter` class. Most
 importantly, it stores the array data in HDF5 dataset into
-:py:class:`~caoscrawler.hdf5_converters.H5NdarrayElement` which is added to its
+:py:class:`~caoscrawler.hdf5_converter.H5NdarrayElement` which is added to its
 children, as well as the dataset attributes.

 H5NdarrayConverter
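
The added paragraph says that the stored internal path lets users go from a Record back to the dataset inside the file. The following is a minimal sketch of that lookup, not part of the patch and not the crawler's API. It models the HDF5 file as a plain nested dictionary so that it runs without extra dependencies; the helper ``resolve_internal_path``, the ``fake_file`` contents, and the example path are all hypothetical. With h5py, indexing the opened file object with the stored path would normally play the same role.

```python
def resolve_internal_path(tree, internal_path):
    """Walk a nested mapping along a '/'-separated HDF5-style path.

    Hypothetical helper for illustration only; it is not provided by the
    LinkAhead crawler.
    """
    node = tree
    # Skip empty components so that '/a/b' and 'a/b' resolve identically.
    for part in (p for p in internal_path.split("/") if p):
        node = node[part]
    return node


# Hypothetical stand-in for an HDF5 file: groups are dicts, datasets are lists.
fake_file = {
    "measurements": {
        "run_1": {"temperature": [20.5, 21.0, 21.4]},
    },
}

# A value as it might be stored in the ``internal_hdf5-path`` property.
stored_path = "/measurements/run_1/temperature"
data = resolve_internal_path(fake_file, stored_path)
print(data)
```

A path that does not exist in the structure raises ``KeyError``, which mirrors how a lookup with an outdated stored path would fail rather than silently return wrong data.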