Commit 56b73398 authored by Florian Spreckelsen

DOC: Explain datamodel requirements

parent f3e8ec1e
2 merge requests: !160 (STY: styling), !143 (ENH: HDF5 Converter)
Pipeline #47453 passed
@@ -243,7 +243,34 @@ arrays that are in turn treated by the :ref:`H5GroupConverter`, the
need to install the LinkAhead crawler with its optional ``h5crawler`` dependency
for using these converters.
The basic idea when crawling HDF5 files is to treat them very similarly to
:ref:`dictionaries <DictElement Converter>`: the attributes on root,
group, or dataset level are essentially treated like ``BooleanElement``,
``TextElement``, ``FloatElement``, and ``IntegerElement`` in a dictionary. They
are appended as children and can be accessed via the ``subtree``. The file
itself and the groups within may contain further groups and datasets, which can
have their own attributes, subgroups, and datasets, very much like
``DictElements`` within a dictionary. The main difference from other
dictionary types is the presence of multi-dimensional arrays within HDF5
datasets. Since LinkAhead doesn't have any datatype corresponding to these, and
since it isn't desirable to store these arrays directly within LinkAhead for
reasons of performance and searchability, we wrap them within a specific
Record as explained :ref:`below <H5NdarrayConverter>`, together with more
metadata and their internal path within the HDF5 file. Users can thus query for
datasets and their arrays according to their metadata within LinkAhead and then
use the internal path information to access the dataset within the file
directly. The type of this record and the property for storing the internal path
need to be reflected in the datamodel. Using the default names, you would need a
datamodel like

.. code-block:: yaml

   H5Ndarray:
     obligatory_properties:
       internal_hdf5-path:
         datatype: TEXT

although the names of both property and record type can be configured within the
cfood definition.
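
The wrapping idea described above can be illustrated with a minimal,
stdlib-only sketch. It is not the crawler's implementation: a nested Python
dictionary stands in for an HDF5 file, plain lists stand in for datasets with
arrays, and the helper names ``wrap_arrays`` and ``lookup`` as well as the toy
data are hypothetical. Only the record type name ``H5Ndarray`` and the property
name ``internal_hdf5-path`` are taken from the default datamodel shown above.

.. code-block:: python

   # Toy stand-in for an HDF5 file: dicts play the role of groups,
   # lists play the role of datasets holding arrays (an assumption
   # made purely for illustration).
   toy_file = {
       "title": "run 1",
       "measurements": {
           "temperature": [[20.1, 20.3], [20.2, 20.4]],
           "sample_rate": 10,
       },
   }


   def wrap_arrays(tree, prefix=""):
       """Return one record-like dict per array, storing only its path."""
       records = []
       for key, value in tree.items():
           path = f"{prefix}/{key}"
           if isinstance(value, dict):
               # Descend into "groups", extending the internal path.
               records.extend(wrap_arrays(value, path))
           elif isinstance(value, list):
               # Do not copy the array itself, only remember where it lives.
               records.append(
                   {"record_type": "H5Ndarray", "internal_hdf5-path": path}
               )
       return records


   def lookup(tree, internal_path):
       """Follow an internal path back to the array it points to."""
       node = tree
       for part in internal_path.strip("/").split("/"):
           node = node[part]
       return node


   records = wrap_arrays(toy_file)
   array = lookup(toy_file, records[0]["internal_hdf5-path"])

Here, ``records`` contains a single entry pointing at
``/measurements/temperature``, and ``lookup`` retrieves the array from the toy
file by that path. With a real HDF5 file, the stored path could be used in the
same spirit, for example by indexing an opened ``h5py.File`` object with it.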
H5FileConverter
---------------
@@ -267,7 +294,7 @@ H5DatasetConverter
This is an extension of the
:py:class:`~caoscrawler.converters.DictElementConverter` class. Most
importantly, it stores the array data in HDF5 dataset into
-:py:class:`~caoscrawler.hdf5_converters.H5NdarrayElement` which is added to its
+:py:class:`~caoscrawler.hdf5_converter.H5NdarrayElement` which is added to its
children, as well as the dataset attributes.
H5NdarrayConverter