diff --git a/src/doc/concepts.rst b/src/doc/concepts.rst index 4070aeffa3b611debdebbf74baff126ca4e0032b..b3aa02a151a4d03c1531094ea01a5246cb02ba73 100644 --- a/src/doc/concepts.rst +++ b/src/doc/concepts.rst @@ -20,7 +20,7 @@ example a tree of Python *file objects* (StructureElements) could correspond to Relevant sources in: -- ``src/structure_elements.py`` +- :py:mod:`caoscrawler.structure_elements` .. _ConceptConverters: @@ -38,7 +38,7 @@ See the chapter :std:doc:`Converters<converters/index>` for details. Relevant sources in: -- ``src/converters.py`` +- :py:mod:`caoscrawler.converters` Identifiables @@ -70,8 +70,8 @@ In the current implementation an identifiable can only use one RecordType even t Relevant sources in -- ``src/identifiable_adapters.py`` -- ``src/identifiable.py`` +- :py:mod:`caoscrawler.identifiable_adapters` +- :py:mod:`caoscrawler.identifiable` Registered Identifiables ++++++++++++++++++++++++ @@ -110,7 +110,7 @@ The crawler can be considered the main program doing the synchronization in basi Relevant sources in: -- ``src/crawl.py`` +- :py:mod:`caoscrawler.crawl` diff --git a/src/doc/converters/cfood_definition.rst b/src/doc/converters/cfood_definition.rst new file mode 100644 index 0000000000000000000000000000000000000000..13c04fd38df8b00c435192a1c3cf02147f870b4c --- /dev/null +++ b/src/doc/converters/cfood_definition.rst @@ -0,0 +1,50 @@ +CFood definition +++++++++++++++++ + +Converter application to data is specified via a tree-like yml file (called ``cfood.yml``, by +convention). The yml file specifies which Converters shall be used on which StructureElements, and +how to treat the generated *child* StructureElements. + +The yaml definition may look like this: + +.. todo:: + + This is outdated, see ``cfood-schema.yml`` for the current specification of a ``cfood.yml``. + +.. code-block:: yaml + + <NodeName>: + type: <ConverterName> + match: ".*" + records: + Experiment1: + parents: + - Experiment + - Blablabla + date: $DATUM + (...) + Experiment2: + parents: + - Experiment + subtree: + (...) + +The **<NodeName>** is a description of what the current block represents (e.g. +``experiment-folder``) and is used as an identifier. + +**<type>** selects the converter that is going to be matched against the current structure +element. If the structure element matches (this is a combination of a typecheck and a detailed +match, see the :py:class:`~caoscrawler.converters.Converter` source documentation for details), the +converter will: + +- generate records (with :py:meth:`~caoscrawler.converters.Converter.create_records`) +- possibly process a subtree (with :py:meth:`caoscrawler.converters.Converter.create_children`) + +**match** *TODO* + +**records** is a dict of definitions that define the semantic structure +(see details below). + +**subtree** makes the yaml recursive: It contains a list of new Converter +definitions, which work on the StructureElements that are returned by the +current Converter. diff --git a/src/doc/converters/custom_converters.rst b/src/doc/converters/custom_converters.rst new file mode 100644 index 0000000000000000000000000000000000000000..573d9714488eaacd2c794b1fa497306a8d110a5f --- /dev/null +++ b/src/doc/converters/custom_converters.rst @@ -0,0 +1,344 @@ +Custom Converters ++++++++++++++++++ + +As mentioned before it is possible to create custom converters. +These custom converters can be used to integrate arbitrary data extraction and ETL capabilities +into the LinkAhead crawler and make these extensions available to any yaml specification. 
+
+Tell the crawler about a custom converter
+=========================================
+
+To use a custom converter, it must be defined in the ``Converters`` section of the CFood yaml file.
+The basic syntax for adding a custom converter to a definition file is:
+
+.. code-block:: yaml
+
+  Converters:
+    <NameOfTheConverterInYamlFile>:
+      package: <python>.<module>.<name>
+      converter: <PythonClassName>
+
+The Converters section can be put into either the first or the second
+document of the cfood yaml file. It can also be part of a
+single-document yaml cfood file. Please refer to :doc:`the cfood
+documentation<../cfood>` for more details.
+
+Details:
+
+- **<NameOfTheConverterInYamlFile>**: This is the name of the converter as it is going to be used in the present yaml file.
+- **<python>.<module>.<name>**: The name of the module where the converter class resides.
+- **<PythonClassName>**: Within this specified module there must be a class inheriting from the base class :py:class:`caoscrawler.converters.Converter`.
+
+Implementing a custom converter
+===============================
+
+Converters inherit from the :py:class:`~caoscrawler.converters.Converter` class.
+
+The following methods are abstract and need to be overwritten by your custom converter to make it work:
+
+- :py:meth:`~caoscrawler.converters.Converter.create_children`:
+  Return a list of child StructureElement objects.
+- :py:meth:`~caoscrawler.converters.Converter.match`
+- :py:meth:`~caoscrawler.converters.Converter.typecheck`
+
+
+Example
+=======
+
+In the following, we will explain the process of adding a custom converter to a yaml file using
+a SourceResolver that is able to attach a source element to another entity.
+
+**Note**: This example might become a standard crawler soon, as part of the scifolder specification. See https://doi.org/10.3390/data5020043 for details. In this documentation example we will, therefore, add it to a package called "scifolder".
+
+First we will create our package and module structure, which might be:
+
+.. code-block::
+
+  scifolder_package/
+    README.md
+    setup.cfg
+    setup.py
+    Makefile
+    tox.ini
+    src/
+      scifolder/
+        __init__.py
+        converters/
+          __init__.py
+          sources.py  # <- the actual file containing
+                      #    the converter class
+    doc/
+    unittests/
+
+Now we need to create a class called "SourceResolver" in the file "sources.py". In this more advanced example, we will not inherit our converter directly from :py:class:`~caoscrawler.converters.Converter`, but use :py:class:`~caoscrawler.converters.TextElementConverter`. The latter already implements :py:meth:`~caoscrawler.converters.Converter.match` and :py:meth:`~caoscrawler.converters.Converter.typecheck`, so only an implementation for :py:meth:`~caoscrawler.converters.Converter.create_children` has to be provided by us.
+Furthermore we will customize the method :py:meth:`~caoscrawler.converters.Converter.create_records` that allows us to specify a more complex record generation procedure than provided in the standard implementation. One specific limitation of the standard implementation is that only a fixed
+number of records can be generated by the yaml definition. So for applications like this one, which require an arbitrary number of records to be created, a customized implementation of :py:meth:`~caoscrawler.converters.Converter.create_records` is recommended.
+In this context it is recommended to make use of the function :func:`caoscrawler.converters.create_records` that implements creation of record objects from python dictionaries of the same structure +that would be given using a yaml definition (see next section below). + +.. code-block:: python + + import re + from caoscrawler.stores import GeneralStore, RecordStore + from caoscrawler.converters import TextElementConverter, create_records + from caoscrawler.structure_elements import StructureElement, TextElement + + + class SourceResolver(TextElementConverter): + """ + This resolver uses a source list element (e.g. from the markdown readme file) + to link sources correctly. + """ + + def __init__(self, definition: dict, name: str, + converter_registry: dict): + """ + Initialize a new directory converter. + """ + super().__init__(definition, name, converter_registry) + + def create_children(self, generalStore: GeneralStore, + element: StructureElement): + + # The source resolver does not create children: + + return [] + + def create_records(self, values: GeneralStore, + records: RecordStore, + element: StructureElement, + file_path_prefix): + if not isinstance(element, TextElement): + raise RuntimeError() + + # This function must return a list containing tuples, each one for a modified + # property: (name_of_entity, name_of_property) + keys_modified = [] + + # This is the name of the entity where the source is going to be attached: + attach_to_scientific_activity = self.definition["scientific_activity"] + rec = records[attach_to_scientific_activity] + + # The "source" is a path to a source project, so it should have the form: + # /<Category>/<project>/<scientific_activity>/ + # obtain these information from the structure element: + val = element.value + regexp = (r'/(?P<category>(SimulationData)|(ExperimentalData)|(DataAnalysis))' + '/(?P<project_date>.*?)_(?P<project_identifier>.*)' + '/(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?/') + + res = re.match(regexp, val) + if res is None: + raise RuntimeError("Source cannot be parsed correctly.") + + # Mapping of categories on the file system to corresponding record types in CaosDB: + cat_map = { + "SimulationData": "Simulation", + "ExperimentalData": "Experiment", + "DataAnalysis": "DataAnalysis"} + linkrt = cat_map[res.group("category")] + + keys_modified.extend(create_records(values, records, { + "Project": { + "date": res.group("project_date"), + "identifier": res.group("project_identifier"), + }, + linkrt: { + "date": res.group("date"), + "identifier": res.group("identifier"), + "project": "$Project" + }, + attach_to_scientific_activity: { + "sources": "+$" + linkrt + }}, file_path_prefix)) + + # Process the records section of the yaml definition: + keys_modified.extend( + super().create_records(values, records, element, file_path_prefix)) + + # The create_records function must return the modified keys to make it compatible + # to the crawler functions: + return keys_modified + + +If the recommended (python) package structure is used, the package containing the converter +definition can just be installed using `pip install .` or `pip install -e .` from the +`scifolder_package` directory. + +The following yaml block will register the converter in a yaml file: + +.. 
code-block:: yaml
+
+  Converters:
+    SourceResolver:
+      package: scifolder.converters.sources
+      converter: SourceResolver
+
+
+Using the `create_records` API function
+=======================================
+
+The function :func:`caoscrawler.converters.create_records` was already mentioned above and it is
+the recommended way to create new records from custom converters. Let's have a look at the
+function signature:
+
+.. code-block:: python
+
+   def create_records(values: GeneralStore,   # <- pass the current variables store here
+                      records: RecordStore,   # <- pass the current store of CaosDB records here
+                      def_records: dict):     # <- This is the actual definition of new records!
+
+
+`def_records` is the actual definition of new records according to the yaml cfood specification
+(work in progress, in the docs). Essentially, you can do everything here that you could do
+in the yaml document as well, but using python source code.
+
+Let's have a look at a few examples:
+
+.. code-block:: yaml
+
+  DirConverter:
+    type: Directory
+    match: (?P<dir_name>.*)
+    records:
+      Experiment:
+        identifier: $dir_name
+
+This block will just create a new record with parent `Experiment` and one property
+`identifier` with a value derived from the matching regular expression.
+
+Let's formulate that using `create_records`:
+
+.. code-block:: python
+
+  dir_name = "directory name"
+
+  record_def = {
+      "Experiment": {
+          "identifier": dir_name
+      }
+  }
+
+  keys_modified = create_records(values, records,
+                                 record_def)
+
+The `dir_name` is set explicitly here; everything else is identical to the yaml statements.
+
+
+The role of `keys_modified`
+===========================
+
+You have probably noticed already that :func:`caoscrawler.converters.create_records` returns
+`keys_modified`, which is a list of tuples. Each element of `keys_modified` has two elements:
+
+- Element 0 is the name of the record that is modified (as used in the record store `records`).
+- Element 1 is the name of the property that is modified.
+
+It is important that the correct list of modified keys is returned by
+:py:meth:`~caoscrawler.converters.Converter.create_records` to make the crawler process work.
+
+So, a sketch of a typical implementation within a custom converter could look like this:
+
+
+.. code-block:: python
+
+  def create_records(self, values: GeneralStore,
+                     records: RecordStore,
+                     element: StructureElement,
+                     file_path_prefix: str):
+
+      # Modify some records:
+      record_def = {
+          # ...
+      }
+
+      keys_modified = create_records(values, records,
+                                     record_def)
+
+      # You can of course do it multiple times:
+      keys_modified.extend(create_records(values, records,
+                                          record_def))
+
+      # You can also process the records section of the yaml definition:
+      keys_modified.extend(
+          super().create_records(values, records, element, file_path_prefix))
+      # This essentially allows users of your converter to customize the creation of records
+      # by providing a custom "records" section additionally to the modifications provided
+      # in this implementation of the Converter.
+
+      # Important: Return the list of modified keys!
+      return keys_modified
+
+
+More complex example
+====================
+
+Let's have a look at a more complex example, defining multiple records:
+
+.. code-block:: yaml
+
+  DirConverter:
+    type: Directory
+    match: (?P<dir_name>.*)
+    records:
+      Project:
+        identifier: project_name
+      Experiment:
+        identifier: $dir_name
+        Project: $Project
+      ProjectGroup:
+        projects: +$Project
+
+
+This block will create two new Records:
+
+- A project with a constant identifier
+- An experiment with an identifier derived from a regular expression, and a reference to the new project.
+
+Furthermore, a Record `ProjectGroup` will be edited (its initial definition is not given in the
+yaml block): The project that was just created will be added as a list element to the property
+`projects`.
+
+Let's formulate that using `create_records` (again, `dir_name` is constant here):
+
+.. code-block:: python
+
+  dir_name = "directory name"
+
+  record_def = {
+      "Project": {
+          "identifier": "project_name",
+      },
+      "Experiment": {
+          "identifier": dir_name,
+          "Project": "$Project",
+      },
+      "ProjectGroup": {
+          "projects": "+$Project",
+      },
+  }
+
+  keys_modified = create_records(values, records,
+                                 record_def)
+
+Debugging
+=========
+
+You can add the key `debug_match` to the definition of a Converter in order to create debugging
+output for the match step. The following snippet illustrates this:
+
+.. code-block:: yaml
+
+  DirConverter:
+    type: Directory
+    match: (?P<dir_name>.*)
+    debug_match: True
+    records:
+      Project:
+        identifier: project_name
+
+
+Whenever this Converter tries to match a StructureElement, it logs what it tried to match against
+what, and what the result was.
diff --git a/src/doc/converters/further_converters.rst b/src/doc/converters/further_converters.rst
new file mode 100644
index 0000000000000000000000000000000000000000..539c5159eb1de01765a78e3c04e10fb3f0be9be5
--- /dev/null
+++ b/src/doc/converters/further_converters.rst
@@ -0,0 +1,98 @@
+Further converters
+++++++++++++++++++
+
+More converters, together with cfood definitions and examples, can be found in
+the `LinkAhead Crawler Extensions Subgroup
+<https://gitlab.com/linkahead/crawler-extensions>`_ on gitlab. In the following,
+we list converters that are shipped with the crawler library itself but are not
+part of the set of standard converters and may require this library to be
+installed with additional optional dependencies.
+
+HDF5 Converters
+===============
+
+For treating `HDF5 Files
+<https://docs.hdfgroup.org/hdf5/develop/_s_p_e_c.html>`_, there are in total
+four individual converters corresponding to the internal structure of HDF5
+files: the :ref:`H5FileConverter`, which opens the file itself and creates
+further structure elements from HDF5 groups, datasets, and included
+multi-dimensional arrays that are in turn treated by the
+:ref:`H5GroupConverter`, the :ref:`H5DatasetConverter`, and the
+:ref:`H5NdarrayConverter`, respectively. You need to install the LinkAhead
+crawler with its optional ``h5-crawler`` dependency for using these converters.
+
+The basic idea when crawling HDF5 files is to treat them very similarly to
+:ref:`dictionaries <DictElement Converter>` in which the attributes on root,
+group, or dataset level are essentially treated like ``BooleanElement``,
+``TextElement``, ``FloatElement``, and ``IntegerElement`` in a dictionary: They
+are appended as children and can be accessed via the ``subtree``. The file
+itself and the groups within may contain further groups and datasets, which can
+have their own attributes, subgroups, and datasets, very much like
+``DictElements`` within a dictionary. 
The main difference to any other +dictionary type is the presence of multi-dimensional arrays within HDF5 +datasets. Since LinkAhead doesn't have any datatype corresponding to these, and +since it isn't desirable to store these arrays directly within LinkAhead for +reasons of performance and of searchability, we wrap them within a specific +Record as explained :ref:`below <H5NdarrayConverter>`, together with more +metadata and their internal path within the HDF5 file. Users can thus query for +datasets and their arrays according to their metadata within LinkAhead and then +use the internal path information to access the dataset within the file +directly. The type of this record and the property for storing the internal path +need to be reflected in the datamodel. Using the default names, you would need a +datamodel like + +.. code-block:: yaml + + H5Ndarray: + obligatory_properties: + internal_hdf5-path: + datatype: TEXT + +although the names of both property and record type can be configured within the +cfood definition. + +A simple example of a cfood definition for HDF5 files can be found in the `unit +tests +<https://gitlab.com/linkahead/linkahead-crawler/-/blob/main/unittests/h5_cfood.yml?ref_type=heads>`_ +and shows how the individual converters are used in order to crawl a `simple +example file +<https://gitlab.com/linkahead/linkahead-crawler/-/blob/main/unittests/hdf5_dummy_file.hdf5?ref_type=heads>`_ +containing groups, subgroups, and datasets, together with their respective +attributes. + +H5FileConverter +--------------- + +This is an extension of the +:py:class:`~caoscrawler.converters.SimpleFileConverter` class. It opens the HDF5 +file and creates children for any contained group or dataset. Additionally, the +root-level attributes of the HDF5 file are accessible as children. + +H5GroupConverter +---------------- + +This is an extension of the +:py:class:`~caoscrawler.converters.DictElementConverter` class. Children are +created for all subgroups and datasets in this HDF5 group. Additionally, the +group-level attributes are accessible as children. + +H5DatasetConverter +------------------ + +This is an extension of the +:py:class:`~caoscrawler.converters.DictElementConverter` class. Most +importantly, it stores the array data in HDF5 dataset into +:py:class:`~caoscrawler.hdf5_converter.H5NdarrayElement` which is added to its +children, as well as the dataset attributes. + +H5NdarrayConverter +------------------ + +This converter creates a wrapper record for the contained dataset. The name of +this record needs to be specified in the cfood definition of this converter via +the ``recordname`` option. The RecordType of this record can be configured with +the ``array_recordtype_name`` option and defaults to ``H5Ndarray``. Via the +given ``recordname``, this record can be used within the cfood. Most +importantly, this record stores the internal path of this array within the HDF5 +file in a text property, the name of which can be configured with the +``internal_path_property_name`` option which defaults to ``internal_hdf5_path``. diff --git a/src/doc/converters/index.rst b/src/doc/converters/index.rst index c81f8e0fe54ea36c92c2d97bedb40cca2ef29be0..943ff1310649aaf34738adb0d1b5e90f5a3079cc 100644 --- a/src/doc/converters/index.rst +++ b/src/doc/converters/index.rst @@ -15,877 +15,13 @@ a file could have the file name as property: ``'filename': myfile.dat``. Converters may define additional functions that create further values. 
For example, a regular expression could be used to get a date from a file name. -CFood definition -++++++++++++++++ +.. toctree:: + :maxdepth: 1 + :caption: Contents: -Converter application to data is specified via a tree-like yml file (called ``cfood.yml``, by -convention). The yml file specifies which Converters shall be used on which StructureElements, and -how to treat the generated *child* StructureElements. + CFood definition<cfood_definition> + Standard converters<standard_converters> + Further converters<further_converters> + Custom converters<custom_converters> + Transform functions<transform_functions> -The yaml definition may look like this: - -.. todo:: - - This is outdated, see ``cfood-schema.yml`` for the current specification of a ``cfood.yml``. - -.. code-block:: yaml - - <NodeName>: - type: <ConverterName> - match: ".*" - records: - Experiment1: - parents: - - Experiment - - Blablabla - date: $DATUM - (...) - Experiment2: - parents: - - Experiment - subtree: - (...) - -The **<NodeName>** is a description of what the current block represents (e.g. -``experiment-folder``) and is used as an identifier. - -**<type>** selects the converter that is going to be matched against the current structure -element. If the structure element matches (this is a combination of a typecheck and a detailed -match, see the :py:class:`~caoscrawler.converters.Converter` source documentation for details), the -converter will: - -- generate records (with :py:meth:`~caoscrawler.converters.Converter.create_records`) -- possibly process a subtree (with :py:meth:`caoscrawler.converters.Converter.create_children`) - -**match** *TODO* - -**records** is a dict of definitions that define the semantic structure -(see details below). - -**subtree** makes the yaml recursive: It contains a list of new Converter -definitions, which work on the StructureElements that are returned by the -current Converter. - -Transform Functions -+++++++++++++++++++ -Often the situation arises, that you cannot use a value as it is found. Maybe a value should be -increased by an offset or a string should be split into a list of pieces. In order to allow such -simple conversions, transform functions can be named in the converter definition that are then -applied to the respective variables when the converter is executed. - -.. code-block:: yaml - - <NodeName>: - type: <ConverterName> - match: ".*" - transform: - <TransformNodeName>: - in: $<in_var_name> - out: $<out_var_name> - functions: - - <func_name>: # name of the function to be applied - <func_arg1>: <func_arg1_value> # key value pairs that are passed as parameters - <func_arg2>: <func_arg2_value> - # ... - -An example that splits the variable ``a`` and puts the generated list in ``b`` is the following: - -.. code-block:: yaml - - Experiment: - type: Dict - match: ".*" - transform: - param_split: - in: $a - out: $b - functions: - - split: # split is a function that is defined by default - marker: "|" # its only parameter is the marker that is used to split the string - records: - Report: - tags: $b - -This splits the string in '$a' and stores the resulting list in '$b'. This is here used to add a -list valued property to the Report Record. - - -There are a number of transform functions that are defined by default (see -``src/caoscrawler/default_transformers.yml``). You can define custom transform functions by adding -them to the cfood definition (see :doc:`CFood Documentation<../cfood>`). 
- - -Standard Converters -+++++++++++++++++++ - -These are the standard converters that exist in a default installation. For writing and applying -*custom converters*, see :ref:`below <Custom Converters>`. - -Directory Converter -=================== -The Directory Converter creates StructureElements for each File and Directory -inside the current Directory. You can match a regular expression against the -directory name using the 'match' key. - -Simple File Converter -===================== -The Simple File Converter does not create any children and is usually used if -a file shall be used as it is and be inserted and referenced by other entities. - -Markdown File Converter -======================= -Reads a YAML header from Markdown files (if such a header exists) and creates -children elements according to the structure of the header. - -DictElement Converter -===================== - -DictElement → StructureElement - -Creates a child StructureElement for each key in the dictionary. - -Typical Subtree converters --------------------------- -The following StructureElement types are typically created by the DictElement converter: - -- BooleanElement -- FloatElement -- TextElement -- IntegerElement -- ListElement -- DictElement - -Note that you may use ``TextElement`` for anything that exists in a text format that can be -interpreted by the server, such as date and datetime strings in ISO-8601 format. - -Scalar Value Converters -======================= -`BooleanElementConverter`, `FloatElementConverter`, `TextElementConverter`, and -`IntegerElementConverter` behave very similarly. - -These converters expect `match_name` and `match_value` in their definition -which allow to match the key and the value, respectively. - -Note that there are defaults for accepting other types. For example, -FloatElementConverter also accepts IntegerElements. The default -behavior can be adjusted with the fields `accept_text`, `accept_int`, -`accept_float`, and `accept_bool`. - -The following denotes what kind of StructureElements are accepted by default -(they are defined in `src/caoscrawler/converters.py`): - -- BooleanElementConverter: bool, int -- FloatElementConverter: int, float -- TextElementConverter: text, bool, int, float -- IntegerElementConverter: int -- ListElementConverter: list -- DictElementConverter: dict - -YAMLFileConverter -================= - -A specialized Dict Converter for yaml files: Yaml files are opened and the contents are -converted into dictionaries that can be further converted using the typical subtree converters -of dict converter. - -**WARNING**: Currently unfinished implementation. - -JSONFileConverter -================= - - - - -TableConverter -============== - -Table → DictElement - -A generic converter (abstract) for files containing tables. -Currently, there are two specialized implementations for XLSX files and CSV files. - -All table converters generate a subtree of dicts, which in turn can be converted with DictElementConverters: -For each row in the table the TableConverter generates a DictElement (structure element). The key of the -element is the row number. The value of the element is a dict containing the mapping of -column names to values of the respective cell. - -Example: - -.. code-block:: yaml - - subtree: - TABLE: # Any name for the table as a whole - type: CSVTableConverter - match: ^test_table.csv$ - records: - (...) 
# Records edited for the whole table file - subtree: - ROW: # Any name for a data row in the table - type: DictElement - match_name: .* - match_value: .* - records: - (...) # Records edited for each row - subtree: - COLUMN: # Any name for a specific type of column in the table - type: FloatElement - match_name: measurement # Name of the column in the table file - match_value: (?P<column_value).*) - records: - (...) # Records edited for each cell - - -XLSXTableConverter -================== - -XLSX File → DictElement - -CSVTableConverter -================= - -CSV File → DictElement - -PropertiesFromDictConverter -=========================== - -The :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` is -a specialization of the -:py:class:`~caoscrawler.converters.DictElementConverter` and offers -all its functionality. It is meant to operate on dictionaries (e.g., -from reading in a json or a table file), the keys of which correspond -closely to properties in a LinkAhead datamodel. This is especially -handy in cases where properties may be added to the data model and -data sources that are not yet known when writing the cfood definition. - -The converter definition of the -:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has an -additional required entry ``record_from_dict`` which specifies the -Record to which the properties extracted from the dict are attached -to. This Record is identified by its ``variable_name`` by which it can -be referred to further down the subtree. You can also use the name of -a Record that was specified earlier in the CFood definition in order -to extend it by the properties extracted from a dict. Let's have a -look at a simple example. A CFood definition - -.. code-block:: yaml - - PropertiesFromDictElement: - type: PropertiesFromDictElement - match: ".*" - record_from_dict: - variable_name: MyRec - parents: - - MyType1 - - MyType2 - -applied to a dictionary - -.. code-block:: json - - { - "name": "New name", - "a": 5, - "b": ["a", "b", "c"], - "author": { - "full_name": "Silvia Scientist" - } - } - -will create a Record ``New name`` with parents ``MyType1`` and -``MyType2``. It has a scalar property ``a`` with value 5, a list -property ``b`` with values "a", "b" and "c", and an ``author`` -property which references an ``author`` with a ``full_name`` property -with value "Silvia Scientist": - -.. image:: ../img/properties-from-dict-records-author.png - :height: 210 - -Note how the different dictionary keys are handled differently -depending on their types: scalar and list values are understood -automatically, and a dictionary-valued entry like ``author`` is -translated into a reference to an ``author`` Record automatically. - -You can further specify how references are treated with an optional -``references key`` in ``record_from_dict``. Let's assume that in the -above example, we have an ``author`` **Property** with datatype -``Person`` in our data model. We could add this information by -extending the above example definition by - - -.. code-block:: yaml - - PropertiesFromDictElement: - type: PropertiesFromDictElement - match: ".*" - record_from_dict: - variable_name: MyRec - parents: - - MyType1 - - MyType2 - references: - author: - parents: - - Person - -so that now, a ``Person`` record with a ``full_name`` property with -value "Silvia Scientist" is created as the value of the ``author`` -property: - -.. 
image:: ../img/properties-from-dict-records-person.png - :height: 200 - -For the time being, only the parents of the referenced record can be -set via this option. More complicated treatments can be implemented -via the ``referenced_record_callback`` (see below). - -Properties can be blacklisted with the ``properties_blacklist`` -keyword, i.e., all keys listed under ``properties_blacklist`` will be -excluded from automated treatment. Since the -:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has -all the functionality of the -:py:class:`~caoscrawler.converters.DictElementConverter`, individual -properties can still be used in a subtree. Together with -``properties_blacklist`` this can be used to add custom treatment to -specific properties by blacklisting them in ``record_from_dict`` and -then treating them in the subtree the same as you would do it in the -standard -:py:class:`~caoscrawler.converters.DictElementConverter`. Note that -the blacklisted keys are excluded on **all** levels of the dictionary, -i.e., also when they occur in a referenced entity. - -For further customization, the -:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` can be -used as a basis for :ref:`custom converters<Custom Converters>` which -can make use of its ``referenced_record_callback`` argument. The -``referenced_record_callback`` can be a callable object which takes -exactly a Record as an argument and needs to return that Record after -doing whatever custom treatment is needed. Additionally, it is given -the ``RecordStore`` and the ``ValueStore`` in order to be able to -access the records and values that have already been defined from -within ``referenced_record_callback``. Such a function might look the -following: - -.. code-block:: python - - def my_callback(rec: db.Record, records: RecordStore, values: GeneralStore): - # do something with rec, possibly using other records or values from the stores... - rec.description = "This was updated in a callback" - return rec - -It is applied to all Records that are created from the dictionary and -it can be used to, e.g., transform values of some properties, or add -special treatment to all Records of a specific -type. ``referenced_record_callback`` is applied **after** the -properties from the dictionary have been applied as explained above. - -XML Converters -============== - -There are the following converters for XML content: - - -XMLFileConverter ----------------- - -This is a converter that loads an XML file and creates an XMLElement containing the -root element of the XML tree. It can be matched in the subtree using the XMLTagConverter. - -XMLTagConverter ---------------- - -The XMLTagConverter is a generic converter for XMLElements with the following main features: - -- It allows to match a combination of tag name, attribute names and text contents using the keys: - - - ``match_tag``: regexp, default empty string - - ``match_attrib``: dictionary of key-regexps and value-regexp - pairs. Each key matches an attribute name and the corresponding - value matches its attribute value. - - ``match_text``: regexp, default empty string -- It allows to traverse the tree using XPath (using Python lxml's xpath functions): - - - The key ``xpath`` is used to set the xpath expression and has a - default of ``child::*``. Its default would generate just the list of - sub nodes of the current node. The result of the xpath expression - is used to generate structure elements as children. 
It furthermore - uses the keys ``tags_as_children``, ``attribs_as_children`` and - ``text_as_children`` to decide which information from the found - nodes will be used as children: - - ``tags_as_children``: (default ``true``) For each xml tag element - found by the xpath expression, generate one XMLTag structure - element. Its name is the full path to the tag using the function - ``getelementpath`` from ``lxml``. - - ``attribs_as_children``: (default ``false``) For each xml tag element - found by the xpath expression, generate one XMLAttributeNode - structure element for each of its attributes. The name of the - respective attribute node has the form: ``<full path of the tag> @ - <name of the attribute>`` **Please note:** Currently, there is no - converter implemented that can match XMLAttributeNodes. - - ``text_as_children``: (default ``false``) For each xml tag element - found by the xpath expression, generate one XMLTextNode structure - element containing the text content of the tag element. Note that - in case of multiple text elements, only the first one is - added. The name of the respective attribute node has the form: - ``<full path of the tag> /text()`` to the tag using the function - ``getelementpath`` from ``lxml``. **Please note:** Currently, there is - no converter implemented that can match XMLAttributeNodes. - -Namespaces -********** - -The default is to take the namespace map from the current node and use -it in xpath queries. Because default namespaces cannot be handled by -xpath, it is possible to remap the default namespace using the key -``default_namespace``. The key ``nsmap`` can be used to define -additional nsmap entries. - -XMLTextNodeConverter --------------------- - -In the future, this converter can be used to match XMLTextNodes that -are generated by the XMLTagConverter. - - -Further converters -++++++++++++++++++ - -More converters, together with cfood definitions and examples can be found in -the `LinkAhead Crawler Extensions Subgroup -<https://gitlab.com/linkahead/crawler-extensions>`_ on gitlab. In the following, -we list converters that are shipped with the crawler library itself but are not -part of the set of standard converters and may require this library to be -installed with additional optional dependencies. - -HDF5 Converters -=============== - -For treating `HDF5 Files -<https://docs.hdfgroup.org/hdf5/develop/_s_p_e_c.html>`_, there are in total -four individual converters corresponding to the internal structure of HDF5 -files: the :ref:`H5FileConverter` which opens the file itself and creates -further structure elements from HDF5 groups, datasets, and included -multi-dimensional arrays that are in turn treated by the -:ref:`H5GroupConverter`, the :ref:`H5DatasetConverter`, and the -:ref:`H5NdarrayConverter`, respectively. You need to install the LinkAhead -crawler with its optional ``h5-crawler`` dependency for using these converters. - -The basic idea when crawling HDF5 files is to treat them very similar to -:ref:`dictionaries <DictElement Converter>` in which the attributes on root, -group, or dataset level are essentially treated like ``BooleanElement``, -``TextElement``, ``FloatElement``, and ``IntegerElement`` in a dictionary: They -are appended as children and can be accessed via the ``subtree``. The file -itself and the groups within may contain further groups and datasets, which can -have their own attributes, subgroups, and datasets, very much like -``DictElements`` within a dictionary. 
The main difference to any other -dictionary type is the presence of multi-dimensional arrays within HDF5 -datasets. Since LinkAhead doesn't have any datatype corresponding to these, and -since it isn't desirable to store these arrays directly within LinkAhead for -reasons of performance and of searchability, we wrap them within a specific -Record as explained :ref:`below <H5NdarrayConverter>`, together with more -metadata and their internal path within the HDF5 file. Users can thus query for -datasets and their arrays according to their metadata within LinkAhead and then -use the internal path information to access the dataset within the file -directly. The type of this record and the property for storing the internal path -need to be reflected in the datamodel. Using the default names, you would need a -datamodel like - -.. code-block:: yaml - - H5Ndarray: - obligatory_properties: - internal_hdf5-path: - datatype: TEXT - -although the names of both property and record type can be configured within the -cfood definition. - -A simple example of a cfood definition for HDF5 files can be found in the `unit -tests -<https://gitlab.com/linkahead/linkahead-crawler/-/blob/main/unittests/h5_cfood.yml?ref_type=heads>`_ -and shows how the individual converters are used in order to crawl a `simple -example file -<https://gitlab.com/linkahead/linkahead-crawler/-/blob/main/unittests/hdf5_dummy_file.hdf5?ref_type=heads>`_ -containing groups, subgroups, and datasets, together with their respective -attributes. - -H5FileConverter ---------------- - -This is an extension of the -:py:class:`~caoscrawler.converters.SimpleFileConverter` class. It opens the HDF5 -file and creates children for any contained group or dataset. Additionally, the -root-level attributes of the HDF5 file are accessible as children. - -H5GroupConverter ----------------- - -This is an extension of the -:py:class:`~caoscrawler.converters.DictElementConverter` class. Children are -created for all subgroups and datasets in this HDF5 group. Additionally, the -group-level attributes are accessible as children. - -H5DatasetConverter ------------------- - -This is an extension of the -:py:class:`~caoscrawler.converters.DictElementConverter` class. Most -importantly, it stores the array data in HDF5 dataset into -:py:class:`~caoscrawler.hdf5_converter.H5NdarrayElement` which is added to its -children, as well as the dataset attributes. - -H5NdarrayConverter ------------------- - -This converter creates a wrapper record for the contained dataset. The name of -this record needs to be specified in the cfood definition of this converter via -the ``recordname`` option. The RecordType of this record can be configured with -the ``array_recordtype_name`` option and defaults to ``H5Ndarray``. Via the -given ``recordname``, this record can be used within the cfood. Most -importantly, this record stores the internal path of this array within the HDF5 -file in a text property, the name of which can be configured with the -``internal_path_property_name`` option which defaults to ``internal_hdf5_path``. - -Custom Converters -+++++++++++++++++ - -As mentioned before it is possible to create custom converters. -These custom converters can be used to integrate arbitrary data extraction and ETL capabilities -into the LinkAhead crawler and make these extensions available to any yaml specification. 
- -Tell the crawler about a custom converter -========================================= - -To use a custom crawler, it must be defined in the ``Converters`` section of the CFood yaml file. -The basic syntax for adding a custom converter to a definition file is: - -.. code-block:: yaml - - Converters: - <NameOfTheConverterInYamlFile>: - package: <python>.<module>.<name> - converter: <PythonClassName> - -The Converters section can be either put into the first or the second -document of the cfood yaml file. It can be also part of a -single-document yaml cfood file. Please refer to :doc:`the cfood -documentation<../cfood>` for more details. - -Details: - -- **<NameOfTheConverterInYamlFile>**: This is the name of the converter as it is going to be used in the present yaml file. -- **<python>.<module>.<name>**: The name of the module where the converter class resides. -- **<PythonClassName>**: Within this specified module there must be a class inheriting from base class :py:class:`caoscrawler.converters.Converter`. - -Implementing a custom converter -=============================== - -Converters inherit from the :py:class:`~caoscrawler.converters.Converter` class. - -The following methods are abstract and need to be overwritten by your custom converter to make it work: - -:py:meth:`~caoscrawler.converters.Converter.create_children`: - Return a list of child StructureElement objects. - -- :py:meth:`~caoscrawler.converters.Converter.match` -- :py:meth:`~caoscrawler.converters.Converter.typecheck` - - -Example -======= - -In the following, we will explain the process of adding a custom converter to a yaml file using -a SourceResolver that is able to attach a source element to another entity. - -**Note**: This example might become a standard crawler soon, as part of the scifolder specification. See https://doi.org/10.3390/data5020043 for details. In this documentation example we will, therefore, add it to a package called "scifolder". - -First we will create our package and module structure, which might be: - -.. code-block:: - - scifolder_package/ - README.md - setup.cfg - setup.py - Makefile - tox.ini - src/ - scifolder/ - __init__.py - converters/ - __init__.py - sources.py # <- the actual file containing - # the converter class - doc/ - unittests/ - -Now we need to create a class called "SourceResolver" in the file "sources.py". In this - more advanced - example, we will not inherit our converter directly from :py:class:`~caoscrawler.converters.Converter`, but use :py:class:`~caoscrawler.converters.TextElementConverter`. The latter already implements :py:meth:`~caoscrawler.converters.Converter.match` and :py:meth:`~caoscrawler.converters.Converter.typecheck`, so only an implementation for :py:meth:`~caoscrawler.converters.Converter.create_children` has to be provided by us. -Furthermore we will customize the method :py:meth:`~caoscrawler.converters.Converter.create_records` that allows us to specify a more complex record generation procedure than provided in the standard implementation. One specific limitation of the standard implementation is, that only a fixed -number of records can be generated by the yaml definition. So for any applications - like here - that require an arbitrary number of records to be created, a customized implementation of :py:meth:`~caoscrawler.converters.Converter.create_records` is recommended. 
-In this context it is recommended to make use of the function :func:`caoscrawler.converters.create_records` that implements creation of record objects from python dictionaries of the same structure -that would be given using a yaml definition (see next section below). - -.. code-block:: python - - import re - from caoscrawler.stores import GeneralStore, RecordStore - from caoscrawler.converters import TextElementConverter, create_records - from caoscrawler.structure_elements import StructureElement, TextElement - - - class SourceResolver(TextElementConverter): - """ - This resolver uses a source list element (e.g. from the markdown readme file) - to link sources correctly. - """ - - def __init__(self, definition: dict, name: str, - converter_registry: dict): - """ - Initialize a new directory converter. - """ - super().__init__(definition, name, converter_registry) - - def create_children(self, generalStore: GeneralStore, - element: StructureElement): - - # The source resolver does not create children: - - return [] - - def create_records(self, values: GeneralStore, - records: RecordStore, - element: StructureElement, - file_path_prefix): - if not isinstance(element, TextElement): - raise RuntimeError() - - # This function must return a list containing tuples, each one for a modified - # property: (name_of_entity, name_of_property) - keys_modified = [] - - # This is the name of the entity where the source is going to be attached: - attach_to_scientific_activity = self.definition["scientific_activity"] - rec = records[attach_to_scientific_activity] - - # The "source" is a path to a source project, so it should have the form: - # /<Category>/<project>/<scientific_activity>/ - # obtain these information from the structure element: - val = element.value - regexp = (r'/(?P<category>(SimulationData)|(ExperimentalData)|(DataAnalysis))' - '/(?P<project_date>.*?)_(?P<project_identifier>.*)' - '/(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?/') - - res = re.match(regexp, val) - if res is None: - raise RuntimeError("Source cannot be parsed correctly.") - - # Mapping of categories on the file system to corresponding record types in CaosDB: - cat_map = { - "SimulationData": "Simulation", - "ExperimentalData": "Experiment", - "DataAnalysis": "DataAnalysis"} - linkrt = cat_map[res.group("category")] - - keys_modified.extend(create_records(values, records, { - "Project": { - "date": res.group("project_date"), - "identifier": res.group("project_identifier"), - }, - linkrt: { - "date": res.group("date"), - "identifier": res.group("identifier"), - "project": "$Project" - }, - attach_to_scientific_activity: { - "sources": "+$" + linkrt - }}, file_path_prefix)) - - # Process the records section of the yaml definition: - keys_modified.extend( - super().create_records(values, records, element, file_path_prefix)) - - # The create_records function must return the modified keys to make it compatible - # to the crawler functions: - return keys_modified - - -If the recommended (python) package structure is used, the package containing the converter -definition can just be installed using `pip install .` or `pip install -e .` from the -`scifolder_package` directory. - -The following yaml block will register the converter in a yaml file: - -.. 
code-block:: yaml - - Converters: - SourceResolver: - package: scifolder.converters.sources - converter: SourceResolver - - -Using the `create_records` API function -======================================= - -The function :func:`caoscrawler.converters.create_records` was already mentioned above and it is -the recommended way to create new records from custom converters. Let's have a look at the -function signature: - -.. code-block:: python - - def create_records(values: GeneralStore, # <- pass the current variables store here - records: RecordStore, # <- pass the current store of CaosDB records here - def_records: dict): # <- This is the actual definition of new records! - - -`def_records` is the actual definition of new records according to the yaml cfood specification -(work in progress, in the docs). Essentially you can do everything here, that you could do -in the yaml document as well, but using python source code. - -Let's have a look at a few examples: - -.. code-block:: yaml - - DirConverter: - type: Directory - match: (?P<dir_name>.*) - records: - Experiment: - identifier: $dir_name - -This block will just create a new record with parent `Experiment` and one property -`identifier` with a value derived from the matching regular expression. - -Let's formulate that using `create_records`: - -.. code-block:: python - - dir_name = "directory name" - - record_def = { - "Experiment": { - "identifier": dir_name - } - } - - keys_modified = create_records(values, records, - record_def) - -The `dir_name` is set explicitely here, everything else is identical to the yaml statements. - - -The role of `keys_modified` -=========================== - -You probably have noticed already, that :func:`caoscrawler.converters.create_records` returns -`keys_modified` which is a list of tuples. Each element of `keys_modified` has two elements: - -- Element 0 is the name of the record that is modified (as used in the record store `records`). -- Element 1 is the name of the property that is modified. - -It is important, that the correct list of modified keys is returned by -:py:meth:`~caoscrawler.converters.Converter.create_records` to make the crawler process work. - -So, a sketch of a typical implementation within a custom converter could look like this: - - -.. code-block:: python - - def create_records(self, values: GeneralStore, - records: RecordStore, - element: StructureElement, - file_path_prefix: str): - - # Modify some records: - record_def = { - # ... - } - - keys_modified = create_records(values, records, - record_def) - - # You can of course do it multiple times: - keys_modified.extend(create_records(values, records, - record_def)) - - # You can also process the records section of the yaml definition: - keys_modified.extend( - super().create_records(values, records, element, file_path_prefix)) - # This essentially allows users of your converter to customize the creation of records - # by providing a custom "records" section additionally to the modifications provided - # in this implementation of the Converter. - - # Important: Return the list of modified keys! - return keys_modified - - -More complex example -==================== - -Let's have a look at a more complex examples, defining multiple records: - -.. 
code-block:: yaml - - DirConverter: - type: Directory - match: (?P<dir_name>.*) - records: - Project: - identifier: project_name - Experiment: - identifier: $dir_name - Project: $Project - ProjectGroup: - projects: +$Project - - -This block will create two new Records: - -- A project with a constant identifier -- An experiment with an identifier, derived from a regular expression and a reference to the new project. - -Furthermore a Record `ProjectGroup` will be edited (its initial definition is not given in the -yaml block): The project that was just created will be added as a list element to the property -`projects`. - -Let's formulate that using `create_records` (again, `dir_name` is constant here): - -.. code-block:: python - - dir_name = "directory name" - - record_def = { - "Project": { - "identifier": "project_name", - } - "Experiment": { - "identifier": dir_name, - "Project": "$Project", - } - "ProjectGroup": { - "projects": "+$Project", - } - - } - - keys_modified = create_records(values, records, - record_def) - -Debugging -========= - -You can add the key `debug_match` to the definition of a Converter in order to create debugging -output for the match step. The following snippet illustrates this: - -.. code-block:: yaml - - DirConverter: - type: Directory - match: (?P<dir_name>.*) - debug_match: True - records: - Project: - identifier: project_name - - -Whenever this Converter tries to match a StructureElement, it logs what was tried to macht against -what and what the result was. diff --git a/src/doc/converters/standard_converters.rst b/src/doc/converters/standard_converters.rst new file mode 100644 index 0000000000000000000000000000000000000000..7eb8de681eb4c026e3175693ada828f3ca6ce96f --- /dev/null +++ b/src/doc/converters/standard_converters.rst @@ -0,0 +1,329 @@ +Standard Converters ++++++++++++++++++++ + +These are the standard converters that exist in a default installation. For writing and applying +*custom converters*, see :ref:`below <Custom Converters>`. + +Directory Converter +=================== +The Directory Converter creates StructureElements for each File and Directory +inside the current Directory. You can match a regular expression against the +directory name using the 'match' key. + +Simple File Converter +===================== +The Simple File Converter does not create any children and is usually used if +a file shall be used as it is and be inserted and referenced by other entities. + +Markdown File Converter +======================= +Reads a YAML header from Markdown files (if such a header exists) and creates +children elements according to the structure of the header. + +DictElement Converter +===================== + +DictElement → StructureElement + +Creates a child StructureElement for each key in the dictionary. + +Typical Subtree converters +-------------------------- +The following StructureElement types are typically created by the DictElement converter: + +- BooleanElement +- FloatElement +- TextElement +- IntegerElement +- ListElement +- DictElement + +Note that you may use ``TextElement`` for anything that exists in a text format that can be +interpreted by the server, such as date and datetime strings in ISO-8601 format. + +Scalar Value Converters +======================= +`BooleanElementConverter`, `FloatElementConverter`, `TextElementConverter`, and +`IntegerElementConverter` behave very similarly. + +These converters expect `match_name` and `match_value` in their definition +which allow to match the key and the value, respectively. 
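+
+For illustration, a minimal sketch of how such a converter could be used inside a ``subtree`` is
+given below (``voltage``, ``VoltageEntry`` and ``Measurement`` are purely illustrative names):
+
+.. code-block:: yaml
+
+  VoltageEntry:
+    type: FloatElement
+    match_name: ^voltage$               # key of the dictionary entry or column
+    match_value: (?P<voltage_value>.*)  # capture the value for later use
+    records:
+      Measurement:
+        voltage: $voltage_value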
+
+Note that there are defaults for accepting other types. For example,
+FloatElementConverter also accepts IntegerElements. The default
+behavior can be adjusted with the fields `accept_text`, `accept_int`,
+`accept_float`, and `accept_bool`.
+
+The following denotes what kind of StructureElements are accepted by default
+(they are defined in `src/caoscrawler/converters.py`):
+
+- BooleanElementConverter: bool, int
+- FloatElementConverter: int, float
+- TextElementConverter: text, bool, int, float
+- IntegerElementConverter: int
+- ListElementConverter: list
+- DictElementConverter: dict
+
+YAMLFileConverter
+=================
+
+A specialized DictElement converter for yaml files: Yaml files are opened and their contents are
+converted into dictionaries that can be further converted using the typical subtree converters
+of the DictElement converter.
+
+**WARNING**: Currently unfinished implementation.
+
+JSONFileConverter
+=================
+
+JSON File → DictElement
+
+TableConverter
+==============
+
+Table → DictElement
+
+A generic converter (abstract) for files containing tables.
+Currently, there are two specialized implementations for XLSX files and CSV files.
+
+All table converters generate a subtree of dicts, which in turn can be converted with DictElementConverters:
+For each row in the table, the TableConverter generates a DictElement (structure element). The key of the
+element is the row number. The value of the element is a dict containing the mapping of
+column names to values of the respective cell.
+
+Example:
+
+.. code-block:: yaml
+
+  subtree:
+    TABLE:  # Any name for the table as a whole
+      type: CSVTableConverter
+      match: ^test_table.csv$
+      records:
+        (...)  # Records edited for the whole table file
+      subtree:
+        ROW:  # Any name for a data row in the table
+          type: DictElement
+          match_name: .*
+          match_value: .*
+          records:
+            (...)  # Records edited for each row
+          subtree:
+            COLUMN:  # Any name for a specific type of column in the table
+              type: FloatElement
+              match_name: measurement  # Name of the column in the table file
+              match_value: (?P<column_value>.*)
+              records:
+                (...)  # Records edited for each cell
+
+
+XLSXTableConverter
+==================
+
+XLSX File → DictElement
+
+CSVTableConverter
+=================
+
+CSV File → DictElement
+
+PropertiesFromDictConverter
+===========================
+
+The :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` is
+a specialization of the
+:py:class:`~caoscrawler.converters.DictElementConverter` and offers
+all its functionality. It is meant to operate on dictionaries (e.g.,
+from reading in a json or a table file), the keys of which correspond
+closely to properties in a LinkAhead datamodel. This is especially
+handy in cases where properties may be added to the data model and
+data sources that are not yet known when writing the cfood definition.
+
+The converter definition of the
+:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has an
+additional required entry ``record_from_dict`` which specifies the
+Record to which the properties extracted from the dict are attached.
+This Record is identified by its ``variable_name``, by which it can
+be referred to further down the subtree. You can also use the name of
+a Record that was specified earlier in the CFood definition in order
+to extend it by the properties extracted from a dict. Let's have a
+look at a simple example. A CFood definition
+
+.. 
+
+PropertiesFromDictConverter
+===========================
+
+The :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` is
+a specialization of the
+:py:class:`~caoscrawler.converters.DictElementConverter` and offers
+all its functionality. It is meant to operate on dictionaries (e.g.,
+from reading in a json or a table file), the keys of which correspond
+closely to properties in a LinkAhead datamodel. This is especially
+handy in cases where the data model or the data sources may contain
+properties that are not yet known when writing the cfood definition.
+
+The converter definition of the
+:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has an
+additional required entry ``record_from_dict`` which specifies the
+Record to which the properties extracted from the dict are attached.
+This Record is identified by its ``variable_name``, by which it can
+be referred to further down the subtree. You can also use the name of
+a Record that was specified earlier in the CFood definition in order
+to extend it by the properties extracted from a dict. Let's have a
+look at a simple example. A CFood definition
+
+.. code-block:: yaml
+
+   PropertiesFromDictElement:
+      type: PropertiesFromDictElement
+      match: ".*"
+      record_from_dict:
+         variable_name: MyRec
+         parents:
+            - MyType1
+            - MyType2
+
+applied to a dictionary
+
+.. code-block:: json
+
+   {
+     "name": "New name",
+     "a": 5,
+     "b": ["a", "b", "c"],
+     "author": {
+       "full_name": "Silvia Scientist"
+     }
+   }
+
+will create a Record ``New name`` with parents ``MyType1`` and
+``MyType2``. It has a scalar property ``a`` with value 5, a list
+property ``b`` with values "a", "b" and "c", and an ``author``
+property which references an ``author`` with a ``full_name`` property
+with value "Silvia Scientist":
+
+.. image:: ../img/properties-from-dict-records-author.png
+   :height: 210
+
+Note how the different dictionary keys are handled differently
+depending on their types: scalar and list values are understood
+automatically, and a dictionary-valued entry like ``author`` is
+translated into a reference to an ``author`` Record automatically.
+
+You can further specify how references are treated with an optional
+``references`` key in ``record_from_dict``. Let's assume that in the
+above example, we have an ``author`` **Property** with datatype
+``Person`` in our data model. We could add this information by
+extending the above example definition by
+
+.. code-block:: yaml
+
+   PropertiesFromDictElement:
+      type: PropertiesFromDictElement
+      match: ".*"
+      record_from_dict:
+         variable_name: MyRec
+         parents:
+            - MyType1
+            - MyType2
+         references:
+            author:
+               parents:
+                  - Person
+
+so that now, a ``Person`` record with a ``full_name`` property with
+value "Silvia Scientist" is created as the value of the ``author``
+property:
+
+.. image:: ../img/properties-from-dict-records-person.png
+   :height: 200
+
+For the time being, only the parents of the referenced record can be
+set via this option. More complicated treatments can be implemented
+via the ``referenced_record_callback`` (see below).
+
+Properties can be blacklisted with the ``properties_blacklist``
+keyword, i.e., all keys listed under ``properties_blacklist`` will be
+excluded from automated treatment. Since the
+:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has
+all the functionality of the
+:py:class:`~caoscrawler.converters.DictElementConverter`, individual
+properties can still be used in a subtree. Together with
+``properties_blacklist`` this can be used to add custom treatment to
+specific properties by blacklisting them in ``record_from_dict`` and
+then treating them in the subtree the same way you would in the
+standard
+:py:class:`~caoscrawler.converters.DictElementConverter`. Note that
+the blacklisted keys are excluded on **all** levels of the dictionary,
+i.e., also when they occur in a referenced entity.
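+
+A sketch of this blacklist-plus-subtree pattern, assuming a dictionary key ``comment`` that should
+be handled manually (the key ``comment`` and the property name ``remark`` are made up; ``MyRec`` and
+``MyType1`` are reused from the example above):
+
+.. code-block:: yaml
+
+   PropertiesFromDictElement:
+      type: PropertiesFromDictElement
+      match: ".*"
+      record_from_dict:
+         variable_name: MyRec
+         parents:
+            - MyType1
+         properties_blacklist:
+            - comment                      # not treated automatically ...
+      subtree:
+         CommentElement:                   # ... but explicitly here
+            type: TextElement
+            match_name: ^comment$
+            match_value: (?P<comment_text>.*)
+            records:
+               MyRec:
+                  remark: $comment_text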
+
+For further customization, the
+:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` can be
+used as a basis for :ref:`custom converters<Custom Converters>` which
+can make use of its ``referenced_record_callback`` argument. The
+``referenced_record_callback`` can be a callable object which takes a
+Record as its first argument and needs to return that Record after
+doing whatever custom treatment is needed. Additionally, it is given
+the ``RecordStore`` and the ``ValueStore`` in order to be able to
+access the records and values that have already been defined from
+within ``referenced_record_callback``. Such a function might look like
+the following:
+
+.. code-block:: python
+
+   def my_callback(rec: db.Record, records: RecordStore, values: GeneralStore):
+       # do something with rec, possibly using other records or values from the stores ...
+       rec.description = "This was updated in a callback"
+       return rec
+
+It is applied to all Records that are created from the dictionary and
+it can be used to, e.g., transform values of some properties, or add
+special treatment to all Records of a specific
+type. ``referenced_record_callback`` is applied **after** the
+properties from the dictionary have been applied as explained above.
+
+XML Converters
+==============
+
+There are the following converters for XML content:
+
+
+XMLFileConverter
+----------------
+
+This is a converter that loads an XML file and creates an XMLElement containing the
+root element of the XML tree. It can be matched in the subtree using the XMLTagConverter.
+
+XMLTagConverter
+---------------
+
+The XMLTagConverter is a generic converter for XMLElements with the following main features:
+
+- It allows matching a combination of tag name, attribute names and text contents using the keys:
+
+  - ``match_tag``: regexp, default empty string
+  - ``match_attrib``: dictionary of key-regexp and value-regexp
+    pairs. Each key matches an attribute name and the corresponding
+    value matches its attribute value.
+  - ``match_text``: regexp, default empty string
+
+- It allows traversing the tree using XPath (using Python lxml's xpath functions):
+
+  - The key ``xpath`` is used to set the xpath expression; its default of
+    ``child::*`` generates just the list of sub nodes of the current
+    node. The result of the xpath expression is used to generate
+    structure elements as children. The keys ``tags_as_children``,
+    ``attribs_as_children`` and ``text_as_children`` decide which
+    information from the found nodes will be used as children:
+
+    - ``tags_as_children``: (default ``true``) For each xml tag element
+      found by the xpath expression, generate one XMLTag structure
+      element. Its name is the full path to the tag, obtained using the
+      function ``getelementpath`` from ``lxml``.
+    - ``attribs_as_children``: (default ``false``) For each xml tag element
+      found by the xpath expression, generate one XMLAttributeNode
+      structure element for each of its attributes. The name of the
+      respective attribute node has the form
+      ``<full path of the tag> @ <name of the attribute>``.
+      **Please note:** Currently, there is no converter implemented that
+      can match XMLAttributeNodes.
+    - ``text_as_children``: (default ``false``) For each xml tag element
+      found by the xpath expression, generate one XMLTextNode structure
+      element containing the text content of the tag element. Note that
+      in case of multiple text elements, only the first one is added.
+      The name of the respective text node has the form
+      ``<full path of the tag> /text()``, again using ``getelementpath``
+      from ``lxml``. **Please note:** Currently, there is no converter
+      implemented that can match XMLTextNodes.
+
+Namespaces
+**********
+
+The default is to take the namespace map from the current node and use
+it in xpath queries. Because default namespaces cannot be handled by
+xpath, it is possible to remap the default namespace using the key
+``default_namespace``. The key ``nsmap`` can be used to define
+additional nsmap entries.
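+
+A cfood snippet combining these keys might look like the following sketch. All tag, attribute and
+namespace names are made up, and the converter is assumed to be registered under the name
+``XMLTag``:
+
+.. code-block:: yaml
+
+   MeasurementTag:
+      type: XMLTag                     # registered converter name assumed
+      match_tag: measurement           # regexp matched against the tag name
+      match_attrib:
+         unit: .*                      # require a "unit" attribute with any value
+      xpath: "child::*"                # default: look at all direct sub nodes
+      tags_as_children: true           # create one XMLTag child per found tag
+      text_as_children: false
+      nsmap:
+         ex: http://example.org/ns     # additional prefix usable in the xpath expression
+      subtree:
+         (...)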
+
+XMLTextNodeConverter
+--------------------
+
+In the future, this converter can be used to match XMLTextNodes that
+are generated by the XMLTagConverter.
diff --git a/src/doc/converters/transform_functions.rst b/src/doc/converters/transform_functions.rst
new file mode 100644
index 0000000000000000000000000000000000000000..22df35c8521ea0d70b2ebf7b7c8bc7c52e176bd3
--- /dev/null
+++ b/src/doc/converters/transform_functions.rst
@@ -0,0 +1,47 @@
+Transform Functions
++++++++++++++++++++
+Often the situation arises that you cannot use a value as it is found. Maybe a value should be
+increased by an offset or a string should be split into a list of pieces. In order to allow such
+simple conversions, transform functions can be named in the converter definition; they are then
+applied to the respective variables when the converter is executed.
+
+.. code-block:: yaml
+
+   <NodeName>:
+      type: <ConverterName>
+      match: ".*"
+      transform:
+         <TransformNodeName>:
+            in: $<in_var_name>
+            out: $<out_var_name>
+            functions:
+               - <func_name>:                        # name of the function to be applied
+                    <func_arg1>: <func_arg1_value>   # key-value pairs that are passed as parameters
+                    <func_arg2>: <func_arg2_value>
+                    # ...
+
+An example that splits the variable ``a`` and puts the generated list in ``b`` is the following:
+
+.. code-block:: yaml
+
+   Experiment:
+      type: Dict
+      match: ".*"
+      transform:
+         param_split:
+            in: $a
+            out: $b
+            functions:
+               - split:            # split is a function that is defined by default
+                    marker: "|"    # its only parameter is the marker that is used to split the string
+      records:
+         Report:
+            tags: $b
+
+This splits the string in ``$a`` and stores the resulting list in ``$b``. Here it is used to add a
+list-valued property to the Report Record.
+
+
+There are a number of transform functions that are defined by default (see
+``src/caoscrawler/default_transformers.yml``). You can define custom transform functions by adding
+them to the cfood definition (see :doc:`CFood Documentation<../cfood>`).
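+
+A sketch of how such a registration could look, assuming a ``Transformers`` section analogous to
+the ``Converters`` section used for custom converters (the key names and the module/function names
+below are assumptions; see the cfood documentation for the authoritative syntax):
+
+.. code-block:: yaml
+
+   Transformers:                       # section name assumed, analogous to "Converters"
+      capitalize:                      # name under which the function is used in "functions"
+         package: mypackage.transformers
+         function: capitalize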