Compare revisions: caosdb/src/caosdb-crawler

Changes are shown as if the source revision was being merged into the target revision.

Commits on Source (66)
Showing with 807 additions and 186 deletions
......@@ -9,13 +9,44 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added ###
### Changed ###
### Deprecated ###
### Removed ###
### Fixed ###
### Security ###
### Documentation ###
## [0.8.0] - 2024-08-23 ##
### Added ###
* Support for Python 3.12 and experimental support for 3.13
* `spss_to_datamodel` script.
* `SPSSConverter` class
* CFood macros now accept complex objects as values, not just strings.
* More options for the `CSVTableConverter`
* New converters:
* `DatetimeElementConverter`
* `SPSSConverter`
* New scripts:
* `spss_to_datamodel`
* `csv_to_datamodel`
* New transformer functions:
* `date_parse`
* `datetime_parse`
* New ``PropertiesFromDictConverter`` which can automatically
  create property values from dictionary keys.
### Changed ###
* CFood macros no longer render everything into strings.
* Better internal handling of identifiable/reference resolving and merging of entities. This also
  includes more understandable output for users.
* Better handling of missing imports, with nice messages for users.
* No longer use the configuration of advancedtools to set the "from" and "to" email addresses.
### Deprecated ###
### Removed ###
......@@ -24,11 +55,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Fixed ###
* [93](https://gitlab.com/linkahead/linkahead-crawler/-/issues/93) cfood.yaml does not allow umlaut in $expression
* [96](https://gitlab.com/linkahead/linkahead-crawler/-/issues/96) Do not fail silently on transaction errors
### Security ###
### Documentation ###
* General improvement of the documentation, in many small places.
* The API documentation should now also include documentation of the constructors.
## [0.7.1] - 2024-03-21 ##
### Fixed ###
......@@ -170,6 +205,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- ``add_prefix`` and ``remove_prefix`` arguments for the command line interface
and the ``crawler_main`` function for the adding/removal of path prefixes when
creating file entities.
- More strict checking of `identifiables.yaml`.
- Better error messages when server does not conform to expected data model.
### Changed ###
......@@ -218,7 +255,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Some StructureElements changed (see "How to upgrade" in the docs):
- Dict, DictElement and DictDictElement were merged into DictElement.
- DictTextElement and TextElement were merged into TextElement. The "match"
keyword is now invalid for TextElements.
keyword is now invalid for TextElements.
- JSONFileConverter creates another level of StructureElements (see "How to upgrade" in the docs)
- create_flat_list function now collects entities in a set and also adds the entities
contained in the given list directly
......
......@@ -17,6 +17,6 @@ authors:
given-names: Alexander
orcid: https://orcid.org/0000-0003-4124-9649
title: CaosDB - Crawler
version: 0.7.1
version: 0.8.0
doi: 10.3390/data9020024
date-released: 2023-03-21
\ No newline at end of file
date-released: 2024-08-23
\ No newline at end of file
......@@ -32,7 +32,7 @@ import sys
from argparse import RawTextHelpFormatter
from pathlib import Path
import caosdb as db
import linkahead as db
import pytest
import yaml
from caosadvancedtools.crawler import Crawler as OldCrawler
......@@ -42,8 +42,8 @@ from caoscrawler.debug_tree import DebugTree
from caoscrawler.identifiable import Identifiable
from caoscrawler.identifiable_adapters import CaosDBIdentifiableAdapter
from caoscrawler.scanner import scan_directory
from caosdb import EmptyUniqueQueryError
from caosdb.utils.register_tests import clear_database, set_test_key
from linkahead import EmptyUniqueQueryError
from linkahead.utils.register_tests import clear_database, set_test_key
set_test_key("10b128cf8a1372f30aa3697466bb55e76974e0c16a599bb44ace88f19c8f61e2")
......
......@@ -27,12 +27,12 @@ import os
import pytest
from subprocess import run
import caosdb as db
import linkahead as db
from caosadvancedtools.loadFiles import loadpath
from caosdb.cached import cache_clear
from linkahead.cached import cache_clear
from caosadvancedtools.models import parser as parser
from caoscrawler.crawl import crawler_main
from caosdb.utils.register_tests import clear_database, set_test_key
from linkahead.utils.register_tests import clear_database, set_test_key
set_test_key("10b128cf8a1372f30aa3697466bb55e76974e0c16a599bb44ace88f19c8f61e2")
......
[metadata]
name = caoscrawler
version = 0.7.2
version = 0.8.1
author = Alexander Schlemmer
author_email = alexander.schlemmer@ds.mpg.de
description = A new crawler for caosdb
......
cfood:
type: object
properties:
Converters:
description: Definition of custom converters
type: object
additionalProperties:
type: object
properties:
converter:
type: string
package:
type: string
required:
- converter
- package
macros:
description: Macro definitions
type: array
Transformers:
description: Variable transformer definition
type: object
additionalProperties:
type: object
properties:
function:
type: string
package:
type: string
required:
- package
- function
additionalProperties:
$ref:
"#/$defs/converter"
$defs:
parents:
description: Parents for this record are given here as a list of names.
type: array
items:
type: string
converter:
properties:
type:
......@@ -28,7 +63,9 @@ cfood:
- Definitions
- Dict
- Date
- Datetime
- JSONFile
- YAMLFile
- CSVTableConverter
- XLSXTableConverter
- SPSSFile
......@@ -36,6 +73,7 @@ cfood:
- H5Dataset
- H5Group
- H5Ndarray
- PropertiesFromDictElement
description: Type of this converter node.
match:
description: typically a regexp which is matched to a structure element name
......@@ -46,15 +84,46 @@ cfood:
match_value:
description: a regexp that is matched to the value of a key-value pair
type: string
records:
description: This field is used to define new records or to modify records which have been defined on a higher level.
record_from_dict:
description: Only relevant for PropertiesFromDictElement. Specify the root record which is generated from the contained dictionary.
type: object
required:
- variable_name
properties:
parents:
description: Parents for this record are given here as a list of names.
variable_name:
description: |
Name of the record by which it can be accessed in the
cfood definition. Can also be the name of an existing
record in which case that record will be updated by
the PropertiesFromDictConverter.
type: string
properties_blacklist:
description: List of keys to be ignored in the automatic treatment. They will be ignored on all levels of the dictionary.
type: array
items:
type: string
references:
description: List of keys that will be transformed into named reference properties.
type: object
additionalProperties:
type: object
properties:
parents:
$ref:
"#/$defs/parents"
name:
description: Name of this record. If none is given, variable_name is used.
type: string
parents:
$ref:
"#/$defs/parents"
records:
description: This field is used to define new records or to modify records which have been defined on a higher level.
type: object
properties:
parents:
$ref:
"#/$defs/parents"
additionalProperties:
oneOf:
- type: object
......@@ -76,3 +145,15 @@ cfood:
additionalProperties:
$ref:
"#/$defs/converter"
if:
properties:
type:
const:
"PropertiesFromDictElement"
then:
required:
- type
- record_from_dict
else:
required:
- type
......@@ -432,6 +432,7 @@ class Converter(object, metaclass=ABCMeta):
return
for transformer_key, transformer in self.definition["transform"].items():
in_value = replace_variables(transformer["in"], values)
out_value = in_value
for tr_func_el in transformer["functions"]:
if not isinstance(tr_func_el, dict):
......@@ -817,6 +818,180 @@ class DictElementConverter(Converter):
return match_name_and_value(self.definition, element.name, element.value)
class PropertiesFromDictConverter(DictElementConverter):
"""Extend the :py:class:`DictElementConverter` by a heuristic to set
property values from the dictionary keys.
"""
def __init__(self, definition: dict, name: str, converter_registry: dict,
referenced_record_callback: Optional[callable] = None):
super().__init__(definition, name, converter_registry)
self.referenced_record_callback = referenced_record_callback
def _recursively_create_records(self, subdict: dict, root_record: db.Record,
root_rec_name: str,
values: GeneralStore, records: RecordStore,
referenced_record_callback: callable,
keys_modified: list = []
):
"""Create a record form the given `subdict` and recursively create referenced records."""
        blacklisted_keys = self.definition["record_from_dict"].get("properties_blacklist", [])
        special_references = self.definition["record_from_dict"].get("references", [])
for key, value in subdict.items():
if key in blacklisted_keys:
# We ignore this in the automated property generation
continue
if isinstance(value, list):
if not any([isinstance(val, dict) for val in value]):
# no dict in list, i.e., no references, so this is simple
root_record.add_property(name=key, value=value)
else:
if not all([isinstance(val, dict) for val in value]):
# if this is not an error (most probably it is), this
# needs to be handled manually for now.
raise ValueError(
f"{key} in {subdict} contains a mixed list of references and scalars.")
ref_recs = []
for ii, ref_dict in enumerate(value):
ref_var_name = f"{root_rec_name}.{key}.{ii+1}"
ref_rec, keys_modified = self._create_ref_rec(
ref_var_name,
key,
ref_dict,
special_references,
records,
values,
keys_modified,
referenced_record_callback
)
ref_recs.append(ref_rec)
root_record.add_property(name=key, value=ref_recs)
elif isinstance(value, dict):
# Treat scalar reference
ref_var_name = f"{root_rec_name}.{key}"
ref_rec, keys_modified = self._create_ref_rec(
ref_var_name,
key,
value,
special_references,
records,
values,
keys_modified,
referenced_record_callback
)
root_record.add_property(key, ref_rec)
else:
# All that remains are scalar properties which may or
# may not be special attributes like name.
if key.lower() in SPECIAL_PROPERTIES:
setattr(root_record, key.lower(), value)
else:
root_record.add_property(name=key, value=value)
keys_modified.append((root_rec_name, key))
if referenced_record_callback:
root_record = referenced_record_callback(root_record, records, values)
return keys_modified
def _create_ref_rec(
self,
name: str,
key: str,
subdict: dict,
special_references: dict,
records: RecordStore,
values: GeneralStore,
keys_modified: list,
referenced_record_callback: callable
):
"""Create the referenced Record and forward the stores etc. to
``_recursively_create_records``.
        Parameters
        ----------
name : str
name of the referenced record to be created in RecordStore and Value Store.
key : str
name of the key this record's definition had in the original dict.
subdict : dict
subdict containing this record's definition from the original dict.
special_references : dict
special treatment of referenced records from the converter definition.
records : RecordStore
RecordStore for entering new Records
values : GeneralStore
ValueStore for entering new Records
keys_modified : list
List for keeping track of changes
referenced_record_callback : callable
Advanced treatment of referenced records as given in the
converter initialization.
"""
ref_rec = db.Record()
if key in special_references:
for par in special_references[key]["parents"]:
ref_rec.add_parent(par)
else:
ref_rec.add_parent(key)
records[name] = ref_rec
values[name] = ref_rec
keys_modified = self._recursively_create_records(
subdict=subdict,
root_record=ref_rec,
root_rec_name=name,
values=values,
records=records,
referenced_record_callback=referenced_record_callback,
keys_modified=keys_modified
)
return ref_rec, keys_modified
def create_records(self, values: GeneralStore, records: RecordStore,
element: StructureElement):
keys_modified = []
rfd = self.definition["record_from_dict"]
if rfd["variable_name"] not in records:
rec = db.Record()
if "name" in rfd:
rec.name = rfd["name"]
if "parents" in rfd:
for par in rfd["parents"]:
rec.add_parent(par)
else:
rec.add_parent(rfd["variable_name"])
records[rfd["variable_name"]] = rec
values[rfd["variable_name"]] = rec
else:
rec = records[rfd["variable_name"]]
keys_modified = self._recursively_create_records(
subdict=element.value,
root_record=rec,
root_rec_name=rfd["variable_name"],
values=values,
records=records,
referenced_record_callback=self.referenced_record_callback,
keys_modified=keys_modified,
)
keys_modified.extend(super().create_records(
values=values, records=records, element=element))
return keys_modified
class DictConverter(DictElementConverter):
def __init__(self, *args, **kwargs):
warnings.warn(DeprecationWarning(
......@@ -1240,11 +1415,12 @@ class DateElementConverter(TextElementConverter):
"""allows to convert different text formats of dates to Python date objects.
The text to be parsed must be contained in the "date" group. The format string can be supplied
under "dateformat" in the Converter definition. The library used is datetime so see its
under "date_format" in the Converter definition. The library used is datetime so see its
documentation for information on how to create the format string.
"""
# TODO make `date` parameter name configurable
def match(self, element: StructureElement):
matches = super().match(element)
if matches is not None and "date" in matches:
......@@ -1253,3 +1429,24 @@ class DateElementConverter(TextElementConverter):
self.definition["date_format"] if "date_format" in self.definition else "%Y-%m-%d"
).date()})
return matches
class DatetimeElementConverter(TextElementConverter):
"""Convert text so that it is formatted in a way that LinkAhead can understand it.
The text to be parsed must be in the ``val`` parameter. The format string can be supplied in the
``datetime_format`` node. This class uses the ``datetime`` module, so ``datetime_format`` must
    follow this specification:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
"""
# TODO make `val` parameter name configurable
def match(self, element: StructureElement):
matches = super().match(element)
if matches is not None and "val" in matches:
fmt_default = "%Y-%m-%dT%H:%M:%S"
fmt = self.definition.get("datetime_format", fmt_default)
dt_str = datetime.datetime.strptime(matches["val"], fmt).strftime(fmt_default)
matches.update({"val": dt_str})
return matches
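For orientation, here is a sketch of the normalization this converter performs, using only the standard ``datetime`` module; the raw value and format string below are hypothetical examples:

```python
import datetime

# Sketch of the normalization done in DatetimeElementConverter.match above.
fmt_default = "%Y-%m-%dT%H:%M:%S"
raw, fmt = "01.12.2024 13:15", "%d.%m.%Y %H:%M"  # made-up input and datetime_format
normalized = datetime.datetime.strptime(raw, fmt).strftime(fmt_default)
assert normalized == "2024-12-01T13:15:00"
```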
......@@ -55,6 +55,9 @@ from linkahead.apiutils import (compare_entities,
merge_entities)
from linkahead.cached import cache_clear, cached_get_entity_by
from linkahead.common.datatype import get_list_datatype, is_reference
from linkahead.exceptions import (
TransactionError,
)
from linkahead.utils.escape import escape_squoted_text
from .config import get_config_setting
......@@ -746,9 +749,31 @@ one with the entities that need to be updated and the other with entities to be
def inform_about_pending_changes(pending_changes, run_id, path, inserts=False):
    # Sending an email with a link to a form to authorize updates is only done if
    # crawler notifications are enabled:
if get_config_setting("send_crawler_notifications"):
filename = OldCrawler.save_form(
[el[3] for el in pending_changes], path, run_id)
OldCrawler.send_mail([el[3] for el in pending_changes], filename)
filename = OldCrawler.save_form([el[3] for el in pending_changes], path, run_id)
text = """Dear Curator,
there were changes that need your authorization. Please check the following
carefully and, if the changes are OK, click on the following link:
{url}/Shared/{filename}
{changes}
""".format(url=db.configuration.get_config()["Connection"]["url"],
filename=filename,
changes="\n".join([el[3] for el in pending_changes]))
try:
fro = get_config_setting("sendmail_from_address")
to = get_config_setting("sendmail_to_address")
except KeyError:
logger.error("Server Configuration is missing a setting for "
"sending mails. The administrator should check "
"'from_mail' and 'to_mail'.")
return
send_mail(
from_addr=fro,
to=to,
subject="Crawler Update",
body=text)
for i, el in enumerate(pending_changes):
......@@ -859,6 +884,7 @@ def _notify_about_inserts_and_updates(n_inserts, n_updates, logfile, run_id):
The email contains some basic information and a link to the log and the CrawlerRun Record.
"""
if not get_config_setting("send_crawler_notifications"):
logger.debug("Crawler email notifications are disabled.")
return
if n_inserts == 0 and n_updates == 0:
return
......@@ -869,8 +895,8 @@ the CaosDB Crawler successfully crawled the data and
"""
domain = get_config_setting("public_host_url")
if get_config_setting("create_crawler_status_records"):
domain = get_config_setting("public_host_url")
text += ("You can checkout the CrawlerRun Record for more information:\n"
f"{domain}/Entity/?P=0L10&query=find%20crawlerrun%20with%20run_id=%27{run_id}%27\n\n")
text += (f"You can download the logfile here:\n{domain}/Shared/" + logfile)
......@@ -1056,6 +1082,10 @@ def crawler_main(crawled_directory_path: str,
ident = CaosDBIdentifiableAdapter()
ident.load_from_yaml_definition(identifiables_definition_file)
crawler.identifiableAdapter = ident
else:
# TODO
# raise ValueError("An identifiable file is needed.")
pass
remove_prefix = _treat_deprecated_prefix(prefix, remove_prefix)
......@@ -1081,15 +1111,24 @@ def crawler_main(crawled_directory_path: str,
logger.error(err)
_update_status_record(crawler.run_id, 0, 0, status="FAILED")
return 1
except TransactionError as err:
logger.debug(traceback.format_exc())
logger.error(err)
logger.error("Transaction error details:")
for suberr in err.errors:
logger.error("---")
logger.error(suberr.msg)
logger.error(suberr.entity)
return 1
except Exception as err:
logger.debug(traceback.format_exc())
logger.debug(err)
logger.error(err)
if "SHARED_DIR" in os.environ:
# pylint: disable=E0601
domain = get_config_setting("public_host_url")
logger.error("Unexpected Error: Please tell your administrator about this and provide the"
f" following path.\n{domain}/Shared/" + debuglog_public)
logger.error("Unexpected Error: Please tell your administrator about this and provide "
f"the following path.\n{domain}/Shared/" + debuglog_public)
_update_status_record(crawler.run_id, 0, 0, status="FAILED")
return 1
......
......@@ -8,9 +8,15 @@ BooleanElement:
Date:
converter: DateElementConverter
package: caoscrawler.converters
Datetime:
converter: DatetimeElementConverter
package: caoscrawler.converters
Dict:
converter: DictElementConverter
package: caoscrawler.converters
PropertiesFromDictElement:
converter: PropertiesFromDictConverter
package: caoscrawler.converters
FloatElement:
converter: FloatElementConverter
package: caoscrawler.converters
......
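Each registry entry maps a cfood ``type`` name to a converter class and the package it lives in. A minimal sketch of how such an entry could be resolved (the actual loading code in caoscrawler may differ):

```python
import importlib

# Hypothetical resolution of a registry entry like the ones above.
entry = {"converter": "DatetimeElementConverter", "package": "caoscrawler.converters"}
module = importlib.import_module(entry["package"])
converter_class = getattr(module, entry["converter"])
```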
# Lookup table for matching functions and cfood yaml node names.
submatch:
package: caoscrawler.transformer_functions
......@@ -9,3 +9,9 @@ split:
replace:
package: caoscrawler.transformer_functions
function: replace
date_parse:
package: caoscrawler.transformer_functions
function: date_parse
datetime_parse:
package: caoscrawler.transformer_functions
function: datetime_parse
......@@ -27,15 +27,6 @@ class ForbiddenTransaction(Exception):
pass
class MissingReferencingEntityError(Exception):
"""Thrown if the identifiable requires that some entity references the given entity but there
is no such reference """
def __init__(self, *args, rts=None, **kwargs):
self.rts = rts
super().__init__(self, *args, **kwargs)
class ImpossibleMergeError(Exception):
"""Thrown if due to identifying information, two SyncNodes or two Properties of SyncNodes
should be merged, but there is conflicting information that prevents this.
......@@ -47,8 +38,29 @@ class ImpossibleMergeError(Exception):
super().__init__(self, *args, **kwargs)
class InvalidIdentifiableYAML(Exception):
"""Thrown if the identifiable definition is invalid."""
pass
class MissingIdentifyingProperty(Exception):
"""Thrown if a SyncNode does not have the properties required by the corresponding registered
identifiable
"""
pass
class MissingRecordType(Exception):
"""Thrown if an record type can not be found although it is expected that it exists on the
server.
"""
pass
class MissingReferencingEntityError(Exception):
"""Thrown if the identifiable requires that some entity references the given entity but there
is no such reference """
def __init__(self, *args, rts=None, **kwargs):
self.rts = rts
super().__init__(self, *args, **kwargs)
......@@ -36,7 +36,12 @@ import yaml
from linkahead.cached import cached_get_entity_by, cached_query
from linkahead.utils.escape import escape_squoted_text
from .exceptions import MissingIdentifyingProperty, MissingReferencingEntityError
from .exceptions import (
InvalidIdentifiableYAML,
MissingIdentifyingProperty,
MissingRecordType,
MissingReferencingEntityError,
)
from .identifiable import Identifiable
from .sync_node import SyncNode
from .utils import has_parent
......@@ -48,7 +53,10 @@ def get_children_of_rt(rtname):
"""Supply the name of a recordtype. This name and the name of all children RTs are returned in
a list"""
escaped = escape_squoted_text(rtname)
return [p.name for p in cached_query(f"FIND RECORDTYPE '{escaped}'")]
recordtypes = [p.name for p in cached_query(f"FIND RECORDTYPE '{escaped}'")]
if not recordtypes:
raise MissingRecordType(f"Record type could not be found on server: {rtname}")
return recordtypes
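Illustration of the new behavior (record type names are hypothetical, and a running server is assumed):

```python
# "Dataset" exists with child "ImageDataset"; both names are returned.
get_children_of_rt("Dataset")     # -> ["Dataset", "ImageDataset"]
# An unknown record type now raises instead of returning an empty list.
get_children_of_rt("NoSuchType")  # raises MissingRecordType
```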
def convert_value(value: Any) -> str:
......@@ -165,7 +173,10 @@ class IdentifiableAdapter(metaclass=ABCMeta):
"""
if node.registered_identifiable is None:
if raise_exception:
raise RuntimeError("no registered_identifiable")
parents = [p.name for p in node.parents]
parents_str = "\n".join(f"- {p}" for p in parents)
raise RuntimeError("No registered identifiable for node with these parents:\n"
+ parents_str)
else:
return False
for prop in node.registered_identifiable.properties:
......@@ -576,19 +587,32 @@ class CaosDBIdentifiableAdapter(IdentifiableAdapter):
"""Load identifiables defined in a yaml file"""
with open(path, "r", encoding="utf-8") as yaml_f:
identifiable_data = yaml.safe_load(yaml_f)
self.load_from_yaml_object(identifiable_data)
for key, value in identifiable_data.items():
rt = db.RecordType().add_parent(key)
for prop_name in value:
def load_from_yaml_object(self, identifiable_data):
"""Load identifiables defined in a yaml object.
"""
for rt_name, id_list in identifiable_data.items():
rt = db.RecordType().add_parent(rt_name)
if not isinstance(id_list, list):
raise InvalidIdentifiableYAML(
f"Identifiable contents must be lists, but this was not: {rt_name}")
for prop_name in id_list:
if isinstance(prop_name, str):
rt.add_property(name=prop_name)
elif isinstance(prop_name, dict):
for k, v in prop_name.items():
if k == "is_referenced_by" and not isinstance(v, list):
raise InvalidIdentifiableYAML(
f"'is_referenced_by' must be a list. Found in: {rt_name}")
rt.add_property(name=k, value=v)
else:
NotImplementedError("YAML is not structured correctly")
raise InvalidIdentifiableYAML(
"Identifiable properties must be str or dict, but this one was not:\n"
f" {rt_name}/{prop_name}")
self.register_identifiable(key, rt)
self.register_identifiable(rt_name, rt)
def register_identifiable(self, name: str, definition: db.RecordType):
self._registered_identifiables[name] = definition
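A minimal sketch of an identifiable definition that passes the validation above (record type and property names are hypothetical):

```python
from caoscrawler.identifiable_adapters import CaosDBIdentifiableAdapter

# Equivalent to an identifiables.yaml with one entry for "Experiment".
identifiable_data = {
    "Experiment": [
        "date",                             # identifying property, plain string
        {"is_referenced_by": ["Project"]},  # special key; its value must be a list
    ],
}
adapter = CaosDBIdentifiableAdapter()
adapter.load_from_yaml_object(identifiable_data)
```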
......
......@@ -25,12 +25,17 @@
# Function to expand a macro in yaml
# A. Schlemmer, 05/2022
import re
from dataclasses import dataclass
from typing import Any, Dict
from copy import deepcopy
from string import Template
_SAFE_SUBST_PAT = re.compile(r"^\$(?P<key>\w+)$")
_SAFE_SUBST_PAT_BRACES = re.compile(r"^\$\{(?P<key>\w+)}$")
@dataclass
class MacroDefinition:
"""
......@@ -53,6 +58,12 @@ def substitute(propvalue, values: dict):
Substitution of variables in strings using the variable substitution
library from python's standard library.
"""
    # Exact variable matches are replaced by the raw dict entry, preserving the value's type.
if match := (_SAFE_SUBST_PAT.fullmatch(propvalue)
or _SAFE_SUBST_PAT_BRACES.fullmatch(propvalue)):
key = match.group("key")
if key in values:
return values[key]
propvalue_template = Template(propvalue)
return propvalue_template.safe_substitute(**values)
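A sketch of the resulting behavior (dictionary contents are hypothetical):

```python
values = {"data": {"a": 1}, "name": "Crawler"}
substitute("$data", values)         # -> {"a": 1}: the raw dict entry, type preserved
substitute("${data}", values)       # -> {"a": 1}
substitute("run by $name", values)  # -> "run by Crawler": plain string substitution
```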
......
......@@ -104,17 +104,27 @@ metadata:
directory: # corresponds to the directory given to the crawler
type: Directory
match: .* # we do not care how it is named here
records:
DirRecord: # One record for each directory.
subtree:
# This is the file
thisfile:
type: []{file}
match: []{match}
records:
DatFileRecord: # One record for each matching file
role: File
path: $thisfile
file: $thisfile
subtree:
entry:
type: Dict
match: .* # Name is irrelevant
records:
MyParent:
BaseElement: # One BaseElement record for each row in the CSV/TSV file
DatFileRecord: $DatFileRecord
DirRecord:
BaseElement: +$BaseElement
subtree: !macro
"""
......@@ -196,8 +206,24 @@ cfood: str
defs.append(def_str)
del defs
sep = repr(sniffed.delimiter)
sep = f'"{sep[1:-1]}"'
match_str = f"""'.*[ct]sv'
sep: {sep}
# "header": [int]
# "names": [str]
# "index_col": [int]
# "usecols": [int]
# "true_values": [str]
# "false_values": [str]
# "na_values": [str]
# "skiprows": [int]
# "nrows": [int]
# "keep_default_na": [bool]
"""
cfood_str = (_CustomTemplate(CFOOD_TEMPLATE).substitute({"file": "CSVTableConverter",
"match": ".*\\[ct]sv"})
"match": match_str})
+ prefix[2:] + "ColumnValue:\n" + "".join(defs_col_value)
+ prefix[2:] + "ColumnValueReference:\n" + "".join(defs_col_value_ref)
)
......
......@@ -20,9 +20,14 @@
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""Definition of default transformer functions.
See https://docs.indiscale.com/caosdb-crawler/converters.html#transform-functions for more
information.
"""
Defnition of default transformer functions.
"""
import datetime
import re
from typing import Any
......@@ -61,3 +66,36 @@ def replace(in_value: Any, in_parameters: dict):
if not isinstance(in_value, str):
raise RuntimeError("must be string")
return in_value.replace(in_parameters['remove'], in_parameters['insert'])
def date_parse(in_value: str, params: dict) -> str:
"""Transform text so that it is formatted in a way that LinkAhead can understand it.
Parameters
==========
- date_format: str, optional
      A format string using the ``datetime`` specification:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
"""
fmt_default = "%Y-%m-%d"
fmt = params.get("date_format", fmt_default)
dt_str = datetime.datetime.strptime(in_value, fmt).strftime(fmt_default)
return dt_str
def datetime_parse(in_value: str, params: dict) -> str:
"""Transform text so that it is formatted in a way that LinkAhead can understand it.
Parameters
==========
- datetime_format: str, optional
      A format string using the ``datetime`` specification:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
"""
fmt_default = "%Y-%m-%dT%H:%M:%S"
fmt = params.get("datetime_format", fmt_default)
dt_str = datetime.datetime.strptime(in_value, fmt).strftime(fmt_default)
return dt_str
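A usage sketch of the two new functions (the input values are made up):

```python
from caoscrawler.transformer_functions import date_parse, datetime_parse

date_parse("01.12.2024", {"date_format": "%d.%m.%Y"})
# -> "2024-12-01"
datetime_parse("2024-12-01 13:15", {"datetime_format": "%Y-%m-%d %H:%M"})
# -> "2024-12-01T13:15:00"
```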
......@@ -13,7 +13,10 @@ see INSTALL.md
We use Sphinx to create the documentation. Docstrings in the code should comply
with the Google style (see link below).
Build documentation in `src/doc` with `make html`.
Build documentation in `src/doc` with `make doc`. Note that for the
automatic generation of the complete API documentation, it is
necessary to first install this library with all its optional
dependencies, i.e., `pip install .[h5-crawler,spss]`.
### Requirements ###
......
......@@ -33,10 +33,10 @@ copyright = '2024, IndiScale'
author = 'Alexander Schlemmer'
# The short X.Y version
version = '0.7.2'
version = '0.8.1'
# The full version, including alpha/beta/rc tags
# release = '0.5.2-rc2'
release = '0.7.2-dev'
release = '0.8.1-dev'
# -- General configuration ---------------------------------------------------
......
......@@ -31,20 +31,20 @@ The yaml definition may look like this:
.. code-block:: yaml
<NodeName>:
type: <ConverterName>
match: ".*"
records:
Experiment1:
parents:
- Experiment
- Blablabla
date: $DATUM
(...)
Experiment2:
parents:
- Experiment
subtree:
(...)
type: <ConverterName>
match: ".*"
records:
Experiment1:
parents:
- Experiment
- Blablabla
date: $DATUM
(...)
Experiment2:
parents:
- Experiment
subtree:
(...)
The **<NodeName>** is a description of what the current block represents (e.g.
``experiment-folder``) and is used as an identifier.
......@@ -76,35 +76,35 @@ applied to the respective variables when the converter is executed.
.. code-block:: yaml
<NodeName>:
type: <ConverterName>
match: ".*"
transform:
<TransformNodeName>:
in: $<in_var_name>
out: $<out_var_name>
functions:
- <func_name>: # name of the function to be applied
<func_arg1>: <func_arg1_value> # key value pairs that are passed as parameters
<func_arg2>: <func_arg2_value>
# ...
type: <ConverterName>
match: ".*"
transform:
<TransformNodeName>:
in: $<in_var_name>
out: $<out_var_name>
functions:
- <func_name>: # name of the function to be applied
<func_arg1>: <func_arg1_value> # key value pairs that are passed as parameters
<func_arg2>: <func_arg2_value>
# ...
An example that splits the variable ``a`` and puts the generated list in ``b`` is the following:
.. code-block:: yaml
Experiment:
type: Dict
match: ".*"
transform:
param_split:
in: $a
out: $b
functions:
- split: # split is a function that is defined by default
marker: "|" # its only parameter is the marker that is used to split the string
records:
Report:
tags: $b
type: Dict
match: ".*"
transform:
param_split:
in: $a
out: $b
functions:
- split: # split is a function that is defined by default
marker: "|" # its only parameter is the marker that is used to split the string
records:
Report:
tags: $b
This splits the string in ``$a`` and stores the resulting list in ``$b``. Here it is used to add a
list-valued property to the Report Record.
......@@ -218,21 +218,21 @@ Example:
type: CSVTableConverter
match: ^test_table.csv$
records:
(...) # Records edited for the whole table file
(...) # Records edited for the whole table file
subtree:
ROW: # Any name for a data row in the table
type: DictElement
match_name: .*
match_value: .*
records:
(...) # Records edited for each row
subtree:
COLUMN: # Any name for a specific type of column in the table
type: FloatElement
match_name: measurement # Name of the column in the table file
        match_value: (?P<column_value>.*)
records:
(...) # Records edited for each cell
ROW: # Any name for a data row in the table
type: DictElement
match_name: .*
match_value: .*
records:
(...) # Records edited for each row
subtree:
COLUMN: # Any name for a specific type of column in the table
type: FloatElement
match_name: measurement # Name of the column in the table file
          match_value: (?P<column_value>.*)
records:
(...) # Records edited for each cell
XLSXTableConverter
......@@ -245,6 +245,140 @@ CSVTableConverter
CSV File → DictElement
PropertiesFromDictConverter
===========================
The :py:class:`~caoscrawler.converters.PropertiesFromDictConverter` is
a specialization of the
:py:class:`~caoscrawler.converters.DictElementConverter` and offers
all its functionality. It is meant to operate on dictionaries (e.g.,
from reading in a JSON or a table file) whose keys correspond
closely to properties in a LinkAhead datamodel. This is especially
handy in cases where properties that are not yet known when writing
the cfood definition may later be added to the data model and data sources.
The converter definition of the
:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has an
additional required entry ``record_from_dict`` which specifies the
Record to which the properties extracted from the dict are
attached. This Record is identified by its ``variable_name``, by which it can
be referred to further down the subtree. You can also use the name of
a Record that was specified earlier in the CFood definition in order
to extend it by the properties extracted from a dict. Let's have a
look at a simple example. A CFood definition
.. code-block:: yaml
PropertiesFromDictElement:
type: PropertiesFromDictElement
match: ".*"
record_from_dict:
variable_name: MyRec
parents:
- MyType1
- MyType2
applied to a dictionary
.. code-block:: json
{
"name": "New name",
"a": 5,
"b": ["a", "b", "c"],
"author": {
"full_name": "Silvia Scientist"
}
}
will create a Record ``New name`` with parents ``MyType1`` and
``MyType2``. It has a scalar property ``a`` with value 5, a list
property ``b`` with values "a", "b" and "c", and an ``author``
property which references an ``author`` with a ``full_name`` property
with value "Silvia Scientist":
.. image:: img/properties-from-dict-records-author.png
:height: 210
Note how the different dictionary keys are handled differently
depending on their types: scalar and list values are understood
automatically, and a dictionary-valued entry like ``author`` is
translated into a reference to an ``author`` Record automatically.
You can further specify how references are treated with an optional
``references`` key in ``record_from_dict``. Let's assume that in the
above example, we have an ``author`` **Property** with datatype
``Person`` in our data model. We could add this information by
extending the above example definition by
.. code-block:: yaml
PropertiesFromDictElement:
type: PropertiesFromDictElement
match: ".*"
record_from_dict:
variable_name: MyRec
parents:
- MyType1
- MyType2
references:
author:
parents:
- Person
so that now, a ``Person`` record with a ``full_name`` property with
value "Silvia Scientist" is created as the value of the ``author``
property:
.. image:: img/properties-from-dict-records-person.png
:height: 200
For the time being, only the parents of the referenced record can be
set via this option. More complicated treatments can be implemented
via the ``referenced_record_callback`` (see below).
Properties can be blacklisted with the ``properties_blacklist``
keyword, i.e., all keys listed under ``properties_blacklist`` will be
excluded from automated treatment. Since the
:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` has
all the functionality of the
:py:class:`~caoscrawler.converters.DictElementConverter`, individual
properties can still be used in a subtree. Together with
``properties_blacklist`` this can be used to add custom treatment to
specific properties by blacklisting them in ``record_from_dict`` and
then treating them in the subtree the same as you would do it in the
standard
:py:class:`~caoscrawler.converters.DictElementConverter`. Note that
the blacklisted keys are excluded on **all** levels of the dictionary,
i.e., also when they occur in a referenced entity.
For further customization, the
:py:class:`~caoscrawler.converters.PropertiesFromDictConverter` can be
used as a basis for :ref:`custom converters<Custom Converters>` which
can make use of its ``referenced_record_callback`` argument. The
``referenced_record_callback`` must be a callable that takes the
Record as its first argument and returns that Record after
doing whatever custom treatment is needed. Additionally, it is given
the ``RecordStore`` and the ``GeneralStore`` in order to be able to
access the records and values that have already been defined from
within ``referenced_record_callback``. Such a function might look like
the following:
.. code-block:: python
def my_callback(rec: db.Record, records: RecordStore, values: GeneralStore):
# do something with rec, possibly using other records or values from the stores...
rec.description = "This was updated in a callback"
return rec
It is applied to all Records that are created from the dictionary and
it can be used to, e.g., transform values of some properties, or add
special treatment to all Records of a specific
type. ``referenced_record_callback`` is applied **after** the
properties from the dictionary have been applied as explained above.
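A minimal sketch of how a :ref:`custom converter<Custom Converters>` might wire in such a callback (the class name ``MyDictConverter`` is hypothetical; the constructor signature follows the converter code in this merge):

.. code-block:: python

    from caoscrawler.converters import PropertiesFromDictConverter

    def my_callback(rec, records, values):
        # Custom treatment for every Record created from the dictionary.
        rec.description = "This was updated in a callback"
        return rec

    class MyDictConverter(PropertiesFromDictConverter):
        """Hypothetical converter that always applies my_callback."""

        def __init__(self, definition, name, converter_registry):
            super().__init__(definition, name, converter_registry,
                             referenced_record_callback=my_callback)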
Further converters
++++++++++++++++++
......@@ -293,7 +427,7 @@ datamodel like
H5Ndarray:
obligatory_properties:
internal_hdf5-path:
datatype: TEXT
datatype: TEXT
although the names of both property and record type can be configured within the
cfood definition.
......@@ -407,11 +541,11 @@ First we will create our package and module structure, which might be:
tox.ini
src/
scifolder/
__init__.py
converters/
__init__.py
sources.py # <- the actual file containing
# the converter class
__init__.py
converters/
__init__.py
sources.py # <- the actual file containing
# the converter class
doc/
unittests/
......@@ -436,74 +570,74 @@ that would be given using a yaml definition (see next section below).
"""
def __init__(self, definition: dict, name: str,
converter_registry: dict):
"""
Initialize a new directory converter.
"""
super().__init__(definition, name, converter_registry)
converter_registry: dict):
"""
Initialize a new directory converter.
"""
super().__init__(definition, name, converter_registry)
def create_children(self, generalStore: GeneralStore,
element: StructureElement):
element: StructureElement):
# The source resolver does not create children:
# The source resolver does not create children:
return []
return []
def create_records(self, values: GeneralStore,
records: RecordStore,
element: StructureElement,
file_path_prefix):
if not isinstance(element, TextElement):
raise RuntimeError()
# This function must return a list containing tuples, each one for a modified
# property: (name_of_entity, name_of_property)
keys_modified = []
# This is the name of the entity where the source is going to be attached:
attach_to_scientific_activity = self.definition["scientific_activity"]
rec = records[attach_to_scientific_activity]
# The "source" is a path to a source project, so it should have the form:
# /<Category>/<project>/<scientific_activity>/
# obtain these information from the structure element:
val = element.value
regexp = (r'/(?P<category>(SimulationData)|(ExperimentalData)|(DataAnalysis))'
'/(?P<project_date>.*?)_(?P<project_identifier>.*)'
'/(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?/')
res = re.match(regexp, val)
if res is None:
raise RuntimeError("Source cannot be parsed correctly.")
# Mapping of categories on the file system to corresponding record types in CaosDB:
cat_map = {
"SimulationData": "Simulation",
"ExperimentalData": "Experiment",
"DataAnalysis": "DataAnalysis"}
linkrt = cat_map[res.group("category")]
keys_modified.extend(create_records(values, records, {
"Project": {
"date": res.group("project_date"),
"identifier": res.group("project_identifier"),
},
linkrt: {
"date": res.group("date"),
"identifier": res.group("identifier"),
"project": "$Project"
},
attach_to_scientific_activity: {
"sources": "+$" + linkrt
}}, file_path_prefix))
# Process the records section of the yaml definition:
keys_modified.extend(
super().create_records(values, records, element, file_path_prefix))
# The create_records function must return the modified keys to make it compatible
# to the crawler functions:
return keys_modified
records: RecordStore,
element: StructureElement,
file_path_prefix):
if not isinstance(element, TextElement):
raise RuntimeError()
# This function must return a list containing tuples, each one for a modified
# property: (name_of_entity, name_of_property)
keys_modified = []
# This is the name of the entity where the source is going to be attached:
attach_to_scientific_activity = self.definition["scientific_activity"]
rec = records[attach_to_scientific_activity]
# The "source" is a path to a source project, so it should have the form:
# /<Category>/<project>/<scientific_activity>/
# obtain these information from the structure element:
val = element.value
regexp = (r'/(?P<category>(SimulationData)|(ExperimentalData)|(DataAnalysis))'
'/(?P<project_date>.*?)_(?P<project_identifier>.*)'
'/(?P<date>[0-9]{4,4}-[0-9]{2,2}-[0-9]{2,2})(_(?P<identifier>.*))?/')
res = re.match(regexp, val)
if res is None:
raise RuntimeError("Source cannot be parsed correctly.")
# Mapping of categories on the file system to corresponding record types in CaosDB:
cat_map = {
"SimulationData": "Simulation",
"ExperimentalData": "Experiment",
"DataAnalysis": "DataAnalysis"}
linkrt = cat_map[res.group("category")]
keys_modified.extend(create_records(values, records, {
"Project": {
"date": res.group("project_date"),
"identifier": res.group("project_identifier"),
},
linkrt: {
"date": res.group("date"),
"identifier": res.group("identifier"),
"project": "$Project"
},
attach_to_scientific_activity: {
"sources": "+$" + linkrt
}}, file_path_prefix))
# Process the records section of the yaml definition:
keys_modified.extend(
super().create_records(values, records, element, file_path_prefix))
# The create_records function must return the modified keys to make it compatible
# to the crawler functions:
return keys_modified
If the recommended (python) package structure is used, the package containing the converter
......@@ -530,8 +664,8 @@ function signature:
.. code-block:: python
def create_records(values: GeneralStore, # <- pass the current variables store here
records: RecordStore, # <- pass the current store of CaosDB records here
def_records: dict): # <- This is the actual definition of new records!
records: RecordStore, # <- pass the current store of CaosDB records here
def_records: dict): # <- This is the actual definition of new records!
`def_records` is the actual definition of new records according to the yaml cfood specification
......@@ -547,7 +681,7 @@ Let's have a look at a few examples:
match: (?P<dir_name>.*)
records:
Experiment:
identifier: $dir_name
identifier: $dir_name
This block will just create a new record with parent `Experiment` and one property
`identifier` with a value derived from the matching regular expression.
......@@ -565,7 +699,7 @@ Let's formulate that using `create_records`:
}
keys_modified = create_records(values, records,
record_def)
record_def)
The `dir_name` is set explicitly here; everything else is identical to the yaml statements.
......@@ -588,9 +722,9 @@ So, a sketch of a typical implementation within a custom converter could look li
.. code-block:: python
def create_records(self, values: GeneralStore,
records: RecordStore,
element: StructureElement,
file_path_prefix: str):
records: RecordStore,
element: StructureElement,
file_path_prefix: str):
# Modify some records:
record_def = {
......@@ -598,15 +732,15 @@ So, a sketch of a typical implementation within a custom converter could look li
}
keys_modified = create_records(values, records,
record_def)
record_def)
# You can of course do it multiple times:
keys_modified.extend(create_records(values, records,
record_def))
record_def))
# You can also process the records section of the yaml definition:
keys_modified.extend(
super().create_records(values, records, element, file_path_prefix))
super().create_records(values, records, element, file_path_prefix))
# This essentially allows users of your converter to customize the creation of records
    # by providing a custom "records" section in addition to the modifications provided
# in this implementation of the Converter.
......@@ -627,12 +761,12 @@ Let's have a look at a more complex examples, defining multiple records:
match: (?P<dir_name>.*)
records:
Project:
identifier: project_name
identifier: project_name
Experiment:
identifier: $dir_name
Project: $Project
identifier: $dir_name
Project: $Project
ProjectGroup:
projects: +$Project
projects: +$Project
This block will create two new Records:
......@@ -665,7 +799,7 @@ Let's formulate that using `create_records` (again, `dir_name` is constant here)
}
keys_modified = create_records(values, records,
record_def)
record_def)
Debugging
=========
......@@ -681,7 +815,7 @@ output for the match step. The following snippet illustrates this:
debug_match: True
records:
Project:
identifier: project_name
identifier: project_name
Whenever this Converter tries to match a StructureElement, it logs what was tried to match against
......
......@@ -33,7 +33,7 @@ Then you can do the following interactively in (I)Python. But we recommend that
you copy the code into a script and execute it to spare yourself typing.
```python
import caosdb as db
import linkahead as db
from datetime import datetime
from caoscrawler import Crawler, SecurityMode
from caoscrawler.identifiable_adapters import CaosDBIdentifiableAdapter
......
......@@ -30,6 +30,13 @@ to decide what tool is used for sending mails (use the upper one if you
want to actually send mails). See the ``sendmail`` configuration in the
LinkAhead docs.
You can even supply the name of a custom CSS file that shall be used:
.. code:: ini
[advancedtools]
crawler.customcssfile = theme-research.css
Crawler Status Records
----------------------
......