Commit 41a8de9d authored by Henrik tom Wörden's avatar Henrik tom Wörden

Merge branch 'dev' into f-json-specification-doc

parents 3ca144bd 0bdf4ed0
Related merge requests: !222 Release 0.12.0, !74 F json specification doc
Showing 994 additions and 501 deletions
...@@ -17,3 +17,4 @@ src/doc/_apidoc/ ...@@ -17,3 +17,4 @@ src/doc/_apidoc/
start_caosdb_docker.sh start_caosdb_docker.sh
src/doc/_apidoc src/doc/_apidoc
/dist/ /dist/
*.egg-info
...@@ -120,10 +120,10 @@ unittest_py3.9: ...@@ -120,10 +120,10 @@ unittest_py3.9:
script: script:
- tox - tox
unittest_py3.8: unittest_py3.7:
tags: [cached-dind] tags: [cached-dind]
stage: test stage: test
image: python:3.8 image: python:3.7
script: &python_test_script script: &python_test_script
# install dependencies # install dependencies
- pip install pytest pytest-cov - pip install pytest pytest-cov
...@@ -135,12 +135,24 @@ unittest_py3.8: ...@@ -135,12 +135,24 @@ unittest_py3.8:
- caosdb-crawler --help - caosdb-crawler --help
- pytest --cov=caosdb -vv ./unittests - pytest --cov=caosdb -vv ./unittests
unittest_py3.8:
tags: [cached-dind]
stage: test
image: python:3.8
script: *python_test_script
unittest_py3.10: unittest_py3.10:
tags: [cached-dind] tags: [cached-dind]
stage: test stage: test
image: python:3.10 image: python:3.10
script: *python_test_script script: *python_test_script
unittest_py3.11:
tags: [cached-dind]
stage: test
image: python:3.11
script: *python_test_script
inttest: inttest:
tags: [docker] tags: [docker]
services: services:
...@@ -277,3 +289,27 @@ style: ...@@ -277,3 +289,27 @@ style:
script: script:
- autopep8 -r --diff --exit-code . - autopep8 -r --diff --exit-code .
allow_failure: true allow_failure: true
# Build the sphinx documentation and make it ready for deployment by Gitlab Pages
# Special job for serving a static website. See https://docs.gitlab.com/ee/ci/yaml/README.html#pages
# Based on: https://gitlab.indiscale.com/caosdb/src/caosdb-pylib/-/ci/editor?branch_name=main
pages_prepare: &pages_prepare
tags: [ cached-dind ]
stage: deploy
needs: []
image: $CI_REGISTRY/caosdb/src/caosdb-pylib/testenv:latest
only:
refs:
- /^release-.*$/i
script:
- echo "Deploying documentation"
- make doc
- cp -r build/doc/html public
artifacts:
paths:
- public
pages:
<<: *pages_prepare
only:
refs:
- main
...@@ -7,12 +7,43 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ...@@ -7,12 +7,43 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased] ## ## [Unreleased] ##
### Added ###
- DateElementConverter: allows interpreting text as a date object
- The restricted_path argument allows crawling only a subtree
### Changed ###
### Deprecated ###
### Removed ###
### Fixed ###
- An empty string as a name is treated as no name, matching the server's behavior. This fixes
  queries for identifiables, which would otherwise contain "WITH name=''",
  an impossible condition. If your cfoods relied on this case, they were ill-defined.
### Security ###
### Documentation ###
## [0.3.0] - 2022-01-30 ##
(Florian Spreckelsen)
### Added ### ### Added ###
- Identifiable class to represent the information used to identify Records. - Identifiable class to represent the information used to identify Records.
- Added some StructureElements: BooleanElement, FloatElement, IntegerElement, - Added some StructureElements: BooleanElement, FloatElement, IntegerElement,
ListElement, DictElement ListElement, DictElement
- String representation for Identifiables - String representation for Identifiables
- [#43](https://gitlab.com/caosdb/caosdb-crawler/-/issues/43) the crawler
version can now be specified in the `metadata` section of the cfood
definition. It is checked against the installed version upon loading of the
definition.
- JSON schema validation can also be used in the DictElementConverter
- YAMLFileConverter class to parse YAML files
- Variables can now be substituted within the definition of yaml macros
- debugging option for the match step of Converters
- Re-introduced support for Python 3.7
### Changed ### ### Changed ###
...@@ -20,23 +51,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ...@@ -20,23 +51,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Dict, DictElement and DictDictElement were merged into DictElement. - Dict, DictElement and DictDictElement were merged into DictElement.
- DictTextElement and TextElement were merged into TextElement. The "match" - DictTextElement and TextElement were merged into TextElement. The "match"
keyword is now invalid for TextElements. keyword is now invalid for TextElements.
- JSONFileConverter creates another level of StructureElements (see "How to upgrade" in the docs)
- create_flat_list function now collects entities in a set and also adds the entities
contained in the given list directly
### Deprecated ### ### Deprecated ###
- The DictXYElements are now depricated and are now synonyms for the - The DictXYElements are now depricated and are now synonyms for the
XYElements. XYElements.
### Removed ###
### Fixed ### ### Fixed ###
- [#39](https://gitlab.com/caosdb/caosdb-crawler/-/issues/39) Merge conflicts in - [#39](https://gitlab.com/caosdb/caosdb-crawler/-/issues/39) Merge conflicts in
`split_into_inserts_and_updates` when cached entity references a record `split_into_inserts_and_updates` when cached entity references a record
without id without id
- Queries for identifiables with boolean properties are now created correctly.
### Security ###
### Documentation ###
## [0.2.0] - 2022-11-18 ## ## [0.2.0] - 2022-11-18 ##
(Florian Spreckelsen) (Florian Spreckelsen)
......
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Fitschen
given-names: Timm
orcid: https://orcid.org/0000-0002-4022-432X
- family-names: Schlemmer
given-names: Alexander
orcid: https://orcid.org/0000-0003-4124-9649
- family-names: Hornung
given-names: Daniel
orcid: https://orcid.org/0000-0002-7846-6375
- family-names: tom Wörden
given-names: Henrik
orcid: https://orcid.org/0000-0002-5549-578X
- family-names: Parlitz
given-names: Ulrich
orcid: https://orcid.org/0000-0003-3058-1435
- family-names: Luther
given-names: Stefan
orcid: https://orcid.org/0000-0001-7214-8125
title: CaosDB - Crawler
version: 0.3.0
doi: 10.3390/data4020083
date-released: 2023-01-30
\ No newline at end of file
...@@ -24,6 +24,7 @@ guidelines of the CaosDB Project ...@@ -24,6 +24,7 @@ guidelines of the CaosDB Project
- `version` variables in `src/doc/conf.py` - `version` variables in `src/doc/conf.py`
- Version in [setup.cfg](./setup.cfg): Check the `MAJOR`, `MINOR`, `MICRO`, `PRE` variables and set - Version in [setup.cfg](./setup.cfg): Check the `MAJOR`, `MINOR`, `MICRO`, `PRE` variables and set
`ISRELEASED` to `True`. Use the possibility to issue pre-release versions for testing. `ISRELEASED` to `True`. Use the possibility to issue pre-release versions for testing.
- `CITATION.cff` (update version and date)
5. Merge the release branch into the main branch. 5. Merge the release branch into the main branch.
......
...@@ -31,6 +31,10 @@ Data: ...@@ -31,6 +31,10 @@ Data:
type: JSONFile type: JSONFile
match: .dataspace.json match: .dataspace.json
validate: schema/dataspace.schema.json validate: schema/dataspace.schema.json
subtree:
jsondict:
type: DictElement
match: .*
subtree: subtree:
dataspace_id_element: dataspace_id_element:
type: IntegerElement type: IntegerElement
...@@ -150,6 +154,10 @@ Data: ...@@ -150,6 +154,10 @@ Data:
type: JSONFile type: JSONFile
match: metadata.json match: metadata.json
validate: schema/dataset.schema.json validate: schema/dataset.schema.json
subtree:
jsondict:
type: DictElement
match: .*
subtree: subtree:
title_element: title_element:
type: TextElement type: TextElement
......
[pytest]
testpaths=unittests
[metadata] [metadata]
name = caoscrawler name = caoscrawler
version = 0.2.1 version = 0.3.1
author = Alexander Schlemmer author = Alexander Schlemmer
author_email = alexander.schlemmer@ds.mpg.de author_email = alexander.schlemmer@ds.mpg.de
description = A new crawler for caosdb description = A new crawler for caosdb
...@@ -17,15 +17,16 @@ classifiers = ...@@ -17,15 +17,16 @@ classifiers =
package_dir = package_dir =
= src = src
packages = find: packages = find:
python_requires = >=3.8 python_requires = >=3.7
install_requires = install_requires =
importlib-resources importlib-resources
caosdb > 0.10.0 caosdb >= 0.11.0
caosadvancedtools >= 0.6.0 caosadvancedtools >= 0.6.0
yaml-header-tools >= 0.2.1 yaml-header-tools >= 0.2.1
pyyaml pyyaml
odfpy #make optional odfpy #make optional
pandas pandas
importlib_metadata;python_version<'3.8'
[options.packages.find] [options.packages.find]
where = src where = src
......
from .crawl import Crawler, SecurityMode from .crawl import Crawler, SecurityMode
from .version import CfoodRequiredVersionError, version as __version__
...@@ -27,6 +27,7 @@ cfood: ...@@ -27,6 +27,7 @@ cfood:
- BooleanElement - BooleanElement
- Definitions - Definitions
- Dict - Dict
- Date
- JSONFile - JSONFile
- CSVTableConverter - CSVTableConverter
- XLSXTableConverter - XLSXTableConverter
......
This diff is collapsed.
...@@ -55,7 +55,7 @@ from caosdb.apiutils import (compare_entities, EntityMergeConflictError, ...@@ -55,7 +55,7 @@ from caosdb.apiutils import (compare_entities, EntityMergeConflictError,
merge_entities) merge_entities)
from caosdb.common.datatype import is_reference from caosdb.common.datatype import is_reference
from .converters import Converter, DirectoryConverter from .converters import Converter, DirectoryConverter, ConverterValidationError
from .identifiable import Identifiable from .identifiable import Identifiable
from .identifiable_adapters import (IdentifiableAdapter, from .identifiable_adapters import (IdentifiableAdapter,
LocalStorageIdentifiableAdapter, LocalStorageIdentifiableAdapter,
...@@ -63,7 +63,8 @@ from .identifiable_adapters import (IdentifiableAdapter, ...@@ -63,7 +63,8 @@ from .identifiable_adapters import (IdentifiableAdapter,
from .identified_cache import IdentifiedCache from .identified_cache import IdentifiedCache
from .macros import defmacro_constructor, macro_constructor from .macros import defmacro_constructor, macro_constructor
from .stores import GeneralStore, RecordStore from .stores import GeneralStore, RecordStore
from .structure_elements import StructureElement, Directory from .structure_elements import StructureElement, Directory, NoneElement
from .version import check_cfood_version
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
...@@ -255,12 +256,17 @@ class Crawler(object): ...@@ -255,12 +256,17 @@ class Crawler(object):
if len(crawler_definitions) == 1: if len(crawler_definitions) == 1:
# Simple case, just one document: # Simple case, just one document:
crawler_definition = crawler_definitions[0] crawler_definition = crawler_definitions[0]
metadata = {}
elif len(crawler_definitions) == 2: elif len(crawler_definitions) == 2:
metadata = crawler_definitions[0]["metadata"] if "metadata" in crawler_definitions[0] else {
}
crawler_definition = crawler_definitions[1] crawler_definition = crawler_definitions[1]
else: else:
raise RuntimeError( raise RuntimeError(
"Crawler definition must not contain more than two documents.") "Crawler definition must not contain more than two documents.")
check_cfood_version(metadata)
# TODO: at this point this function can already load the cfood schema extensions # TODO: at this point this function can already load the cfood schema extensions
# from the crawler definition and add them to the yaml schema that will be # from the crawler definition and add them to the yaml schema that will be
# tested in the next lines of code: # tested in the next lines of code:
...@@ -275,8 +281,8 @@ class Crawler(object): ...@@ -275,8 +281,8 @@ class Crawler(object):
schema["cfood"]["$defs"]["converter"]["properties"]["type"]["enum"].append( schema["cfood"]["$defs"]["converter"]["properties"]["type"]["enum"].append(
key) key)
if len(crawler_definitions) == 2: if len(crawler_definitions) == 2:
if "Converters" in crawler_definitions[0]["metadata"]: if "Converters" in metadata:
for key in crawler_definitions[0]["metadata"]["Converters"]: for key in metadata["Converters"]:
schema["cfood"]["$defs"]["converter"]["properties"]["type"]["enum"].append( schema["cfood"]["$defs"]["converter"]["properties"]["type"]["enum"].append(
key) key)
...@@ -300,6 +306,8 @@ class Crawler(object): ...@@ -300,6 +306,8 @@ class Crawler(object):
definition[key] = os.path.join( definition[key] = os.path.join(
os.path.dirname(definition_path), value) os.path.dirname(definition_path), value)
if not os.path.isfile(definition[key]): if not os.path.isfile(definition[key]):
# TODO(henrik) capture this in `crawler_main` similar to
# `ConverterValidationError`.
raise FileNotFoundError( raise FileNotFoundError(
f"Couldn't find validation file {definition[key]}") f"Couldn't find validation file {definition[key]}")
elif isinstance(value, dict): elif isinstance(value, dict):
...@@ -339,6 +347,9 @@ class Crawler(object): ...@@ -339,6 +347,9 @@ class Crawler(object):
"JSONFile": { "JSONFile": {
"converter": "JSONFileConverter", "converter": "JSONFileConverter",
"package": "caoscrawler.converters"}, "package": "caoscrawler.converters"},
"YAMLFile": {
"converter": "YAMLFileConverter",
"package": "caoscrawler.converters"},
"CSVTableConverter": { "CSVTableConverter": {
"converter": "CSVTableConverter", "converter": "CSVTableConverter",
"package": "caoscrawler.converters"}, "package": "caoscrawler.converters"},
...@@ -363,6 +374,9 @@ class Crawler(object): ...@@ -363,6 +374,9 @@ class Crawler(object):
"TextElement": { "TextElement": {
"converter": "TextElementConverter", "converter": "TextElementConverter",
"package": "caoscrawler.converters"}, "package": "caoscrawler.converters"},
"Date": {
"converter": "DateElementConverter",
"package": "caoscrawler.converters"},
"DictIntegerElement": { "DictIntegerElement": {
"converter": "IntegerElementConverter", "converter": "IntegerElementConverter",
"package": "caoscrawler.converters"}, "package": "caoscrawler.converters"},
...@@ -406,11 +420,16 @@ class Crawler(object): ...@@ -406,11 +420,16 @@ class Crawler(object):
value["class"] = getattr(module, value["converter"]) value["class"] = getattr(module, value["converter"])
return converter_registry return converter_registry
def crawl_directory(self, dirname: str, crawler_definition_path: str): def crawl_directory(self, dirname: str, crawler_definition_path: str,
restricted_path: Optional[list[str]] = None):
""" Crawl a single directory. """ Crawl a single directory.
Convenience function that starts the crawler (calls start_crawling) Convenience function that starts the crawler (calls start_crawling)
with a single directory as the StructureElement. with a single directory as the StructureElement.
restricted_path: optional, list of strings
Traverse the data tree only along the given path. When the end of the given path
is reached, traverse the full tree as normal.
""" """
crawler_definition = self.load_definition(crawler_definition_path) crawler_definition = self.load_definition(crawler_definition_path)
...@@ -433,7 +452,9 @@ class Crawler(object): ...@@ -433,7 +452,9 @@ class Crawler(object):
self.start_crawling(Directory(dir_structure_name, self.start_crawling(Directory(dir_structure_name,
dirname), dirname),
crawler_definition, crawler_definition,
converter_registry) converter_registry,
restricted_path=restricted_path
)
@staticmethod @staticmethod
def initialize_converters(crawler_definition: dict, converter_registry: dict): def initialize_converters(crawler_definition: dict, converter_registry: dict):
...@@ -461,7 +482,8 @@ class Crawler(object): ...@@ -461,7 +482,8 @@ class Crawler(object):
def start_crawling(self, items: Union[list[StructureElement], StructureElement], def start_crawling(self, items: Union[list[StructureElement], StructureElement],
crawler_definition: dict, crawler_definition: dict,
converter_registry: dict): converter_registry: dict,
restricted_path: Optional[list[str]] = None):
""" """
Start point of the crawler recursion. Start point of the crawler recursion.
...@@ -473,6 +495,9 @@ class Crawler(object): ...@@ -473,6 +495,9 @@ class Crawler(object):
crawler_definition : dict crawler_definition : dict
A dictionary representing the crawler definition, possibly from a yaml A dictionary representing the crawler definition, possibly from a yaml
file. file.
restricted_path: optional, list of strings
Traverse the data tree only along the given path. When the end of the given path
is reached, traverse the full tree as normal.
Returns Returns
------- -------
...@@ -489,14 +514,18 @@ class Crawler(object): ...@@ -489,14 +514,18 @@ class Crawler(object):
items = [items] items = [items]
self.run_id = uuid.uuid1() self.run_id = uuid.uuid1()
local_converters = Crawler.initialize_converters( local_converters = Crawler.initialize_converters(crawler_definition, converter_registry)
crawler_definition, converter_registry)
# This recursive crawling procedure generates the update list: # This recursive crawling procedure generates the update list:
self.crawled_data: list[db.Record] = [] self.crawled_data: list[db.Record] = []
self._crawl(items, local_converters, self.generalStore, self._crawl(
self.recordStore, [], []) items=items,
local_converters=local_converters,
generalStore=self.generalStore,
recordStore=self.recordStore,
structure_elements_path=[],
converters_path=[],
restricted_path=restricted_path)
if self.debug: if self.debug:
self.debug_converters = local_converters self.debug_converters = local_converters
...@@ -546,14 +575,20 @@ class Crawler(object): ...@@ -546,14 +575,20 @@ class Crawler(object):
return False return False
@staticmethod @staticmethod
def create_flat_list(ent_list: list[db.Entity], flat: list[db.Entity]): def create_flat_list(ent_list: list[db.Entity], flat: Optional[list[db.Entity]] = None):
""" """
Recursively adds all properties contained in entities from ent_list to Recursively adds entities and all their properties contained in ent_list to
the output list flat. Each element will only be added once to the list. the output list flat.
TODO: This function will be moved to pylib as it is also needed by the TODO: This function will be moved to pylib as it is also needed by the
high level API. high level API.
""" """
# Note: A set would be useful here, but we do not want a random order.
if flat is None:
flat = list()
for el in ent_list:
if el not in flat:
flat.append(el)
for ent in ent_list: for ent in ent_list:
for p in ent.properties: for p in ent.properties:
# For lists append each element that is of type Entity to flat: # For lists append each element that is of type Entity to flat:
...@@ -567,6 +602,7 @@ class Crawler(object): ...@@ -567,6 +602,7 @@ class Crawler(object):
if p.value not in flat: if p.value not in flat:
flat.append(p.value) flat.append(p.value)
Crawler.create_flat_list([p.value], flat) Crawler.create_flat_list([p.value], flat)
return flat
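A minimal usage sketch of the new single-argument form (the record names and the
"ref" property are made up for illustration):

    import caosdb as db
    from caoscrawler import Crawler

    r2 = db.Record(name="B")
    r1 = db.Record(name="A")
    r1.add_property(name="ref", value=r2)
    # Collects the given entities and every entity referenced by their properties,
    # each exactly once, preserving order:
    flat = Crawler.create_flat_list([r1, r2])   # -> [r1, r2]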
def _has_missing_object_in_references(self, ident: Identifiable, referencing_entities: list): def _has_missing_object_in_references(self, ident: Identifiable, referencing_entities: list):
""" """
...@@ -736,9 +772,7 @@ class Crawler(object): ...@@ -736,9 +772,7 @@ class Crawler(object):
def split_into_inserts_and_updates(self, ent_list: list[db.Entity]): def split_into_inserts_and_updates(self, ent_list: list[db.Entity]):
to_be_inserted: list[db.Entity] = [] to_be_inserted: list[db.Entity] = []
to_be_updated: list[db.Entity] = [] to_be_updated: list[db.Entity] = []
flat = list(ent_list) flat = Crawler.create_flat_list(ent_list)
# assure all entities are direct members TODO Can this be removed at some point?Check only?
Crawler.create_flat_list(ent_list, flat)
# TODO: can the following be removed at some point # TODO: can the following be removed at some point
for ent in flat: for ent in flat:
...@@ -1142,11 +1176,14 @@ ____________________\n""".format(i + 1, len(pending_changes)) + str(el[3])) ...@@ -1142,11 +1176,14 @@ ____________________\n""".format(i + 1, len(pending_changes)) + str(el[3]))
with open(filename, "w") as f: with open(filename, "w") as f:
f.write(yaml.dump(paths, sort_keys=False)) f.write(yaml.dump(paths, sort_keys=False))
def _crawl(self, items: list[StructureElement], def _crawl(self,
items: list[StructureElement],
local_converters: list[Converter], local_converters: list[Converter],
generalStore: GeneralStore, generalStore: GeneralStore,
recordStore: RecordStore, recordStore: RecordStore,
structure_elements_path: list[str], converters_path: list[str]): structure_elements_path: list[str],
converters_path: list[str],
restricted_path: Optional[list[str]] = None):
""" """
Crawl a list of StructureElements and apply any matching converters. Crawl a list of StructureElements and apply any matching converters.
...@@ -1155,16 +1192,31 @@ ____________________\n""".format(i + 1, len(pending_changes)) + str(el[3])) ...@@ -1155,16 +1192,31 @@ ____________________\n""".format(i + 1, len(pending_changes)) + str(el[3]))
treating structure elements. A locally defined converter could be treating structure elements. A locally defined converter could be
one that is only valid for a specific subtree of the originally one that is only valid for a specific subtree of the originally
cralwed StructureElement structure. cralwed StructureElement structure.
generalStore and recordStore: This recursion of the crawl function should only operate on copies of the generalStore and recordStore: This recursion of the crawl function should only operate on
global stores of the Crawler object. copies of the global stores of the Crawler object.
restricted_path: optional, list of strings, traverse the data tree only along the given
path. For example, when a directory contains the files a, b and c, and b is
given in restricted_path, a and c will be ignored by the crawler.
When the end of the given path is reached, traverse the full tree as
normal. The first element of the list provided by restricted_path should
be the name of the StructureElement at this level, i.e. denoting the
respective element in the items argument.
""" """
# This path_found variable stores whether the path given by restricted_path was found in the
# data tree
path_found = False
if restricted_path is not None and len(restricted_path) == 0:
restricted_path = None
for element in items: for element in items:
for converter in local_converters: for converter in local_converters:
# type is something like "matches files", replace isinstance with "type_matches" # type is something like "matches files", replace isinstance with "type_matches"
# match function tests regexp for example # match function tests regexp for example
if (converter.typecheck(element) and if (converter.typecheck(element) and (
converter.match(element) is not None): restricted_path is None or element.name == restricted_path[0])
and converter.match(element) is not None):
path_found = True
generalStore_copy = generalStore.create_scoped_copy() generalStore_copy = generalStore.create_scoped_copy()
recordStore_copy = recordStore.create_scoped_copy() recordStore_copy = recordStore.create_scoped_copy()
...@@ -1179,8 +1231,8 @@ ____________________\n""".format(i + 1, len(pending_changes)) + str(el[3])) ...@@ -1179,8 +1231,8 @@ ____________________\n""".format(i + 1, len(pending_changes)) + str(el[3]))
keys_modified = converter.create_records( keys_modified = converter.create_records(
generalStore_copy, recordStore_copy, element) generalStore_copy, recordStore_copy, element)
children = converter.create_children( children = converter.create_children(generalStore_copy, element)
generalStore_copy, element)
if self.debug: if self.debug:
# add provenance information for each varaible # add provenance information for each varaible
self.debug_tree[str(element)] = ( self.debug_tree[str(element)] = (
...@@ -1205,7 +1257,12 @@ ____________________\n""".format(i + 1, len(pending_changes)) + str(el[3])) ...@@ -1205,7 +1257,12 @@ ____________________\n""".format(i + 1, len(pending_changes)) + str(el[3]))
self._crawl(children, converter.converters, self._crawl(children, converter.converters,
generalStore_copy, recordStore_copy, generalStore_copy, recordStore_copy,
structure_elements_path + [element.get_name()], structure_elements_path + [element.get_name()],
converters_path + [converter.name]) converters_path + [converter.name],
restricted_path[1:] if restricted_path is not None else None)
if restricted_path and not path_found:
raise RuntimeError("A 'restricted_path' argument was given that is not contained in "
"the data tree")
# if the crawler is running out of scope, copy all records in # if the crawler is running out of scope, copy all records in
# the recordStore, that were created in this scope # the recordStore, that were created in this scope
# to the general update container. # to the general update container.
...@@ -1236,6 +1293,7 @@ def crawler_main(crawled_directory_path: str, ...@@ -1236,6 +1293,7 @@ def crawler_main(crawled_directory_path: str,
prefix: str = "", prefix: str = "",
securityMode: SecurityMode = SecurityMode.UPDATE, securityMode: SecurityMode = SecurityMode.UPDATE,
unique_names=True, unique_names=True,
restricted_path: Optional[list[str]] = None
): ):
""" """
...@@ -1259,6 +1317,9 @@ def crawler_main(crawled_directory_path: str, ...@@ -1259,6 +1317,9 @@ def crawler_main(crawled_directory_path: str,
securityMode of Crawler securityMode of Crawler
unique_names : bool unique_names : bool
whether or not to update or insert entities inspite of name conflicts whether or not to update or insert entities inspite of name conflicts
restricted_path: optional, list of strings
Traverse the data tree only along the given path. When the end of the given path
is reached, traverse the full tree as normal.
Returns Returns
------- -------
...@@ -1266,8 +1327,12 @@ def crawler_main(crawled_directory_path: str, ...@@ -1266,8 +1327,12 @@ def crawler_main(crawled_directory_path: str,
0 if successful 0 if successful
""" """
crawler = Crawler(debug=debug, securityMode=securityMode) crawler = Crawler(debug=debug, securityMode=securityMode)
crawler.crawl_directory(crawled_directory_path, cfood_file_name) try:
if provenance_file is not None: crawler.crawl_directory(crawled_directory_path, cfood_file_name, restricted_path)
except ConverterValidationError as err:
print(err)
return 1
if provenance_file is not None and debug:
crawler.save_debug_data(provenance_file) crawler.save_debug_data(provenance_file)
if identifiables_definition_file is not None: if identifiables_definition_file is not None:
...@@ -1328,6 +1393,15 @@ def parse_args(): ...@@ -1328,6 +1393,15 @@ def parse_args():
formatter_class=RawTextHelpFormatter) formatter_class=RawTextHelpFormatter)
parser.add_argument("cfood_file_name", parser.add_argument("cfood_file_name",
help="Path name of the cfood yaml file to be used.") help="Path name of the cfood yaml file to be used.")
mg = parser.add_mutually_exclusive_group()
mg.add_argument("-r", "--restrict", nargs="*",
help="Restrict the crawling to the subtree at the end of the given path."
"I.e. for each level that is given the crawler only treats the element "
"with the given name.")
mg.add_argument("--restrict-path", help="same as restrict; instead of a list, this takes a "
"single string that is interpreded as file system path. Note that a trailing"
"separator (e.g. '/') will be ignored. Use --restrict if you need to have "
"empty strings.")
parser.add_argument("--provenance", required=False, parser.add_argument("--provenance", required=False,
help="Path name of the provenance yaml file. " help="Path name of the provenance yaml file. "
"This file will only be generated if this option is set.") "This file will only be generated if this option is set.")
...@@ -1359,6 +1433,15 @@ def parse_args(): ...@@ -1359,6 +1433,15 @@ def parse_args():
return parser.parse_args() return parser.parse_args()
def split_restricted_path(path):
elements = []
while path not in ("", "/"):  # stop at the root, or once a relative path is fully consumed
path, el = os.path.split(path)
if el != "":
elements.insert(0, el)
return elements
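A short sketch of how this helper relates the two CLI options above (the path is
illustrative): "--restrict-path /data/2022/experiment_1/" yields the same
restriction as "--restrict data 2022 experiment_1".

    split_restricted_path("/data/2022/experiment_1/")
    # -> ["data", "2022", "experiment_1"]   (a trailing separator is ignored)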
def main(): def main():
args = parse_args() args = parse_args()
...@@ -1374,6 +1457,12 @@ def main(): ...@@ -1374,6 +1457,12 @@ def main():
if args.add_cwd_to_path: if args.add_cwd_to_path:
sys.path.append(os.path.abspath(".")) sys.path.append(os.path.abspath("."))
restricted_path = None
if args.restrict_path:
restricted_path = split_restricted_path(args.restrict_path)
if args.restrict:
restricted_path = args.restrict
sys.exit(crawler_main( sys.exit(crawler_main(
crawled_directory_path=args.crawled_directory_path, crawled_directory_path=args.crawled_directory_path,
cfood_file_name=args.cfood_file_name, cfood_file_name=args.cfood_file_name,
...@@ -1386,6 +1475,7 @@ def main(): ...@@ -1386,6 +1475,7 @@ def main():
"insert": SecurityMode.INSERT, "insert": SecurityMode.INSERT,
"update": SecurityMode.UPDATE}[args.security_mode], "update": SecurityMode.UPDATE}[args.security_mode],
unique_names=args.unique_names, unique_names=args.unique_names,
restricted_path=restricted_path
)) ))
......
...@@ -62,6 +62,8 @@ class Identifiable(): ...@@ -62,6 +62,8 @@ class Identifiable():
self.path = path self.path = path
self.record_type = record_type self.record_type = record_type
self.name = name self.name = name
if name == "":
self.name = None
self.properties: dict = {} self.properties: dict = {}
if properties is not None: if properties is not None:
self.properties = properties self.properties = properties
......
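A sketch of the effect of this normalization (the record type and property are
made up for illustration):

    from caoscrawler.identifiable import Identifiable

    ident = Identifiable(name="", record_type="Experiment",
                         properties={"date": "2023-01-30"})
    ident.name   # -> None, so the impossible "WITH name=''" condition cannot
                 #    end up in the identifiable query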
...@@ -27,6 +27,7 @@ from __future__ import annotations ...@@ -27,6 +27,7 @@ from __future__ import annotations
import yaml import yaml
from datetime import datetime from datetime import datetime
from typing import Any
from .identifiable import Identifiable from .identifiable import Identifiable
import caosdb as db import caosdb as db
import logging import logging
...@@ -35,14 +36,14 @@ from .utils import has_parent ...@@ -35,14 +36,14 @@ from .utils import has_parent
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
def convert_value(value): def convert_value(value: Any):
""" Returns a string representation of the value that is suitable """ Returns a string representation of the value that is suitable
to be used in the query to be used in the query
looking for the identified record. looking for the identified record.
Parameters Parameters
---------- ----------
value : The property of which the value shall be returned. value : Any type, the value that shall be returned and potentially converted.
Returns Returns
------- -------
...@@ -54,11 +55,13 @@ def convert_value(value): ...@@ -54,11 +55,13 @@ def convert_value(value):
return str(value.id) return str(value.id)
elif isinstance(value, datetime): elif isinstance(value, datetime):
return value.isoformat() return value.isoformat()
elif type(value) == str: elif isinstance(value, bool):
return str(value).upper()
elif isinstance(value, str):
# replace single quotes, otherwise they may break the queries # replace single quotes, otherwise they may break the queries
return value.replace("\'", "\\'") return value.replace("\'", "\\'")
else: else:
return f"{value}" return str(value)
class IdentifiableAdapter(metaclass=ABCMeta): class IdentifiableAdapter(metaclass=ABCMeta):
...@@ -97,7 +100,7 @@ class IdentifiableAdapter(metaclass=ABCMeta): ...@@ -97,7 +100,7 @@ class IdentifiableAdapter(metaclass=ABCMeta):
whether the required record already exists. whether the required record already exists.
""" """
query_string = "FIND Record " query_string = "FIND RECORD "
if ident.record_type is not None: if ident.record_type is not None:
query_string += ident.record_type query_string += ident.record_type
for ref in ident.backrefs: for ref in ident.backrefs:
......
...@@ -135,6 +135,7 @@ def macro_constructor(loader, node): ...@@ -135,6 +135,7 @@ def macro_constructor(loader, node):
raise RuntimeError("params type not supported") raise RuntimeError("params type not supported")
else: else:
raise RuntimeError("params type must not be None") raise RuntimeError("params type must not be None")
params = substitute_dict(params, params)
definition = substitute_dict(macro.definition, params) definition = substitute_dict(macro.definition, params)
res.update(definition) res.update(definition)
else: else:
...@@ -146,6 +147,7 @@ def macro_constructor(loader, node): ...@@ -146,6 +147,7 @@ def macro_constructor(loader, node):
params.update(params_setter) params.update(params_setter)
else: else:
raise RuntimeError("params type not supported") raise RuntimeError("params type not supported")
params = substitute_dict(params, params)
definition = substitute_dict(macro.definition, params) definition = substitute_dict(macro.definition, params)
res.update(definition) res.update(definition)
else: else:
......
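A sketch of what the added self-substitution of params enables, assuming the
usual $-placeholder syntax of the macro system (the values are made up):

    params = {"name": "dataset1", "path": "data/$name"}
    params = substitute_dict(params, params)
    # -> {"name": "dataset1", "path": "data/dataset1"}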
...@@ -56,6 +56,10 @@ class FileSystemStructureElement(StructureElement): ...@@ -56,6 +56,10 @@ class FileSystemStructureElement(StructureElement):
return "{}: {}, {}".format(class_name_short, self.name, self.path) return "{}: {}, {}".format(class_name_short, self.name, self.path)
class NoneElement(StructureElement):
pass
class Directory(FileSystemStructureElement): class Directory(FileSystemStructureElement):
pass pass
......
#
# This file is a part of the CaosDB Project.
#
# Copyright (C) 2022 Indiscale GmbH <info@indiscale.com>
# Copyright (C) 2022 Florian Spreckelsen <f.spreckelsen@indiscale.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
try:
from importlib import metadata as importlib_metadata
except ImportError:  # Python<3.8 doesn't support this, so use
import importlib_metadata
from packaging.version import parse as parse_version
from warnings import warn
# Read in version of locally installed caoscrawler package
version = importlib_metadata.version("caoscrawler")
class CfoodRequiredVersionError(RuntimeError):
"""The installed crawler version is older than the version specified in the
cfood's metadata.
"""
def check_cfood_version(metadata: dict):
if not metadata or "crawler-version" not in metadata:
msg = """
No crawler version specified in cfood definition, so there is no guarantee that
the cfood definition matches the installed crawler version.
Specifying a version is highly recommended to ensure that the definition works
as expected with the installed version of the crawler.
"""
warn(msg, UserWarning)
return
installed_version = parse_version(version)
cfood_version = parse_version(metadata["crawler-version"])
if cfood_version > installed_version:
msg = f"""
Your cfood definition requires a newer version of the CaosDB crawler. Please
update the crawler to the required version.
Crawler version specified in cfood: {cfood_version}
Crawler version installed on your system: {installed_version}
"""
raise CfoodRequiredVersionError(msg)
elif cfood_version < installed_version:
# only warn if major or minor of installed version are newer than
# specified in cfood
if (cfood_version.major < installed_version.major) or (cfood_version.minor < installed_version.minor):
msg = f"""
The cfood was written for a previous crawler version. Running the crawler in a
newer version than specified in the cfood definition may lead to unwanted or
unexpected behavior. Please visit the CHANGELOG
(https://gitlab.com/caosdb/caosdb-crawler/-/blob/main/CHANGELOG.md) and check
for any relevant changes.
Crawler version specified in cfood: {cfood_version}
Crawler version installed on your system: {installed_version}
"""
warn(msg, UserWarning)
return
# At this point, the version is either equal or the installed crawler
# version is newer just by an increase in the patch version, so still
# compatible. We can safely proceed.
return
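Illustrative behavior of check_cfood_version, assuming the locally installed
crawler version is 0.3.1:

    check_cfood_version({})                              # warns: no crawler-version given
    check_cfood_version({"crawler-version": "0.4.0"})    # raises CfoodRequiredVersionError
    check_cfood_version({"crawler-version": "0.2.0"})    # warns: cfood written for an older crawler
    check_cfood_version({"crawler-version": "0.3.0"})    # ok: only the patch level differs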
...@@ -16,6 +16,9 @@ document together with the metadata and :doc:`macro<macros>` definitions (see :r ...@@ -16,6 +16,9 @@ document together with the metadata and :doc:`macro<macros>` definitions (see :r
If metadata and macro definitions are provided, there **must** be a second document preceeding the If metadata and macro definitions are provided, there **must** be a second document preceeding the
converter tree specification, including these definitions. converter tree specification, including these definitions.
It is highly recommended to specify, in the metadata section, the version of the
CaosDB crawler for which the cfood is written; see :ref:`below<example_3>`.
Examples Examples
++++++++ ++++++++
...@@ -69,6 +72,7 @@ two custom converters in the second document (**not recommended**, see the recom ...@@ -69,6 +72,7 @@ two custom converters in the second document (**not recommended**, see the recom
metadata: metadata:
name: Datascience CFood name: Datascience CFood
description: CFood for data from the local data science work group description: CFood for data from the local data science work group
crawler-version: 0.2.1
macros: macros:
- !defmacro - !defmacro
name: SimulationDatasetFile name: SimulationDatasetFile
...@@ -108,6 +112,7 @@ The **recommended way** of defining metadata, custom converters, macros and the ...@@ -108,6 +112,7 @@ The **recommended way** of defining metadata, custom converters, macros and the
metadata: metadata:
name: Datascience CFood name: Datascience CFood
description: CFood for data from the local data science work group description: CFood for data from the local data science work group
crawler-version: 0.2.1
macros: macros:
- !defmacro - !defmacro
name: SimulationDatasetFile name: SimulationDatasetFile
......
...@@ -33,10 +33,10 @@ copyright = '2021, MPIDS' ...@@ -33,10 +33,10 @@ copyright = '2021, MPIDS'
author = 'Alexander Schlemmer' author = 'Alexander Schlemmer'
# The short X.Y version # The short X.Y version
version = '0.2.1' version = '0.3.1'
# The full version, including alpha/beta/rc tags # The full version, including alpha/beta/rc tags
# release = '0.5.2-rc2' # release = '0.5.2-rc2'
release = '0.2.1-dev' release = '0.3.1-dev'
# -- General configuration --------------------------------------------------- # -- General configuration ---------------------------------------------------
......
...@@ -77,7 +77,7 @@ Reads a YAML header from Markdown files (if such a header exists) and creates ...@@ -77,7 +77,7 @@ Reads a YAML header from Markdown files (if such a header exists) and creates
children elements according to the structure of the header. children elements according to the structure of the header.
DictElement Converter DictElement Converter
============== =====================
Creates a child StructureElement for each key in the dictionary. Creates a child StructureElement for each key in the dictionary.
Typical Subtree converters Typical Subtree converters
...@@ -483,3 +483,22 @@ Let's formulate that using `create_records` (again, `dir_name` is constant here) ...@@ -483,3 +483,22 @@ Let's formulate that using `create_records` (again, `dir_name` is constant here)
keys_modified = create_records(values, records, keys_modified = create_records(values, records,
record_def) record_def)
Debugging
=========
You can add the key `debug_match` to the definition of a Converter in order to create debugging
output for the match step. The following snippet illustrates this:
.. code-block:: yaml
DirConverter:
type: Directory
match: (?P<dir_name>.*)
debug_match: True
records:
Project:
identifier: project_name
Whenever this Converter tries to match a StructureElement, it logs what was matched against
what and what the result was.
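The match information is emitted through Python's standard ``logging`` module, so it only
becomes visible if logging output is enabled verbosely enough; a minimal sketch:

.. code-block:: python

   import logging

   # Show all log output, including the per-converter match information
   # produced when debug_match is set.
   logging.basicConfig(level=logging.DEBUG)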