From ae02c3709db675351003f542c6dfd78f00be408f Mon Sep 17 00:00:00 2001
From: Alexander Schlemmer <alexander@mail-schlemmer.de>
Date: Thu, 29 Sep 2022 13:15:34 +0200
Subject: [PATCH] DOC: cleaned outdated documentation

---
 src/doc/cfood.rst      | 13 ++++++
 src/doc/concepts.rst   | 89 +-----------------------------------------
 src/doc/converters.rst | 58 +++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 87 deletions(-)

diff --git a/src/doc/cfood.rst b/src/doc/cfood.rst
index 9b0701f7..677cadc5 100644
--- a/src/doc/cfood.rst
+++ b/src/doc/cfood.rst
@@ -134,3 +134,16 @@ The **recommended way** of defining metadata, custom converters, macros and the
         match: DataAnalysis
         # (...)

+
+List Mode
+---------
+
+Specifying values of properties can make use of two special characters, in order to automatically
+create lists or multi properties instead of single values:
+
+.. code-block:: yaml
+
+    Experiment1:
+        Measurement: +Measurement <- Element in List (list is cleared before run)
+                     *Measurement <- Multi Property (properties are removed before run)
+                     Measurement <- Overwrite
diff --git a/src/doc/concepts.rst b/src/doc/concepts.rst
index cf1a1e09..c0f21cba 100644
--- a/src/doc/concepts.rst
+++ b/src/doc/concepts.rst
@@ -20,97 +20,12 @@ Converters treat StructureElements and thereby create the StructureElement that
 are the children of the treated StructureElement. Converters therefore create
 the above named tree. The definition of a Converter also contains what
 Converters shall be used to treat the generated child-StructureElements. The
-definition is there a tree itself. (Question: Should there be global Converters
-that are always checked when treating a StructureElement? Should Converters be
-associated with generated child-StructureElements? Currently, all children are
-created and checked against all Converters. It could be that one would like to
-check file-StructureElements against one set of Converters and
-directory-StructureElements against another)
+definition is therefore a tree itself.

-Each StructureElement in the tree has a set of data values, i.e a dictionary of
-key value pairs.
-Some of those values are set due to the kind of StructureElement. For example,
-a file could have the file name as such a key value pair: 'filename': <sth>.
-Converters may define additional functions that create further values. For
-example, a regular expresion could be used to get a date from a file name.
+See `:doc:converters<converters>` for details.



-
-A converter is defined via a yml file or part of it. The definition states
-what kind of StructureElement it treats (typically one).
-Also, it defines how children of the current StructureElement are
-created and what Converters shall be used to treat those.
-
-The yaml definition looks like the following:
-
-TODO: outdated, see cfood-schema.yml
-
-.. code-block:: yaml
-
-    converter-name:
-        type: <StructureElement Type>
-        match: ".*"
-        records:
-            Experiment1:
-                parents:
-                - Experiment
-                - Blablabla
-                date: $DATUM
-                (...)
-            Experiment2:
-                parents:
-                - Experiment
-        subtree:
-            (...)
-
-
-    records:
-        Measurement: <- wird automatisch ein value im valueStore
-            run_number: 25
-        Experiment1:
-            Measurement: +Measurement <- Element in List (list is cleared before run)
-                         *Measurement <- Multi Property (properties are removed before run)
-                         Measurement <- Overwrite
-
-UPDATE-Stage prüft ob es z.B. Gleichheit zwischen Listen gibt (die dadurch definiert sein
-kann, dass alle Elemente vorhanden, aber nicht zwingend in der richtigen Reihenfolge sind)
-evtl. brauchen wir das nicht, weil crawler eh schon deterministisch ist.
-
-The converter-name is a description of what it represents (e.g.
-'experiment-folder') and is used as identifier.
-
-The type restricts what kind of StructureElements are treated.
-The match is by default a regular expression, that is matche against the
-name of StructureElements. Discussion: StructureElements might not have a
-name (e.g. a dict) or should a name be created artificially if necessary
-(e.g. "root-dict")? It might make sense to allow keywords like "always" and
-other kinds of checks. For example a dictionary could be checked against a
-json-schema definition.
-
-recordtypes is a list of definitions that define the semantic structure
-(see details below).
-
-valuegenerators allow to provide additional functionality that creates
-data values in addition to the ones given by default via the
-StructureElement. This can be for example a match group of a regular
-expression applied to the filename.
-It should be possible to access the values of parent nodes. For example,
-the name of a parent node could be accessed with $converter-name.name.
-Discussion: This can introduce conflicts, if the key <converver-name>
-already exists. An alternative would be to identify those lookups. E.g.
-$$converter-name.name (2x$).
-
-childrengenerators denotes how StructureElements shall be created that are
-children of the current one.
-
-subtree contains a list of Converter defnitions that look like the one
-described here.
-
-those keywords should be allowed but not required. I.e. if no
-valuegenerators shall be defined, the keyword may be omitted.
-
-

 Relevant sources in:
 src/converters.py
diff --git a/src/doc/converters.rst b/src/doc/converters.rst
index 99c0bbff..15756adc 100644
--- a/src/doc/converters.rst
+++ b/src/doc/converters.rst
@@ -1,6 +1,64 @@
 Converters
 ))))))))))

+Converters treat StructureElements and thereby create the StructureElement that
+are the children of the treated StructureElement. Converters therefore create
+the tree of structure elements. The definition of a Converter also contains what
+Converters shall be used to treat the generated child-StructureElements. The
+definition is therefore a tree itself.
+
+Each StructureElement in the tree has a set of data values, i.e a dictionary of
+key value pairs.
+Some of those values are set due to the kind of StructureElement. For example,
+a file could have the file name as such a key value pair: 'filename': <sth>.
+Converters may define additional functions that create further values. For
+example, a regular expresion could be used to get a date from a file name.
+
+
+
+
+A converter is defined via a yml file or part of it. The definition states
+what kind of StructureElement it treats (typically one).
+Also, it defines how children of the current StructureElement are
+created and what Converters shall be used to treat those.
+
+The yaml definition looks like the following:
+
+TODO: outdated, see cfood-schema.yml
+
+.. code-block:: yaml
+
+    <NodeName>:
+        type: <ConverterName>
+        match: ".*"
+        records:
+            Experiment1:
+                parents:
+                - Experiment
+                - Blablabla
+                date: $DATUM
+                (...)
+            Experiment2:
+                parents:
+                - Experiment
+        subtree:
+            (...)
+
+The **<NodeName>** is a description of what it represents (e.g.
+'experiment-folder') and is used as identifier.
+
+**<type>** selects the converter that is going to be matched against the current structure
+element. If the structure element matches (this is a combination of a typecheck and a detailed
+match, see :py:class:`~caoscrawler.converters.Converter` for details) the converter is used
+to generate records (see :py:meth:`~caoscrawler.converters.Converter.create_records`) and to possibly process a subtree, as defined by the function :func:`caoscrawler.converters.create_children`.
+
+**records** is a dict of definitions that define the semantic structure
+(see details below).
+
+Subtree contains a list of Converter defnitions that look like the one
+described here.
+
+
 Standard Converters
 +++++++++++++++++++

-- 
GitLab