diff --git a/src/doc/converters.rst b/src/doc/converters.rst
index 9f38ffa7240604e4e7af1801b42e74888b00d3a7..84acf4fe7f5184029705fca39f358f1f09f58c08 100644
--- a/src/doc/converters.rst
+++ b/src/doc/converters.rst
@@ -378,6 +378,73 @@ special treatment to all Records of a specific type.
 ``referenced_record_callback`` is applied **after** the properties from the
 dictionary have been applied as explained above.
 
+XML Converters
+==============
+
+There are the following converters for XML content:
+
+
+XMLFileConverter
+----------------
+
+This is a converter that loads an XML file and creates an XMLElement containing the
+root element of the XML tree. It can be matched in the subtree using the XMLTagConverter.
+
+XMLTagConverter
+---------------
+
+The XMLTagConverter is a generic converter for XMLElements with the following main features:
+
+- It allows matching a combination of tag name, attribute names and text contents using the keys:
+
+  - ``match_tag``: regexp, default empty string
+  - ``match_attrib``: dictionary of key-regexp and value-regexp
+    pairs. Each key matches an attribute name and the corresponding
+    value matches its attribute value.
+  - ``match_text``: regexp, default empty string
+- It allows traversing the tree using XPath (using Python lxml's xpath functions):
+
+  - The key ``xpath`` is used to set the xpath expression and has a
+    default of ``child::*``. This default yields just the list of
+    sub nodes of the current node. The result of the xpath expression
+    is used to generate structure elements as children. It furthermore
+    uses the keys ``tags_as_children``, ``attribs_as_children`` and
+    ``text_as_children`` to decide which information from the found
+    nodes will be used as children:
+    - ``tags_as_children``: (default ``true``) For each xml tag element
+      found by the xpath expression, generate one XMLTag structure
+      element. Its name is the full path to the tag using the function
+      ``getelementpath`` from ``lxml``.
+    - ``attribs_as_children``: (default ``false``) For each xml tag element
+      found by the xpath expression, generate one XMLAttributeNode
+      structure element for each of its attributes. The name of the
+      respective attribute node has the form: ``<full path of the tag> @
+      <name of the attribute>``. **Please note:** Currently, there is no
+      converter implemented that can match XMLAttributeNodes.
+    - ``text_as_children``: (default ``false``) For each xml tag element
+      found by the xpath expression, generate one XMLTextNode structure
+      element containing the text content of the tag element. Note that
+      in case of multiple text elements, only the first one is
+      added. The name of the respective text node has the form:
+      ``<full path of the tag> /text()``, where the full path is obtained
+      with ``getelementpath`` from ``lxml``. **Please note:** Currently, there is
+      no converter implemented that can match XMLTextNodes.
+
+Namespaces
+**********
+
+The default is to take the namespace map from the current node and use
+it in xpath queries. Because default namespaces cannot be handled by
+xpath, it is possible to remap the default namespace using the key
+``default_namespace``. The key ``nsmap`` can be used to define
+additional nsmap entries.
+
+XMLTextNodeConverter
+--------------------
+
+In the future, this converter can be used to match XMLTextNodes that
+are generated by the XMLTagConverter.
+
 Further converters
 ++++++++++++++++++
 
@@ -478,49 +545,6 @@ importantly, this record stores the internal path of this array within the HDF5
 file in a text property, the name of which can be configured with the
 ``internal_path_property_name`` option which defaults to ``internal_hdf5_path``.
 
-XMLFileConverter
-----------------
-
-This is a converter that loads an XML file and creates an XMLElement containing the
-root element of the XML tree. It can be matched in the subtree using the XMLTagConverter.
-
-XMLTagConverter
----------------
-
-The XMLTagConverter is a generic converter for XMLElements with the following main features:
-- It allows to match a combination of tag name, attribute names and text contents using the keys:
-  - `match_tag`: regexp, default empty string
-  - `match_attrib`: dictionary of key-regexps and value-regexp pairs. Each key matches an attribute name and the corresponding value matches its attribute value.
-  - `match_text`: regexp, default empty string
-- It allows to traverse the tree using XPath (using Python lxml's xpath functions):
-  - The key `xpath` is used to set the xpath expression and has a default of `child::*`. Its default would generate just the list of sub nodes of the current node.
-    The result of the xpath expression is used to generate structure elements as children.
-    It furthermore uses the keys `tags_as_children`, `attribs_as_children` and `text_as_children` to decide
-    which information from the found nodes will be used as children:
-    - `tags_as_children`: (default `true`) For each xml tag element found by the xpath expression, generate one XMLTag structure element. Its name is the full path
-      to the tag using the function `getelementpath` from `lxml`.
-    - `attribs_as_children`: (default `false`) For each xml tag element found by the xpath expression, generate one XMLAttributeNode structure element for each of its attributes.
-      The name of the respective attribute node has the form: `<full path of the tag> @ <name of the attribute>`
-      **Please note:** Currently, there is no converter implemented that can match XMLAttributeNodes.
-    - `text_as_children`: (default `false`) For each xml tag element found by the xpath expression, generate one XMLTextNode structure element containing the text content
-      of the tag element. Note that in case of multiple text elements, only the first one is added.
-      The name of the respective attribute node has the form: `<full path of the tag> /text()`
-      to the tag using the function `getelementpath` from `lxml`.
-      **Please note:** Currently, there is no converter implemented that can match XMLAttributeNodes.
-
-Namespaces
-**********
-The default is to take the namespace map from the current node and use it in xpath queries.
-Because default namespaces cannot be handled by xpath, it is possible to remap the default namespace
-using the key `default_namespace`.
-The key `nsmap` can be used to define additional nsmap entries.
-
-XMLTextNodeConverter
---------------------
-
-In the future, this converter can be used to match XMLTextNodes that are generated by the XMLTagConverter.
-
-
 Custom Converters
 +++++++++++++++++
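
The lxml behaviour that the XMLTagConverter and Namespaces documentation in this diff refers to can be illustrated with a short Python sketch. It is not the converter's implementation: the sample XML, the helper ``remap_default_namespace`` and the prefix ``default`` are assumptions made for illustration only, and the node names merely mirror the naming scheme described in the documentation. It uses only ``etree.fromstring``, ``getroottree``, ``getelementpath``, ``xpath`` and ``nsmap`` from ``lxml``.

```python
from lxml import etree

# Hypothetical sample document with a default namespace and one prefixed namespace.
XML = b"""<root xmlns="http://example.com/ns" xmlns:x="http://example.com/x">
  <item id="1">first</item>
  <x:item id="2">second</x:item>
</root>"""

root = etree.fromstring(XML)
tree = root.getroottree()

# The documented default xpath expression: all direct child elements.
children = root.xpath("child::*")

# tags_as_children: one name per found element, taken from getelementpath.
tag_names = [tree.getelementpath(el) for el in children]

# attribs_as_children / text_as_children: names following the documented pattern
# "<full path of the tag> @ <name of the attribute>" and "<full path of the tag> /text()".
attrib_names = [f"{tree.getelementpath(el)} @ {name}" for el in children for name in el.attrib]
text_names = [f"{tree.getelementpath(el)} /text()" for el in children]


def remap_default_namespace(nsmap, prefix):
    """Return a copy of nsmap in which the default namespace (key None)
    is bound to an explicit prefix, so that it can be used in xpath queries.
    Hypothetical helper, not part of lxml or the crawler."""
    remapped = {key: uri for key, uri in nsmap.items() if key is not None}
    if None in nsmap:
        remapped[prefix] = nsmap[None]
    return remapped


ns = remap_default_namespace(root.nsmap, "default")

# An unprefixed name test does not match elements in the default namespace ...
unprefixed = root.xpath("item")
# ... but the remapped prefix does.
items = root.xpath("default:item", namespaces=ns)
prefixed = root.xpath("x:item", namespaces=ns)
```

In this sketch the unprefixed query returns an empty list, while the query with the remapped prefix finds the element in the default namespace. How the converter's ``default_namespace`` and ``nsmap`` keys are applied internally may differ from this helper.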