diff --git a/src/doc/converters.rst b/src/doc/converters.rst
index f59e6d3dff0a1f75dc4e0e5bcbbee0b4ceb7e81d..9f38ffa7240604e4e7af1801b42e74888b00d3a7 100644
--- a/src/doc/converters.rst
+++ b/src/doc/converters.rst
@@ -478,6 +478,49 @@ importantly, this record stores the internal path of this array within the HDF5
 file in a text property, the name of which can be configured with the
 ``internal_path_property_name`` option which defaults to ``internal_hdf5_path``.
 
+XMLFileConverter
+----------------
+
+This converter loads an XML file and creates an XMLElement containing the
+root element of the XML tree. The XMLElement can be matched in the subtree using the XMLTagConverter.
+
+XMLTagConverter
+---------------
+
+The XMLTagConverter is a generic converter for XMLElements with the following main features:
+- It matches a combination of tag name, attributes and text content using the keys:
+  - ``match_tag``: regexp, default empty string
+  - ``match_attrib``: dictionary of key-regexp and value-regexp pairs. Each key matches an attribute name and the corresponding value matches that attribute's value.
+  - ``match_text``: regexp, default empty string
+- It traverses the tree using XPath (via the xpath functions of Python's lxml):
+  - The key ``xpath`` sets the XPath expression and defaults to ``child::*``, which selects just the child elements of the current node.
+    The result of the XPath expression is used to generate structure elements as children.
+    The keys ``tags_as_children``, ``attribs_as_children`` and ``text_as_children`` decide
+    which information from the found nodes is used as children:
+    - ``tags_as_children``: (default ``true``) For each XML tag element found by the XPath expression, generate one XMLTag structure element. Its name is the full path
+      to the tag, obtained with the function ``getelementpath`` from ``lxml``.
+    - ``attribs_as_children``: (default ``false``) For each XML tag element found by the XPath expression, generate one XMLAttributeNode structure element for each of its attributes.
+      The name of the respective attribute node has the form: ``<full path of the tag> @ <name of the attribute>``
+      **Please note:** Currently, there is no converter implemented that can match XMLAttributeNodes.
+    - ``text_as_children``: (default ``false``) For each XML tag element found by the XPath expression, generate one XMLTextNode structure element containing the text content
+      of the tag element. Note that in case of multiple text elements, only the first one is added.
+      The name of the respective text node has the form: ``<full path of the tag> /text()``,
+      where the full path to the tag is obtained with the function ``getelementpath`` from ``lxml``.
+      **Please note:** Currently, there is no converter implemented that can match XMLTextNodes.
+
+Namespaces
+**********
+By default, the namespace map of the current node is used in XPath queries.
+Because XPath cannot handle default namespaces, the default namespace can be remapped
+using the key ``default_namespace``.
+The key ``nsmap`` can be used to define additional nsmap entries.
+
+XMLTextNodeConverter
+--------------------
+
+In the future, this converter can be used to match XMLTextNodes that are generated by the XMLTagConverter.
+
+
 Custom Converters
 +++++++++++++++++