XML Converter

Merged Alexander Schlemmer requested to merge f-xml-converter into dev
@@ -478,6 +478,49 @@ importantly, this record stores the internal path of this array within the HDF5
file in a text property, the name of which can be configured with the
``internal_path_property_name`` option which defaults to ``internal_hdf5_path``.
XMLFileConverter
----------------
This converter loads an XML file and creates an XMLElement containing the
root element of the XML tree. That root element can then be matched in the subtree using the XMLTagConverter.
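
The loading step described above can be sketched with plain `lxml` calls. This is only an illustration under assumptions: the sample document is made up, and how the converter wraps the root element into an XMLElement is not shown in this section.

```python
from io import BytesIO

from lxml import etree

# Hypothetical XML content; a real converter would read it from a file path.
xml_bytes = b"<experiment><sample id='s1'>steel</sample></experiment>"

# Parse the document and take the root element of the resulting tree.
tree = etree.parse(BytesIO(xml_bytes))
root = tree.getroot()

print(root.tag)
print([child.tag for child in root])
```

The root element obtained this way is what subsequent tag matching would operate on.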
XMLTagConverter
---------------
The XMLTagConverter is a generic converter for XMLElements with the following main features:

- It can match a combination of tag name, attribute names and text contents using the keys:

  - `match_tag`: regexp, default empty string
  - `match_attrib`: dictionary of pairs of a key regexp and a value regexp. Each key regexp matches an attribute name and the corresponding value regexp matches the value of that attribute.
  - `match_text`: regexp, default empty string

- It can traverse the tree using XPath (using the xpath functions of Python's lxml):

  - The key `xpath` sets the xpath expression and defaults to `child::*`. The default yields just the list of sub nodes of the current node.

  The result of the xpath expression is used to generate structure elements as children.
  The keys `tags_as_children`, `attribs_as_children` and `text_as_children` decide
  which information from the found nodes is used as children:

  - `tags_as_children`: (default `true`) For each XML tag element found by the xpath expression, generate one XMLTag structure element. Its name is the full path
    to the tag, obtained with the function `getelementpath` from `lxml`.
  - `attribs_as_children`: (default `false`) For each XML tag element found by the xpath expression, generate one XMLAttributeNode structure element for each of its attributes.
    The name of the respective attribute node has the form: `<full path of the tag> @ <name of the attribute>`

    **Please note:** Currently, there is no converter implemented that can match XMLAttributeNodes.
  - `text_as_children`: (default `false`) For each XML tag element found by the xpath expression, generate one XMLTextNode structure element containing the text content
    of the tag element. Note that in case of multiple text elements, only the first one is added.
    The name of the respective text node has the form: `<full path of the tag> /text()`

    **Please note:** Currently, there is no converter implemented that can match XMLTextNodes.
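
The matching keys and the naming of generated children can be illustrated with a short sketch. It is not the converter's implementation: the helper names `matches` and `child_names` are hypothetical, the use of `re.match` (so that an empty default pattern accepts everything) is an assumption, and so is the exact spacing around `@` and `/text()` in the generated names.

```python
import re

from lxml import etree

# Hypothetical sample document.
root = etree.fromstring(
    b"<experiment>"
    b"<sample id='s1' unit='mm'>steel<part/>tail</sample>"
    b"<note>ok</note>"
    b"</experiment>"
)
tree = root.getroottree()


def matches(element, match_tag="", match_attrib=None, match_text=""):
    """Assumed semantics: each given regexp must match from the start."""
    if re.match(match_tag, element.tag) is None:
        return False
    for key_re, value_re in (match_attrib or {}).items():
        # At least one attribute must satisfy both the name and value regexp.
        if not any(
            re.match(key_re, name) and re.match(value_re, value)
            for name, value in element.attrib.items()
        ):
            return False
    return re.match(match_text, element.text or "") is not None


def child_names(element, xpath="child::*", tags_as_children=True,
                attribs_as_children=False, text_as_children=False):
    """Collect the names the generated child structure elements would get."""
    names = []
    for node in element.xpath(xpath):
        path = tree.getelementpath(node)
        if tags_as_children:
            names.append(path)
        if attribs_as_children:
            names.extend(f"{path}@{name}" for name in node.attrib)
        # node.text is only the first text chunk; "tail" above is not included.
        if text_as_children and node.text is not None:
            names.append(f"{path}/text()")
    return names


sample = root[0]
print(matches(sample, match_tag="sample", match_attrib={"id": "s[0-9]+"}))
print(child_names(root, attribs_as_children=True, text_as_children=True))
```

With the default `child::*` expression only the direct children of the current node are visited; passing a different `xpath` value to the hypothetical `child_names` helper changes which nodes become children.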
Namespaces
**********
By default, the namespace map of the current node is used in xpath queries.
Because xpath cannot handle a default namespace (one declared without a prefix), the default namespace can be remapped
using the key `default_namespace`.
The key `nsmap` can be used to define additional namespace map entries.
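
The remapping can be sketched as follows. The helper `build_nsmap` is hypothetical, and it assumes that the value of `default_namespace` is the prefix under which the default namespace becomes available in xpath expressions; the namespace URIs are made up.

```python
from lxml import etree

# Hypothetical document with a default namespace and one prefixed namespace.
root = etree.fromstring(
    b'<root xmlns="http://example.org/default" xmlns:m="http://example.org/meta">'
    b"<item/><m:info/></root>"
)


def build_nsmap(element, default_namespace="default", extra=None):
    """Copy the node's namespace map and give the default namespace a prefix."""
    nsmap = dict(element.nsmap)
    if None in nsmap:
        # lxml stores the default namespace under the key None.
        nsmap[default_namespace] = nsmap.pop(None)
    # Additional entries, as the `nsmap` key would provide them.
    nsmap.update(extra or {})
    return nsmap


ns = build_nsmap(root)
items = root.xpath("child::default:item", namespaces=ns)
infos = root.xpath("child::m:info", namespaces=ns)
print(len(items), len(infos))
```

Elements in the default namespace are then addressed with the chosen prefix, while explicitly prefixed namespaces keep working unchanged.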
XMLTextNodeConverter
--------------------
In the future, this converter can be used to match XMLTextNodes that are generated by the XMLTagConverter.
Custom Converters
+++++++++++++++++