Design of the XML converter
Suggestion:
Consider the following example XML:
<a href="test1" alt="no link">
test <img src="test2"/>
</a>
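For orientation, the example contains three kinds of content that a converter could match on: attributes, text, and child tags. The sketch below only shows how Python's standard `xml.etree.ElementTree` exposes them. The variable names are illustrative and not part of the proposal.

```python
import xml.etree.ElementTree as ET

# The example XML from above, as a string.
xml_source = '<a href="test1" alt="no link">\ntest <img src="test2"/>\n</a>'

root = ET.fromstring(xml_source)

# Attributes of the <a> tag as a plain dictionary.
attributes = dict(root.attrib)

# Text directly inside <a>, before the first child tag (whitespace stripped).
text = (root.text or "").strip()

# Tag names of the direct children.
child_tags = [child.tag for child in root]

print(attributes, repr(text), child_tags)
```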
Matching and processing of the contained tags, texts, and attributes should be flexible. I therefore propose the following design for the converter parameters:
xmlfile:
  type: XMLFileConverter
  match: ^.*\.xml$
  subtree:
    anchor:
      type: XMLTagConverter
      match_tag: a
      match_attrib:  # default is the empty dictionary
        "(?P<ref>(href|url))": "test(?P<number>[0-9])"  # either the "href" or the "url" attribute must be set
        alt: (.+)  # this attribute must be present and contain at least one character
      match_text: .*  # allow any text, also empty (this is the default)
      # _*_ marks the default:
      attribs_as_children: true  # true / _false_
      text_as_children: true  # true / _false_
      tags_as_children: true  # _true_ / false
      subtree:
        text:  # this would be created by the text_as_children-flag
          type: TextElementConverter  # Unclear, is this correct? What would be name/value here?
          match: test
        alt:  # this would be created by the attribs_as_children-flag
          type: TextElementConverter
          match_name: alt
          match_value: ^(?P<text>.*)$
        img:
          type: XMLTagConverter
          # (...)
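To make the intended matching semantics concrete, here is a minimal sketch of how the `match_tag`, `match_attrib`, and `match_text` rules of an XMLTagConverter could be evaluated. It is an assumption about the semantics, not an existing implementation: the helper `match_tag_element` is hypothetical, patterns are assumed to be matched against the full tag name, attribute name, and attribute value, and the text is matched with `re.DOTALL` so that line breaks are allowed. The value pattern is written as `test(?P<number>[0-9])` so that it matches `href="test1"` from the example XML.

```python
import re
import xml.etree.ElementTree as ET


def match_tag_element(element, match_tag, match_attrib=None, match_text=".*"):
    """Return the captured named groups if the element matches, else None.

    Hypothetical helper illustrating the proposed parameters.
    """
    captured = {}

    # The tag name must match match_tag completely.
    tag_match = re.fullmatch(match_tag, element.tag)
    if tag_match is None:
        return None
    captured.update(tag_match.groupdict())

    # Every rule in match_attrib must be satisfied by at least one attribute.
    for name_pattern, value_pattern in (match_attrib or {}).items():
        for name, value in element.attrib.items():
            name_match = re.fullmatch(name_pattern, name)
            value_match = re.fullmatch(value_pattern, value)
            if name_match and value_match:
                captured.update(name_match.groupdict())
                captured.update(value_match.groupdict())
                break
        else:
            return None  # no attribute satisfied this rule

    # The text before the first child tag must match match_text.
    text_match = re.match(match_text, element.text or "", re.DOTALL)
    if text_match is None:
        return None
    captured.update(text_match.groupdict())
    return captured


element = ET.fromstring('<a href="test1" alt="no link">test <img src="test2"/></a>')
rules = {"(?P<ref>(href|url))": "test(?P<number>[0-9])", "alt": "(.+)"}
result = match_tag_element(element, "a", rules)
print(result)  # {'ref': 'href', 'number': '1'}
```

With these assumptions, the named groups `ref` and `number` become available as variables for the converters in the subtree, and an `<a>` tag without an `alt` attribute is rejected.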
Edited by Alexander Schlemmer