Design of the XML converter

Suggestion:

Consider the following example XML:

<a href="test1" alt="no link">
  test <img src="test2"/>
</a>

Matching and processing of the contained tags, texts and attributes should be flexible. I therefore propose the following design for the converter parameters:

xmlfile:
  type: XMLFileConverter
  match: ^.*\.xml$
  subtree:
    anchor:
      type: XMLTagConverter
      match_tag: a
      match_attrib:  # default is the empty dictionary
        "(?P<ref>(href|url))": "test(?P<number>[0-9])"  # either the "href" or the "url" attribute must be set; the value pattern matches href="test1" from the example
        alt: (.+)  # this attribute must be present and contain at least one character
      match_text: .*  # allow any text, also empty (this is the default)

      # _*_ marks the default:
      attribs_as_children: true  # true / _false_
      text_as_children: true  # true / _false_
      tags_as_children: true  # _true_ / false

      subtree:
        text:  # this would be created by the text_as_children-flag
          type: TextElementConverter  # Unclear, is this correct? What would be name/value here?
          match: test
        alt:  # this would be created by the attribs_as_children-flag
          type: TextElementConverter
          match_name: alt
          match_value: ^(?P<text>.*)$
        img:
          type: XMLTagConverter
          # (...)
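
To make the intended semantics easier to discuss, here is a minimal Python sketch of how an XMLTagConverter could evaluate match_tag, match_attrib and match_text, and how the three *_as_children flags could produce children. This is not an existing implementation: the function names match_element and expand_children, the tuple representation of children, and the choice of full-string regex matching are all assumptions made for illustration. Only the standard library (re, xml.etree.ElementTree) is used.

```python
import re
import xml.etree.ElementTree as ET

# The example XML from this proposal.
XML = '<a href="test1" alt="no link">\n  test <img src="test2"/>\n</a>'


def match_element(elem, match_tag, match_attrib=None, match_text=".*"):
    """Hypothetical helper: return the named groups captured while matching
    `elem`, or None if any pattern fails. Full-string matching is an assumption."""
    captures = {}
    tag_match = re.fullmatch(match_tag, elem.tag)
    if tag_match is None:
        return None
    captures.update(tag_match.groupdict())

    # Every (name pattern, value pattern) pair must be satisfied by
    # at least one attribute of the element.
    for name_pattern, value_pattern in (match_attrib or {}).items():
        for name, value in elem.attrib.items():
            name_match = re.fullmatch(name_pattern, name)
            value_match = re.fullmatch(value_pattern, value)
            if name_match and value_match:
                captures.update(name_match.groupdict())
                captures.update(value_match.groupdict())
                break
        else:
            return None  # no attribute satisfied this pair

    # DOTALL lets ".*" span the newlines inside the element text.
    text_match = re.fullmatch(match_text, elem.text or "", flags=re.DOTALL)
    if text_match is None:
        return None
    captures.update(text_match.groupdict())
    return captures


def expand_children(elem, attribs_as_children=False,
                    text_as_children=False, tags_as_children=True):
    """Hypothetical helper: list children as (kind, name, value) tuples.
    The defaults mirror the defaults marked in the proposal."""
    children = []
    if attribs_as_children:
        children.extend(("attrib", name, value)
                        for name, value in elem.attrib.items())
    # Only the leading text of the element is used; tail text that
    # follows child tags is ignored in this sketch.
    if text_as_children and elem.text and elem.text.strip():
        children.append(("text", None, elem.text.strip()))
    if tags_as_children:
        children.extend(("tag", child.tag, child) for child in elem)
    return children


root = ET.fromstring(XML)
print(match_element(
    root,
    match_tag="a",
    match_attrib={
        "(?P<ref>(href|url))": "test(?P<number>[0-9])",
        "alt": "(.+)",
    },
))
print([(kind, name) for kind, name, _ in
       expand_children(root, attribs_as_children=True, text_as_children=True)])
```

In this sketch an attribute child carries both a name and a value, while a text child has no name. That is one possible answer to the open question about TextElementConverter above, not a decision.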
Edited by Alexander Schlemmer