XML: ElementTree (etree
)¶
SAX and DOM¶
SAX
|
DOM: “Document Object Model”
|
ElementTree¶
xml.etree
: Python specific ⟶ absolutelycomfortable
Seamless integration in Python (⟶ iteration)
A document is a tree, and trees are lists of lists
XML attributes represented as dictionaries
⟶ simple!
A Very Simple Document¶
from xml.etree.ElementTree import Element
element = Element("root")
child = Element("child")
element.append(child)
element = Element("root")
SubElement(element, "child")
<root>
<child />
</root>
Attributes¶
XML elements have attributes
Python’s XML elements have the
attrib
dictionary
element = Element("root")
child = SubElement(element, "child")
child.attrib['age'] = '15'
child = SubElement(element, "child")
child.attrib['age'] = '17'
<root>
<child age="15" />
<child age="17" />
</root>
Text (1)¶
In XML documents, free text is permitted …
Inside one element
After one element, but before the start of another element
Accordingly, Python elements have members …
element.text
element.tail
No text ⟶
None
Text (2)¶
element = Element("root")
child = SubElement(element, "child")
child.text = 'Text'
child.tail = 'Tail'
<root><child>Text</child>Tail</root>
Careful with indentation
Whitespace, linefeed etc. is text, no matter what
str.strip()
may be helpful
Writing XML Documents¶
We have created
Element
objectsAdded child elements
Now how do we create XML?
Wrap into
ElementTree
- a helper
from xml.etree.ElementTree import ElementTree
tree = ElementTree(element)
tree.write(sys.stdout) # oder file(..., 'w')
Output is very tight
Text is preserved as-is
Pretty output would be incorrect
Linefeed and indentation is text
Reading XML Documents¶
from xml.etree.ElementTree import parse
tree = parse(sys.stdin)
for child in tree.getroot():
age = child.attrib.get('age')
if age is not None:
print age
if child.text is not None:
print child.text