{% img center /img/python-xml.jpg 'xml' %}
I would like to continue with another small python example. As in the previous post,this is more like a "note to self" thing than a educational post (this is not stackoverflow), but anyway, I guess it could be handy for someone(it has been for me...)
This time, I have been testing two xml submodules availables in Python 2.7 XML package:
- xml.dom: the DOM API definition
- xml.etree.ElementTree: the ElementTree API, a simple and lightweight XML processor
Both of them are pretty easy to use, and I haven't found any real difference in time execution (at least for basic filters) between both submodules.
I will use this xml file as an example:
<?xml version="1.0"?>
<data>
<node>
<attribute>Front End</attribute>
<res_ids>100</res_ids>
<nEName>BALDR</nEName>
<ipList>192.168.0.5</ipList>
<ipv6List>fe80::1:f6f1:fe01:12</ipv6List>
<so>Ubuntu</so>
<kernel>3.0.93</kernel>
</node>
<node>
<attribute>Web</attribute>
<res_ids>12</res_ids>
<nEName>THOR</nEName>
<ipList>192.168.0.20</ipList>
<ipv6List>fe80::1:f6f1:fe01:12</ipv6List>
<so>Ubuntu</so>
<kernel>3.0.93</kernel>
</node>
<node>
<attribute>Storage</attribute>
<res_ids>200</res_ids>
<nEName>VALI</nEName>
<ipList>192.168.0.10</ipList>
<ipv6List>fe80::1:f6f1:fe01:12</ipv6List>
<so>Ubuntu</so>
<kernel>3.0.93</kernel>
</node>
<node>
<attribute>DB</attribute>
<res_ids>230</res_ids>
<nEName>LOKI</nEName>
<ipList>192.168.0.110</ipList>
<ipv6List>fe80::1:f6f1:fe01:12</ipv6List>
<so>Fedora</so>
<kernel>3.0.93</kernel>
</node>
<node>
<attribute>Backup</attribute>
<res_ids>300</res_ids>
<nEName>OTHER</nEName>
<ipList>192.168.0.103</ipList>
<ipv6List>fe80::1:f6f1:fe01:12</ipv6List>
<so>Debian</so>
<kernel>3.0.93</kernel>
</node>
</data>
In the tree, we can see that we have one "data" object containing several "nodes", each one with 7 "attributes".
Now let's extract the nodes and all the attributes using xml.dom:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | #!/usr/bin/env python
from xml.dom import minidom
def parse_file(filename):
xmldoc = minidom.parse(filename)
for node in xmldoc.getElementsByTagName('node'):
print str(node.getElementsByTagName("attribute")[0].firstChild.nodeValue),
print str(node.getElementsByTagName("res_ids")[0].firstChild.nodeValue),
print str(node.getElementsByTagName("nEName")[0].firstChild.nodeValue),
print str(node.getElementsByTagName("ipList")[0].firstChild.nodeValue)
print str(node.getElementsByTagName("ipv6List")[0].firstChild.nodeValue)
print str(node.getElementsByTagName("so")[0].firstChild.nodeValue)
print str(node.getElementsByTagName("kernel")[0].firstChild.nodeValue)
parse_file("./fakexml.xml")
|
And now, the same thing using xml.etree.ElementTree:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | #!/usr/bin/env python
from xml.etree import ElementTree as ET
def parse_file(filename):
tree = ET.ElementTree(file=filename)
attrList = [ entry for entry in tree.findall('./node') ]
for e in attrList:
print e.find('attribute').text,
print e.find('res_ids').text,
print e.find('nEName').text,
print e.find('ipList').text,
print e.find('ipv6List').text,
print e.find('so').text,
print e.find('kernel').text
parse_file("./fakexml.xml")
|
The output from any of the two options would be the same one:
$> python print_table_from_xml_dom.py
Front End 100 BALDR 192.168.0.5 fe80::1:f6f1:fe01:12 Ubuntu 3.0.93
Web 12 THOR 192.168.0.20 fe80::1:f6f1:fe01:12 Ubuntu 3.0.93
Storage 200 VALI 192.168.0.10 fe80::1:f6f1:fe01:12 Ubuntu 3.0.93
DB 230 LOKI 192.168.0.110 fe80::1:f6f1:fe01:12 Fedora 3.0.93
Backup 300 OTHER 192.168.0.103 fe80::1:f6f1:fe01:12 Debian 3.0.93
$> python print_table_from_xml_element.py
Front End 100 BALDR 192.168.0.5 fe80::1:f6f1:fe01:12 Ubuntu 3.0.93
Web 12 THOR 192.168.0.20 fe80::1:f6f1:fe01:12 Ubuntu 3.0.93
Storage 200 VALI 192.168.0.10 fe80::1:f6f1:fe01:12 Ubuntu 3.0.93
DB 230 LOKI 192.168.0.110 fe80::1:f6f1:fe01:12 Fedora 3.0.93
Backup 300 OTHER 192.168.0.103 fe80::1:f6f1:fe01:12 Debian 3.0.93
As I mentioned before, for this small exercise, there is no difference in terms of time of execution between both alternatives. I created a "slightly bigger" xml file (100x bigger), and the times were still pretty much the same, so I would say it is not a matter of the size, but the complexity of the filters.
To wrap up, let's put this code together with the one in the previous post, where we created a table from the data in a dictionary.
I use ElementTree instead of xml.dom, but both of them would work. Here's the code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | #!/usr/bin/env python
import sys
from collections import defaultdict
from xml.etree import ElementTree as ET
if sys.argv[1:]:
inputF = sys.argv[1]
else:
print "ERROR. Execution failed: missing file"
sys.exit(1)
total = defaultdict(list)
# Column width
AttrColLen=25
ValueColLen=20
colwidth1="{0:<"+str(AttrColLen)+"}"
colwidth2="{0:<"+str(ValueColLen)+"}"
# Printer function
def print_table(data):
for key, values in sorted(total.items()):
print "|" + AttrColLen*"-" + ((ValueColLen+2)*len(values))*"-" + "-|"
print "| " + colwidth1.format(key) + "|",
for i in xrange(len(values)):
print colwidth2.format(values[i])+"|",
print ""
print "|" + AttrColLen*"-" + ((ValueColLen+2)*len(values))*"-" + "-|"
# Parse the xml, and add the items to the dictionary
def parse_file(filename):
colwidth1="{0:<30}"
colwidth2="{0:<25}"
tree = ET.ElementTree(file=filename)
attrList = [ entry for entry in tree.findall('./node') ]
for e in attrList:
total["Attribute"].append(e.find('attribute').text)
total["RES_ID"].append(e.find('res_ids').text)
total["networkElementName"].append(e.find('nEName').text)
total["IPv4"].append(e.find('ipList').text)
total["IPv6"].append(e.find('ipv6List').text)
total["SO"].append(e.find('so').text)
total["Kernel"].append(e.find('kernel').text)
parse_file(inputF)
print_table(total)
|
And here is the result:
$> python print_table_from_dict.py fakexml.xml
|----------------------------------------------------------------------------------------------------------------------------------------|
| Attribute | Front End | Web | Storage | DB | Backup |
|----------------------------------------------------------------------------------------------------------------------------------------|
| IPv4 | 192.168.0.5 | 192.168.0.20 | 192.168.0.10 | 192.168.0.110 | 192.168.0.103 |
|----------------------------------------------------------------------------------------------------------------------------------------|
| IPv6 | fe80::1:f6f1:fe01:12| fe80::1:f6f1:fe01:12| fe80::1:f6f1:fe01:12| fe80::1:f6f1:fe01:12| fe80::1:f6f1:fe01:12|
|----------------------------------------------------------------------------------------------------------------------------------------|
| Kernel | 3.0.93 | 3.0.93 | 3.0.93 | 3.0.93 | 3.0.93 |
|----------------------------------------------------------------------------------------------------------------------------------------|
| RES_ID | 100 | 12 | 200 | 230 | 300 |
|----------------------------------------------------------------------------------------------------------------------------------------|
| SO | Ubuntu | Ubuntu | Ubuntu | Fedora | Debian |
|----------------------------------------------------------------------------------------------------------------------------------------|
| networkElementName | BALDR | THOR | VALI | LOKI | OTHER |
|----------------------------------------------------------------------------------------------------------------------------------------|
;)