Parse Xml With Lxml - Extract Element Value
Let's suppose we have an XML file with the structure as follows.
Solution 1:
I would be more direct in your XPath: go straight for the elements you want, in this case datafield.
for df in doc.xpath('//datafield'):
    # Iterate over attributes of datafield
    for attrib_name in df.attrib:
        print('@' + attrib_name + '=' + df.attrib[attrib_name])
    # subfield is a child of datafield, and iterate
    subfields = df.getchildren()
    for subfield in subfields:
        print('subfield=' + subfield.text)
Also, lxml appears to let you ignore the namespace, maybe because your example only uses one namespace?
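Whether the namespace can be ignored depends on how the file declares it. The following is a minimal sketch, assuming a hypothetical MARC-style record: the original file is not reproduced here, so the sample XML and the prefix `m` are assumptions. When the elements sit in a default namespace, an unprefixed `//datafield` matches nothing, and a prefix has to be mapped through the `namespaces` argument of `xpath()`.

```python
from lxml import etree

# Hypothetical sample; the structure of the real file is assumed, not known.
XML = b"""<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Title</subfield>
      <subfield code="c">Author</subfield>
    </datafield>
  </record>
</collection>"""

root = etree.fromstring(XML)

# With a default namespace declared, an unprefixed name matches nothing.
plain = root.xpath('//datafield')

# Map an arbitrary prefix (here "m") to the namespace URI.
ns = {'m': 'http://www.loc.gov/MARC21/slim'}
namespaced = root.xpath('//m:datafield', namespaces=ns)

# local-name() compares the element name without its namespace.
by_local_name = root.xpath('//*[local-name()="datafield"]')

print(len(plain), len(namespaced), len(by_local_name))
```

If the plain `//datafield` query already returns elements for your file, the elements are most likely not in a namespace at all, and no mapping is needed.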
Solution 2:
I would just go with
for df in doc.xpath('//datafield'):
    print(df.attrib)
    for sf in df.getchildren():
        print(sf.text)
Also, you don't need urllib; etree.parse can load the XML directly from an HTTP URL:
url = "http://dl.dropbox.com/u/540963/short_test.xml"  # doesn't work with https though
doc = etree.parse(url)
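If the file is only reachable over HTTPS, one option is to download the bytes yourself and hand them to lxml as a file-like object. This is a rough sketch assuming Python 3; `load_xml` is a hypothetical helper, and the in-memory sample stands in for the downloaded file so the parse step can be tried offline.

```python
import io
import urllib.request

from lxml import etree


def load_xml(url):
    """Download url (http or https) and return a parsed lxml tree."""
    with urllib.request.urlopen(url) as fp:
        return etree.parse(io.BytesIO(fp.read()))


# Offline demonstration of the same parse step on in-memory bytes.
# The sample content is an assumption, not the real short_test.xml.
sample = b'<record><datafield tag="245"><subfield>Title</subfield></datafield></record>'
doc = etree.parse(io.BytesIO(sample))
tags = [df.get('tag') for df in doc.xpath('//datafield')]
print(tags)
```

With a real URL, `doc = load_xml(url)` would replace the in-memory parse; the XPath queries stay the same.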
Solution 3:
Try the following working code:
import urllib2  # Python 2 only; on Python 3 use urllib.request.urlopen
from lxml import etree

url = "https://dl.dropbox.com/u/540963/short_test.xml"
fp = urllib2.urlopen(url)
doc = etree.parse(fp)
fp.close()

for record in doc.xpath('//datafield'):
    # Print the tag attribute, then each subfield's text indented below it
    print(record.xpath("./@tag")[0])
    for x in record.xpath("./subfield/text()"):
        print("\t" + x)
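When the values are needed later rather than just printed, the same two XPath steps can collect them into a dictionary. This is a minimal sketch, assuming a hypothetical sample in place of short_test.xml; `subfields_by_tag` is an illustrative helper name, not part of lxml.

```python
from lxml import etree

# Hypothetical sample standing in for short_test.xml.
XML = b"""<record>
  <datafield tag="100"><subfield code="a">Doe, Jane</subfield></datafield>
  <datafield tag="245">
    <subfield code="a">A title</subfield>
    <subfield code="b">a subtitle</subfield>
  </datafield>
</record>"""


def subfields_by_tag(root):
    """Map each datafield's tag attribute to the list of its subfield texts."""
    result = {}
    for record in root.xpath('//datafield'):
        tag = record.get('tag')
        # Relative XPath: only the subfield children of this datafield.
        result.setdefault(tag, []).extend(record.xpath('./subfield/text()'))
    return result


fields = subfields_by_tag(etree.fromstring(XML))
for tag, values in fields.items():
    print(tag, values)
```

Using `setdefault` keeps repeated tags together, which matters if the same datafield tag appears more than once in a record.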