Parse XML With lxml - Extract Element Value

Suppose we have an XML file with the following structure.

Solution 1:

I would be more direct in your XPath: go straight for the elements you want, in this case datafield.

for df in doc.xpath('//datafield'):
        # Iterate over the attributes of datafield
        for attrib_name in df.attrib:
                print('@' + attrib_name + '=' + df.attrib[attrib_name])

        # subfield is a child of datafield; iterating over an element
        # yields its children (getchildren() is deprecated)
        for subfield in df:
                print('subfield=' + subfield.text)
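
One thing to watch with the loop above: `.text` is `None` for an empty element, so concatenating it to a string raises a `TypeError`. Here is a minimal sketch that guards against that. The sample XML is hypothetical (the real file's contents are not shown in the post), and `describe_datafields` is a name made up for illustration. It uses `findall`, which lxml shares with the standard library's ElementTree, so the sketch still runs where lxml is not installed.

```python
try:
    from lxml import etree  # preferred, as in the answer above
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical sample; the real file's contents are not shown in the post.
SAMPLE = b"""<collection>
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">A title</subfield>
      <subfield code="c"/>
    </datafield>
  </record>
</collection>"""


def describe_datafields(root):
    """Return (attributes, subfield texts) for every datafield element."""
    result = []
    for df in root.findall('.//datafield'):
        attrs = dict(df.attrib)
        # .text is None for an empty element, so substitute ''.
        texts = [(sf.text or '') for sf in df.findall('subfield')]
        result.append((attrs, texts))
    return result


rows = describe_datafields(etree.fromstring(SAMPLE))
for attrs, texts in rows:
    print(attrs, texts)
```

Returning plain dicts and lists instead of printing inside the loop makes the result easy to reuse or check.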

Also, lxml appears to let you ignore the namespace, maybe because your example only uses one namespace?
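
On the namespace question: in XPath 1.0, which lxml implements, an unprefixed name only matches elements that are in no namespace. So if `//datafield` finds something, those elements are presumably not in a default namespace. The sketch below shows what happens when they are. The namespace URI is an assumption (the MARC21 slim one, guessed from the element names); use whatever the file actually declares. It uses `findall` so that it also runs on the standard library's ElementTree.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Assumed namespace URI (the MARC21 slim one); replace with the file's own.
NS = 'http://www.loc.gov/MARC21/slim'
NAMESPACED = ('<collection xmlns="%s"><record><datafield tag="245">'
              '<subfield code="a">A title</subfield></datafield></record>'
              '</collection>' % NS).encode('utf-8')

root = etree.fromstring(NAMESPACED)

# An unprefixed name only matches elements that are in no namespace.
plain = root.findall('.//datafield')

# Bind a prefix of your choosing to the URI to match namespaced elements.
prefixed = root.findall('.//m:datafield', namespaces={'m': NS})

# Clark notation ({uri}name) works without any prefix mapping.
clark = root.findall('.//{%s}datafield' % NS)

print(len(plain), len(prefixed), len(clark))
```

With lxml, the same prefix mapping can be passed to `xpath()` through its `namespaces=` keyword, e.g. `root.xpath('//m:datafield', namespaces={'m': NS})`.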

Solution 2:

I would just go with

for df in doc.xpath('//datafield'):
    print(df.attrib)
    # Iterating over an element yields its children
    for sf in df:
        print(sf.text)

Also, you don't need urllib; etree.parse can load XML directly from an HTTP URL:

url = "http://dl.dropbox.com/u/540963/short_test.xml"  # doesn't work with https though
doc = etree.parse(url)

Solution 3:

Try the following working code:

from urllib.request import urlopen  # urllib2.urlopen in Python 2
from lxml import etree

url = "https://dl.dropbox.com/u/540963/short_test.xml"
fp = urlopen(url)
doc = etree.parse(fp)
fp.close()

for record in doc.xpath('//datafield'):
    # Print the tag attribute, then each subfield's text indented by a tab
    print(record.xpath("./@tag")[0])
    for x in record.xpath("./subfield/text()"):
        print("\t" + x)
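
If you want the values in a data structure instead of printed, something along these lines might work. It is only a sketch: the payload is a hypothetical stand-in for the HTTP response (wrapped in `io.BytesIO`, since `etree.parse` accepts any file-like object), and `subfields_by_tag` is a made-up helper name. It sticks to `findall` and `get`, which behave the same in lxml and the standard library's ElementTree.

```python
import io

try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical payload standing in for the HTTP response body.
PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection>
  <record>
    <datafield tag="100"><subfield code="a">An author</subfield></datafield>
    <datafield tag="245">
      <subfield code="a">A title</subfield>
      <subfield code="b">a subtitle</subfield>
    </datafield>
  </record>
</collection>"""


def subfields_by_tag(fileobj):
    """Map each datafield's tag attribute to the list of its subfield texts."""
    doc = etree.parse(fileobj)  # accepts any file-like object
    result = {}
    for record in doc.findall('.//datafield'):
        tag = record.get('tag')  # the attribute that "./@tag" selects
        result.setdefault(tag, []).extend(
            sf.text for sf in record.findall('subfield'))
    return result


by_tag = subfields_by_tag(io.BytesIO(PAYLOAD))
print(by_tag)
```

Using `setdefault` keeps the values of repeated tags together in one list, which matters if the same tag can appear more than once in a record.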
