Get All Same Attribute Values For Xml In Python
Solution 1:
Assuming the unclosed opening tag of the root element in your posted XML is just a copy/paste slip, your main issue is the classic one in XML parsing: the nodes sit under a default namespace. A default namespace is declared by an xmlns attribute that has no colon-separated prefix, that is, xmlns="..." rather than xmlns:doc="...".
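To make the problem concrete, here is a minimal sketch using Python's built-in parser. The SAMPLE string is a hypothetical stand-in for the posted file, with element names assumed from the path used in this answer and made-up points values. It shows that the namespace URI becomes part of every tag name, so an unprefixed path matches nothing.

```python
from xml.etree import ElementTree as ET

# Hypothetical, minimal stand-in for the question's file (not the real 0004.xml)
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <TextRegion id="r1"><Coords points="10,10 20,10 20,20"/></TextRegion>
  </Page>
</PcGts>"""

root = ET.fromstring(SAMPLE)

# The parser stores the namespace URI inside the tag name, as {uri}LocalName
print(root.tag)

# An unprefixed path searches for elements in no namespace, so it finds nothing
unqualified = root.findall('Page/TextRegion/Coords')
print(unqualified)
```

The empty result is not an error, which is why this issue is easy to miss.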
As a result, you need to define a temporary namespace prefix in Python and use it on every named element in the path. You can do this with a dictionary passed into findall (not find_all).
from lxml import etree as ET

tree = ET.parse('0004.xml')
nsmp = {'doc': 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15'}
root = tree.getroot()
print(root.tag)

# SPECIFY NAMESPACE AND PREFIX ALL NAMED ELEMENTS
for tag in root.findall('doc:Page/doc:TextRegion/doc:Coords', namespaces=nsmp):
    value = tag.get('points')
    print(value)

# 1653,146 1651,148
# 2071,326 2069,328 2058,328 2055
# 2247,2825 2247,2857 2266,2857 2268,2860 2268
# 731,2828 731,2839 728,2841
By the way, lxml is a feature-rich XML library (it requires a third-party installation) that, among other powerful tools, supports full XPath 1.0. The above code still works with Python's built-in etree simply by changing the import line to from xml.etree import ElementTree as ET.
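As a rough, self-contained sketch of that built-in variant, the snippet below parses an inline string instead of 0004.xml. The SAMPLE document and its points values are hypothetical placeholders, not the poster's data.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample; the real file would be loaded with ET.parse('0004.xml')
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <TextRegion id="r1"><Coords points="10,10 20,10 20,20"/></TextRegion>
    <TextRegion id="r2"><Coords points="30,30 40,30 40,40"/></TextRegion>
  </Page>
</PcGts>"""

# The 'doc' prefix is arbitrary; only the URI has to match the document
nsmp = {'doc': 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15'}

root = ET.fromstring(SAMPLE)

# Every step of the path carries the prefix, since every element is namespaced
points = [
    coords.get('points')
    for coords in root.findall('doc:Page/doc:TextRegion/doc:Coords', namespaces=nsmp)
]
print(points)
```

The prefix name lives only in the Python dictionary, so it does not need to appear anywhere in the XML itself.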
However, lxml extends this library, for example by letting you select attribute values directly with xpath:
tree = ET.parse('0004.xml')

# SPECIFY NAMESPACE AND PREFIX ALL NAMED ELEMENTS
for pts in tree.xpath('//doc:Coords/@points', namespaces=nsmp):
    print(pts)

# 1653,146 1651,148
# 2071,326 2069,328 2058,328 2055
# 2247,2825 2247,2857 2266,2857 2268,2860 2268
# 731,2828 731,2839 728,2841
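For comparison, the built-in ElementTree has no xpath method and its limited path syntax cannot return attribute values directly. A possible approximation, sketched here on a hypothetical sample (the nested TextLine element and all points values are assumptions for illustration), is to search descendants with .// and read the attribute in Python.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample with Coords at two different depths
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <TextRegion id="r1">
      <Coords points="10,10 20,10 20,20"/>
      <TextLine id="l1"><Coords points="11,11 19,11 19,19"/></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

nsmp = {'doc': 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15'}
root = ET.fromstring(SAMPLE)

# Fixed path: only Coords that are direct children of a TextRegion
direct = [c.get('points')
          for c in root.findall('doc:Page/doc:TextRegion/doc:Coords', nsmp)]

# Descendant search: Coords at any depth that carry a points attribute
any_depth = [c.get('points')
             for c in root.findall('.//doc:Coords[@points]', nsmp)]

print(direct)
print(any_depth)
```

Like //doc:Coords in the lxml version, the descendant search also picks up Coords nested deeper than TextRegion, so the two approaches can return different results on the same file.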