Convert Nested XML Content Into CSV Using Xml Tree In Python
Solution 1:
I had a problem understanding what this code is supposed to do because it uses abstract variable names like item, child, and subchild, which makes it hard to reason about the code. I'm not as clever as that, so I renamed the variables to row, tag, and column to make it easier for me to see what the code is doing. (In my book, even row and column are a bit abstract, but I suppose the opacity of the XML input is hardly your fault.)
You have 2 rows but you want 5 dictionaries, because you have 5 <column> tags and you want each <column>'s data in a separate dictionary. But you want the other tags in the <row> to be repeated along with each <column>'s data.
That means you need to build a dictionary for every <row>, then, for each <column>, add that column's data to the dictionary, then output it before going on to the next column.
This code makes the simplifying assumption that all of your <column>s have the same structure, with exactly one <question> and exactly one <answer> and nothing else. If this assumption does not hold, then a <column> may get reported with stale data it inherited from the previous <column> in the same row. The code will also produce no output at all for any <row> that does not have at least one <column>.
The code has to loop through the tags twice, once for the non-<column>s and once for the <column>s. Otherwise it can't be sure it has seen all the non-<column> tags before it starts outputting the <column>s.
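To make the two-pass point concrete, here is a small sketch using a made-up <row> in which a <column> appears before <timestamp>. The XML string and the names single_pass and two_pass are assumptions for illustration only, not taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical row: a <column> comes before the <timestamp> tag.
row = ET.fromstring(
    "<row>"
    "<respondent>r1</respondent>"
    "<column><question>Q1</question><answer>a1</answer></column>"
    "<timestamp>10-06-16 11:30</timestamp>"
    "</row>"
)

# Single pass: emit a record as soon as a <column> is met.
single_pass = []
data = {}
for tag in row:
    if tag.tag == "column":
        for column in tag:
            data[column.tag] = column.text
        single_pass.append(data.copy())
    else:
        data[tag.tag] = tag.text

# Two passes: collect every non-<column> tag first, then emit the records.
two_pass = []
data = {tag.tag: tag.text for tag in row if tag.tag != "column"}
for tag in row:
    if tag.tag == "column":
        for column in tag:
            data[column.tag] = column.text
        two_pass.append(data.copy())

print("timestamp" in single_pass[0])  # False: the record was emitted too early
print("timestamp" in two_pass[0])     # True
```

If your <column> tags always come last in the <row>, a single pass would happen to work, but the two-pass version does not depend on tag order.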
There are other (no doubt more elegant) ways to do this, but I kept the code structure as close to your original as I could, other than making the variable names less opaque.
data_list = []  # one dictionary per <column>; root is the parsed XML root from the question
for row in root.find('./data'):  # each child of <data> is one <row>
    data = {}  # row-level fields shared by every <column> in this row
    for tag in row:
        if tag.tag != "column":
            data[tag.tag] = tag.text  # add the row-level tag to the dictionary
    # Now the dictionary data is built for the row level
    for tag in row:
        if tag.tag == "column":
            for column in tag:
                data[column.tag] = column.text
            # Now we have added the column level data for one column tag
            data_list.append(data.copy())
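If the stale-data caveat mentioned earlier worries you, one possible variation is to start each record from a fresh copy of the row-level dictionary instead of reusing one dictionary across columns. This is only a sketch: the XML string below is made up (its second <column> deliberately has no <answer>), and row_data and record are names I introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up input: the second <column> has no <answer>.
root = ET.fromstring(
    "<root><data>"
    "<row><respondent>r1</respondent>"
    "<column><question>Q1</question><answer>a1</answer></column>"
    "<column><question>Q2</question></column>"
    "</row>"
    "</data></root>"
)

data_list = []
for row in root.find("./data"):
    # Row-level fields shared by every record from this row.
    row_data = {tag.tag: tag.text for tag in row if tag.tag != "column"}
    for column_tag in row.findall("column"):
        record = dict(row_data)  # fresh copy, so nothing leaks between columns
        for column in column_tag:
            record[column.tag] = column.text
        data_list.append(record)

print(data_list[1])  # the Q2 record has no 'answer' key
```

With the original loop, the Q2 record would have carried over 'a1' from the previous column. Here the key is simply absent, so you can decide later how to fill it in.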
Output is as below. The key order of the dicts isn't preserved because I used pprint.pprint for convenience, and it sorts the keys.
[{'answer': 'a1',
  'product': '1',
  'question': 'Q1',
  'replica': '1',
  'respondent': 'm0wxo5f6w42h3fot34m7s6xij',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a2',
  'product': '1',
  'question': 'Q2',
  'replica': '1',
  'respondent': 'm0wxo5f6w42h3fot34m7s6xij',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a3',
  'product': '1',
  'question': 'Q3',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a4',
  'product': '1',
  'question': 'Q4',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a5',
  'product': '1',
  'question': 'Q5',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'}]
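Since the end goal is CSV, a list of dictionaries like the one above can be handed to csv.DictWriter from the standard library. The sketch below is one way to do it, not necessarily what you had in mind: write_csv is a helper name I made up, and the two sample records are shortened stand-ins for the real output.

```python
import csv
import io

# Made-up records in the same shape as the output above.
data_list = [
    {"respondent": "r1", "session": "1", "question": "Q1", "answer": "a1"},
    {"respondent": "r1", "session": "1", "question": "Q2", "answer": "a2"},
]

def write_csv(records, fileobj):
    """Write a list of dicts as CSV to an open text file object."""
    # Use the union of all keys, in first-seen order, as the header row.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in any key a record is missing.
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)

# Written to an in-memory buffer here so the result is easy to inspect.
buffer = io.StringIO()
write_csv(data_list, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with newline="" (as the csv module documentation recommends), for example open("output.csv", "w", newline=""), where output.csv is just a placeholder name.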