
Convert Nested XML Content Into CSV Using Xml Tree In Python

I'm very new to Python, so please treat me as such. When I tried to convert the XML content into a list of dictionaries, I got output, but not what I expected, and tried a lot playing

Solution 1:

I had a problem understanding what this code is supposed to do because it uses abstract variable names like item, child, subchild and this makes it hard to reason about the code. I'm not as clever as that, so I renamed the variables to row, tag, and column to make it easier for me to see what the code is doing. (In my book, even row and column are a bit abstract, but I suppose the opacity of the XML input is hardly your fault.)

You have 2 rows but you want 5 dictionaries, because you have 5 <column> tags and you want each <column>'s data in a separate dictionary. But you want the other tags in the <row> to be repeated along with each <column>'s data.

That means you need to build a dictionary for every <row>, then, for each <column>, add that column's data to the dictionary, then output it before going on to the next column.

This code makes the simplifying assumption that all of your <column>s have the same structure, with exactly one <question> and exactly one <answer> and nothing else. If this assumption does not hold, a <column> may be reported with stale data inherited from the previous <column> in the same row. The code will also produce no output at all for any <row> that does not have at least one <column>.
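If that assumption worries you, here is a minimal sketch of one way around it: build each record from a fresh copy of the row-level fields, and emit rows that have no <column> at all. The sample XML, the `flatten` function name, and the `<root>` wrapper are my own assumptions for illustration, since I only have your tag names to go on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <column> has no <answer>,
# and the second <row> has no <column> at all.
SAMPLE = """
<root>
  <data>
    <row>
      <respondent>r1</respondent>
      <column><question>Q1</question><answer>a1</answer></column>
      <column><question>Q2</question></column>
    </row>
    <row>
      <respondent>r2</respondent>
    </row>
  </data>
</root>
"""

def flatten(root):
    records = []
    for row in root.find("./data"):
        # Row-level fields shared by every <column> in this row
        base = {tag.tag: tag.text for tag in row if tag.tag != "column"}
        columns = [tag for tag in row if tag.tag == "column"]
        if not columns:
            # Keep rows without any <column> instead of dropping them
            records.append(base)
            continue
        for column in columns:
            record = dict(base)  # fresh copy, so nothing leaks between columns
            for field in column:
                record[field.tag] = field.text
            records.append(record)
    return records

records = flatten(ET.fromstring(SAMPLE))
for record in records:
    print(record)
```

With this version the Q2 record simply has no 'answer' key rather than inheriting 'a1' from Q1. Whether a missing key or an empty value suits you better depends on what you do with the records afterwards.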

The code has to loop through the tags twice, once for the non-<column>s and once for the <column>s. Otherwise it can't be sure it has seen all the non-<column> tags before it starts outputting the <column>s.
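To see why the second loop matters, here is a small sketch with a made-up <row> in which a row-level tag (<session>) comes after the first <column>. The sample row and the function names `single_pass` and `two_pass` are assumptions of mine, not part of your code.

```python
import xml.etree.ElementTree as ET

# Hypothetical row where <session> comes after the first <column>
row = ET.fromstring(
    "<row>"
    "<respondent>r1</respondent>"
    "<column><question>Q1</question><answer>a1</answer></column>"
    "<session>1</session>"
    "</row>"
)

def single_pass(row):
    data, out = {}, []
    for tag in row:
        if tag.tag == "column":
            for field in tag:
                data[field.tag] = field.text
            out.append(data.copy())   # emitted before <session> has been seen
        else:
            data[tag.tag] = tag.text
    return out

def two_pass(row):
    data, out = {}, []
    for tag in row:                   # pass 1: row-level tags only
        if tag.tag != "column":
            data[tag.tag] = tag.text
    for tag in row:                   # pass 2: one record per <column>
        if tag.tag == "column":
            for field in tag:
                data[field.tag] = field.text
            out.append(data.copy())
    return out

one = single_pass(row)
two = two_pass(row)
print("session" in one[0])  # False
print("session" in two[0])  # True
```

If your row-level tags always come before the <column>s, a single pass would give the same result, but the two-pass version does not depend on that ordering.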

There are other (no doubt more elegant) ways to do this, but I kept the code structure as close to your original as I could, other than making the variable names less opaque.

# root is assumed to be the already-parsed root element of the XML document
data_list = []                        # will hold one dictionary per <column>
for row in root.find('./data'):       # iterate over the children of <data> (the rows)
    data = {}                         # row-level fields shared by every <column>
    for tag in row:
        if tag.tag != "column":
            data[tag.tag] = tag.text  # store the row-level tag's text
    # Now the dictionary data is built for the row level
    for tag in row:
        if tag.tag == "column":
            for column in tag:
                data[column.tag] = column.text
            # Now we have added the column level data for one column tag
            data_list.append(data.copy())

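Output is as below. The keys appear in alphabetical order rather than document order because I used pprint.pprint for convenience, and it sorts dictionary keys by default (on Python 3.8 and later, passing sort_dicts=False keeps insertion order).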

[{'answer': 'a1',
  'product': '1',
  'question': 'Q1',
  'replica': '1',
  'respondent': 'm0wxo5f6w42h3fot34m7s6xij',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a2',
  'product': '1',
  'question': 'Q2',
  'replica': '1',
  'respondent': 'm0wxo5f6w42h3fot34m7s6xij',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a3',
  'product': '1',
  'question': 'Q3',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a4',
  'product': '1',
  'question': 'Q4',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a5',
  'product': '1',
  'question': 'Q5',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'}]
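Since the end goal is CSV, here is a minimal sketch of writing a list of dictionaries like this with csv.DictWriter from the standard library. The two shortened records are placeholders of mine, and the file name out.csv mentioned in the comment is hypothetical. The sketch writes to an in-memory buffer so it runs anywhere.

```python
import csv
import io

# Two records in the same shape as the output above (values shortened)
data_list = [
    {"respondent": "r1", "question": "Q1", "answer": "a1"},
    {"respondent": "r1", "question": "Q2", "answer": "a2"},
]

# Collect the column names in first-seen order across all records
fieldnames = []
for record in data_list:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

# Swap the buffer for open("out.csv", "w", newline="") to write a real file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(data_list)
csv_text = buffer.getvalue()
print(csv_text)
```

Gathering the field names from every record, rather than just the first one, keeps DictWriter from raising an error if a later record has a key the first one lacks. Records that are missing a key get an empty cell via restval.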
