Organizing XML data into dictionaries
<p>I'm trying to organize my data into a dictionary format from XML data. This will be used to run Monte Carlo simulations.</p>
<p>Here is an example of what a couple of entries in the XML look like:</p>
<pre><code>&lt;retirement&gt;
    &lt;item&gt;
        &lt;low&gt;-0.34&lt;/low&gt;
        &lt;high&gt;-0.32&lt;/high&gt;
        &lt;freq&gt;0.0294117647058824&lt;/freq&gt;
        &lt;variable&gt;stock&lt;/variable&gt;
        &lt;type&gt;historic&lt;/type&gt;
    &lt;/item&gt;
    &lt;item&gt;
        &lt;low&gt;-0.32&lt;/low&gt;
        &lt;high&gt;-0.29&lt;/high&gt;
        &lt;freq&gt;0&lt;/freq&gt;
        &lt;variable&gt;stock&lt;/variable&gt;
        &lt;type&gt;historic&lt;/type&gt;
    &lt;/item&gt;
&lt;/retirement&gt;
</code></pre>
<p>My current data sets only have two variables, and the type can be one of three or possibly four discrete types. Hard coding two variables isn't a problem, but I would like to start working with data that has many more variables and automate this process. My goal is to automatically import this XML data into a dictionary so that I can manipulate it later without having to hard code the array titles and the variables.</p>
<p>Here is what I have:</p>
<pre><code># Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
Items = tree.findall('item')

# Create Master Dictionary
masterDictionary = {}

# Assign variables to dictionary
for Item in Items:
    thisKey = Item.find('variable').text
    if thisKey in masterDictionary == False:
        masterDictionary[thisKey] = []
    else:
        pass
    thisList = masterDictionary[thisKey]
    newDataPoint = DataPoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text))
    thisSublist.append(newDataPoint)
</code></pre>
<p>I'm getting a KeyError at <code>thisList = masterDictionary[thisKey]</code>.</p>
<p>I am also trying to create a class to deal with some of the other elements of the XML:</p>
<pre><code># Define a class for each data point that contains low, hi and freq attributes
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq
</code></pre>
<p>Would I then be able to check a value with something like:</p>
<pre><code>masterDictionary['stock'][0].freq
</code></pre>
<p>Any and all help is appreciated.</p>
<p><strong>UPDATE</strong></p>
<p>Thanks for the help, John. The indentation issues are sloppiness on my part. It's my first time posting on Stack and I just didn't get the copy/paste right. The part after the else: is in fact indented to be a part of the for loop, and the class is indented with four spaces in my code; it was just a bad posting here. I'll keep the capitalization convention in mind. Your suggestion did indeed work, and now the commands:</p>
<pre><code>print masterDictionary.keys()
print masterDictionary['stock'][0].low
</code></pre>
<p>yield:</p>
<pre><code>['inflation', 'stock']
-0.34
</code></pre>
<p>Those are indeed my two variables, and the value matches the XML listed at the top.</p>
<p><strong>UPDATE 2</strong></p>
<p>Well, I thought I had figured this one out, but I was careless again and it turns out that I hadn't quite fixed the issue. The previous solution ended up writing all of the data to both of my dictionary keys, so I have two identical lists of all the data assigned to two different dictionary keys. The idea is to have distinct sets of data assigned from the XML to the matching dictionary key.
Here is the current code:</p>
<pre><code># Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}
thisList = []

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text
    masterDictionary[thisKey] = thisList
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    thisList.append(newDataPoint)
</code></pre>
<p>When I input:</p>
<pre><code>print masterDictionary['stock'][5].low
print masterDictionary['inflation'][5].low
print len(masterDictionary['stock'])
print len(masterDictionary['inflation'])
</code></pre>
<p>the results are identical for both keys ('stock' and 'inflation'):</p>
<pre><code>-.22
-.22
56
56
</code></pre>
<p>There are 27 items with the stock tag in the XML file and 29 tagged with inflation. How can I make each list assigned to a dictionary key only pull the particular data in the loop?</p>
<p><strong>UPDATE 3</strong></p>
<p>It seems to work with 2 loops, but I have no idea how and why it won't work in 1 single loop.
I managed this accidentally:</p>
<pre><code># Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text
    thisList = []
    masterDictionary[thisKey] = thisList

for item in items:
    thisKey = item.find('variable').text
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    masterDictionary[thisKey].append(newDataPoint)
</code></pre>
<p>I have tried a large number of permutations to make it happen in one single loop but no luck. I can get all of the data listed into both keys--identical arrays of all the data (not very helpful), or the data sorted correctly into 2 distinct arrays for both keys, but only the last single data entry (the loop overwrites itself each time leaving you with only one entry in the array).</p>
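<p>A minimal single-loop sketch of the grouping described in the question, assuming Python 3 and only the standard library. It is not the asker's code: <code>group_items</code>, <code>SAMPLE_XML</code>, and the inflation values in the sample are hypothetical names and data added for illustration. Each key gets its own list the first time it is seen (here via <code>collections.defaultdict</code>) instead of every key pointing at one shared list created before the loop, which appears to be why both keys held all 56 entries in Update 2.</p>

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: the two stock items come from the question,
# the inflation item is invented purely for illustration.
SAMPLE_XML = """
<retirement>
  <item>
    <low>-0.34</low><high>-0.32</high><freq>0.0294117647058824</freq>
    <variable>stock</variable><type>historic</type>
  </item>
  <item>
    <low>0.01</low><high>0.02</high><freq>0.1</freq>
    <variable>inflation</variable><type>historic</type>
  </item>
  <item>
    <low>-0.32</low><high>-0.29</high><freq>0</freq>
    <variable>stock</variable><type>historic</type>
  </item>
</retirement>
"""


class DataPoint:
    """One histogram bin: lower bound, upper bound, and frequency."""

    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq


def group_items(root):
    """Group <item> elements by their <variable> text in a single pass."""
    # defaultdict(list) creates a fresh, separate list for each new key,
    # so no two keys ever share the same list object.
    grouped = defaultdict(list)
    for item in root.findall('item'):
        key = item.find('variable').text
        grouped[key].append(DataPoint(
            float(item.find('low').text),
            float(item.find('high').text),
            float(item.find('freq').text),
        ))
    return dict(grouped)


master = group_items(ET.fromstring(SAMPLE_XML))
print(sorted(master))
print(len(master['stock']), len(master['inflation']))
print(master['stock'][0].low)
```

<p>As for the KeyError in the first attempt: <code>thisKey in masterDictionary == False</code> is a chained comparison, evaluated as <code>(thisKey in masterDictionary) and (masterDictionary == False)</code>, so it is never true and the empty list is never created. <code>thisKey not in masterDictionary</code> is the usual spelling of that test.</p>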