StackOverflow2013

<p>I suspected that something odd was happening because the program was running out of memory, so I rewrote the parser as an iterator over each drug. With that change the program ran to completion without raising an exception.</p>

<p>Instead of loading the entire XML file into memory, I scan the file line by line for the start and end of each <code>&lt;drug&gt;</code> and <code>&lt;/drug&gt;</code> pair, and then parse just that block with minidom.</p>

<p>The code is a little fragile, because it assumes that each <code>&lt;drug&gt;</code> and <code>&lt;/drug&gt;</code> tag sits on its own line. Hopefully it helps more than it harms, though.</p>

<pre><code>#!python
import codecs
from xml.dom import minidom


class DrugBank(object):
    """Iterate over a DrugBank XML file one &lt;drug&gt; block at a time."""

    def __init__(self, filename):
        self.fp = open(filename, 'r')

    def __iter__(self):
        return self

    def next(self):
        state = 0  # 0: looking for the next &lt;drug ...&gt;, 1: inside a drug block
        while True:
            line = self.fp.readline()
            if not line:
                # End of file without a closing &lt;/drugs&gt; line: stop instead of looping forever.
                self.fp.close()
                raise StopIteration()
            if state == 0:
                if line.strip().startswith('&lt;drug '):
                    lines = [line]
                    state = 1
                    continue
                if line.strip() == '&lt;/drugs&gt;':
                    self.fp.close()
                    raise StopIteration()
            if state == 1:
                lines.append(line)
                if line.strip() == '&lt;/drug&gt;':
                    # Parse only this drug's lines, not the whole file.
                    return minidom.parseString("".join(lines))

    __next__ = next  # Python 3 looks up __next__ rather than next


with codecs.open('csvout.csv', 'w', 'utf-8') as csvout, open('dtout.csv', 'w') as dtout:
    db = DrugBank('drugbank.xml')
    for dom in db:
        entry = dom.firstChild
        drugtype = entry.attributes['type'].value
        drugidObj = entry.getElementsByTagName('drugbank-id')[0]
        drugid = drugidObj.childNodes[0].nodeValue
        drugnameObj = entry.getElementsByTagName('name')[0]
        drugname = drugnameObj.childNodes[0].nodeValue
        targetlist = entry.getElementsByTagName('target')
        for target in targetlist:
            targetid = target.attributes['partner'].value
            dtout.write((','.join((drugid, targetid))) + '\n')
        csvout.write((','.join((drugid, drugname, drugtype))) + '\n')
</code></pre>

<p>An interesting read that might help you out further is here: <a href="http://www.ibm.com/developerworks/xml/library/x-hiperfparse/" rel="nofollow">http://www.ibm.com/developerworks/xml/library/x-hiperfparse/</a></p>
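<p>If the one-tag-per-line assumption turns out to be a problem, the same "one drug at a time" idea can probably be expressed with the standard library's <code>xml.etree.ElementTree.iterparse</code>, which streams the file and does not care about line breaks. The following is only a minimal sketch, not the approach used in the code above. The function names <code>iter_drugs</code> and <code>write_csvs</code> are made up for illustration, and the sketch assumes a namespace-free file in which each top-level <code>&lt;drug&gt;</code> carries a <code>type</code> attribute, a <code>drugbank-id</code> child, a <code>name</code> child, and <code>target</code> descendants with a <code>partner</code> attribute. It also swaps the manual <code>','.join(...)</code> for the <code>csv</code> module so that names containing commas are quoted.</p>

```python
import csv
import xml.etree.ElementTree as ET


def iter_drugs(source):
    """Yield (drug_id, name, drug_type, partner_ids) per top-level <drug>.

    Hypothetical helper. Assumes a namespace-free layout such as
    <drugs><drug type="..."><drugbank-id/><name/>...<target partner="..."/></drug></drugs>.
    `source` may be a filename or a binary file object.
    """
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # After the decrement, depth == 1 means elem is a direct child of the
        # root, so <drug> elements nested deeper inside an entry are skipped.
        if depth == 1 and elem.tag == "drug":
            partners = [t.get("partner") for t in elem.iter("target")]
            yield (
                elem.findtext("drugbank-id"),
                elem.findtext("name"),
                elem.get("type"),
                partners,
            )
            # Discard the finished entry's children and text to limit memory use.
            elem.clear()


def write_csvs(source, drugs_out, targets_out):
    """Write one row per drug and one row per (drug, target partner) pair."""
    drug_writer = csv.writer(drugs_out)
    target_writer = csv.writer(targets_out)
    for drug_id, name, drug_type, partners in iter_drugs(source):
        drug_writer.writerow([drug_id, name, drug_type])
        for partner in partners:
            target_writer.writerow([drug_id, partner])
```

<p>Under those assumptions it could be called as <code>write_csvs('drugbank.xml', csvout, dtout)</code> with two files opened for writing (in Python 3, opened with <code>newline=''</code> as the <code>csv</code> documentation recommends). If the real file declares a default XML namespace, the tag names in the sketch would need the namespace prefix added.</p>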
