StackOverflow2013


Programmatically clean/ignore namespaces in XML - python
<p>I'm trying to write a simple program to read my financial XML files from GNUCash, and learn Python in the process.</p>
<p>The XML looks like this:</p>
<pre><code><?xml version="1.0" encoding="utf-8" ?>
<gnc-v2 xmlns:gnc="http://www.gnucash.org/XML/gnc"
        xmlns:act="http://www.gnucash.org/XML/act"
        xmlns:book="http://www.gnucash.org/XML/book"
        {...}
        xmlns:vendor="http://www.gnucash.org/XML/vendor">
<gnc:count-data cd:type="book">1</gnc:count-data>
<gnc:book version="2.0.0">
<book:id type="guid">91314601aa6afd17727c44657419974a</book:id>
<gnc:count-data cd:type="account">80</gnc:count-data>
<gnc:count-data cd:type="transaction">826</gnc:count-data>
<gnc:count-data cd:type="budget">1</gnc:count-data>
<gnc:commodity version="2.0.0">
  <cmdty:space>ISO4217</cmdty:space>
  <cmdty:id>BRL</cmdty:id>
  <cmdty:get_quotes/>
  <cmdty:quote_source>currency</cmdty:quote_source>
  <cmdty:quote_tz/>
</gnc:commodity>
</code></pre>
<p>Right now, I'm able to iterate and get results using</p>
<pre><code>import xml.etree.ElementTree as ET
r = ET.parse("file.xml").findall('.//')
</code></pre>
<p>after manually cleaning the namespaces, but I'm looking for a solution that could either read the entries regardless of their namespaces OR remove the namespaces before parsing.</p>
<p>Note that I'm a complete noob in Python, and I've read: <a href="https://stackoverflow.com/questions/3378409/python-and-gnucash-extract-data-from-gnucash-files">Python and GnuCash: Extract data from GnuCash files</a>, <a href="https://stackoverflow.com/questions/2545783/cleaning-an-xml-file-in-python-before-parsing">Cleaning an XML file in Python before parsing</a> and <a href="https://stackoverflow.com/questions/1703882/python-xml-etree-elementtree-removing-namespaces">python: xml.etree.ElementTree, removing "namespaces"</a> along with the ElementTree docs, and I'm still lost...</p>
<p>I've come up with this solution:</p>
<pre><code>def strip_namespaces(self, tree):
    nspOpen = re.compile("<\w*:", re.IGNORECASE)
    nspClose = re.compile("<\/\w*:", re.IGNORECASE)
    for i in tree:
        start = re.sub(nspOpen, '<', tree.tag)
        end = re.sub(nspOpen, '<\/', tree.tag)
        # pprint(finaltree)
    return
</code></pre>
<p>But I'm failing to apply it. I can't seem to retrieve the tag names as they appear in the file.</p>
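One possible direction, offered only as a sketch: once ElementTree has parsed the file, a tag no longer looks like `gnc:book` but like `{http://www.gnucash.org/XML/gnc}book`, which is probably why a regex looking for `<prefix:` finds nothing in `tag`. The example below removes that `{uri}` part from tags and attribute names after parsing. The sample string is a shortened version of the XML in the question, and the `xmlns:cd` URI is an assumption, since that declaration is elided as `{...}` above.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Drop the '{uri}' part that ElementTree prepends to namespaced
    tag and attribute names. Modifies the tree in place and returns it."""
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
        # Copy the keys first because the dict is changed while looping.
        for name in list(elem.attrib):
            if "}" in name:
                elem.attrib[name.split("}", 1)[1]] = elem.attrib.pop(name)
    return root


# Shortened sample; the xmlns:cd URI is assumed, not taken from the question.
SAMPLE = """<gnc-v2 xmlns:gnc="http://www.gnucash.org/XML/gnc"
        xmlns:book="http://www.gnucash.org/XML/book"
        xmlns:cd="http://www.gnucash.org/XML/cd">
  <gnc:count-data cd:type="book">1</gnc:count-data>
  <gnc:book version="2.0.0">
    <book:id type="guid">91314601aa6afd17727c44657419974a</book:id>
  </gnc:book>
</gnc-v2>"""

root = strip_namespaces(ET.fromstring(SAMPLE))
tags = [elem.tag for elem in root.iter()]
print(tags)  # ['gnc-v2', 'count-data', 'book', 'id']
```

For a real file, `ET.parse("file.xml").getroot()` would take the place of `ET.fromstring(SAMPLE)`. Note that stripping namespaces can make distinct names collide (for example `book:id` and `cmdty:id` both become `id`), so paths such as `book/id` may be needed to tell them apart.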
