Programmatically clean/ignore namespaces in XML - python
<p>I'm trying to write a simple program to read my financial XML files from GNUCash, and learn Python in the process.</p>
<p>The XML looks like this:</p>
<pre><code>&lt;?xml version="1.0" encoding="utf-8" ?&gt;
&lt;gnc-v2 xmlns:gnc="http://www.gnucash.org/XML/gnc"
        xmlns:act="http://www.gnucash.org/XML/act"
        xmlns:book="http://www.gnucash.org/XML/book"
        {...}
        xmlns:vendor="http://www.gnucash.org/XML/vendor"&gt;
  &lt;gnc:count-data cd:type="book"&gt;1&lt;/gnc:count-data&gt;
  &lt;gnc:book version="2.0.0"&gt;
    &lt;book:id type="guid"&gt;91314601aa6afd17727c44657419974a&lt;/book:id&gt;
    &lt;gnc:count-data cd:type="account"&gt;80&lt;/gnc:count-data&gt;
    &lt;gnc:count-data cd:type="transaction"&gt;826&lt;/gnc:count-data&gt;
    &lt;gnc:count-data cd:type="budget"&gt;1&lt;/gnc:count-data&gt;
    &lt;gnc:commodity version="2.0.0"&gt;
      &lt;cmdty:space&gt;ISO4217&lt;/cmdty:space&gt;
      &lt;cmdty:id&gt;BRL&lt;/cmdty:id&gt;
      &lt;cmdty:get_quotes/&gt;
      &lt;cmdty:quote_source&gt;currency&lt;/cmdty:quote_source&gt;
      &lt;cmdty:quote_tz/&gt;
    &lt;/gnc:commodity&gt;
</code></pre>
<p>Right now, I'm able to iterate and get results using:</p>
<pre><code>import xml.etree.ElementTree as ET
r = ET.parse("file.xml").findall('.//')
</code></pre>
<p>after manually cleaning the namespaces, but I'm looking for a solution that could either read the entries regardless of their namespaces OR remove the namespaces before parsing.</p>
<p>Note that I'm a complete noob in Python, and I've read: <a href="https://stackoverflow.com/questions/3378409/python-and-gnucash-extract-data-from-gnucash-files">Python and GnuCash: Extract data from GnuCash files</a>, <a href="https://stackoverflow.com/questions/2545783/cleaning-an-xml-file-in-python-before-parsing">Cleaning an XML file in Python before parsing</a> and <a href="https://stackoverflow.com/questions/1703882/python-xml-etree-elementtree-removing-namespaces">python: xml.etree.ElementTree, removing &quot;namespaces&quot;</a> along with the ElementTree docs, and I'm still lost...</p>
<p>I've come up with this solution:</p>
<pre><code>def strip_namespaces(self, tree):
    nspOpen = re.compile("&lt;\w*:", re.IGNORECASE)
    nspClose = re.compile("&lt;\/\w*:", re.IGNORECASE)
    for i in tree:
        start = re.sub(nspOpen, '&lt;', tree.tag)
        end = re.sub(nspOpen, '&lt;\/', tree.tag)
    # pprint(finaltree)
    return
</code></pre>
<p>But I'm failing to apply it. I can't seem to retrieve the tag names as they appear in the file.</p>
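<p>For illustration, here is a minimal sketch of the "remove the namespaces" option, applied after parsing rather than before. It relies on the fact that ElementTree stores a qualified name as <code>{uri}local</code>, so the prefixes from the file (<code>gnc:</code>, <code>cmdty:</code>) never appear in <code>elem.tag</code>. The helper names <code>local_name</code> and <code>strip_namespaces</code>, the trimmed <code>SAMPLE</code> document, and the <code>cd</code> and <code>cmdty</code> namespace URIs (elided in the question) are assumptions made for this example, not part of the original file.</p>

```python
import xml.etree.ElementTree as ET

# Trimmed sample based on the question. The cd and cmdty URIs are assumed,
# because the question elides those declarations with {...}.
SAMPLE = """<gnc-v2 xmlns:gnc="http://www.gnucash.org/XML/gnc"
        xmlns:book="http://www.gnucash.org/XML/book"
        xmlns:cd="http://www.gnucash.org/XML/cd"
        xmlns:cmdty="http://www.gnucash.org/XML/cmdty">
  <gnc:count-data cd:type="book">1</gnc:count-data>
  <gnc:book version="2.0.0">
    <book:id type="guid">91314601aa6afd17727c44657419974a</book:id>
    <gnc:commodity version="2.0.0">
      <cmdty:space>ISO4217</cmdty:space>
      <cmdty:id>BRL</cmdty:id>
    </gnc:commodity>
  </gnc:book>
</gnc-v2>
"""


def local_name(qualified):
    """Return the part of a '{uri}local' name after the closing brace."""
    return qualified.rsplit("}", 1)[-1]


def strip_namespaces(root):
    """Drop namespace URIs from every tag and attribute name, in place."""
    for elem in root.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(elem.tag, str):
            elem.tag = local_name(elem.tag)
        # Build the renamed attributes first, then swap them in.
        renamed = {local_name(key): value for key, value in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(renamed)
    return root


root = strip_namespaces(ET.fromstring(SAMPLE))
tags = [elem.tag for elem in root.iter()]
currency = root.findtext("book/commodity/id")
print(tags)
print(currency)
```

<p>One caveat with this approach: names that differed only by namespace now collide (here <code>book:id</code> and <code>cmdty:id</code> both become <code>id</code>), so paths need enough context to tell them apart. If the tree should stay untouched, Python 3.8 and later also accept a wildcard namespace in search paths, such as <code>findall('.//{*}commodity')</code>.</p>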
 
