Element Tree (Python) itertext not working with line breaks
    <p>As a follow-up question to one I recently posted ...</p>
    <p>I am doing some XML parsing with ElementTree, and I have the following method in Python:</p>
    <pre><code>def extract_all_text(element):
    return "".join(element.itertext())
</code></pre>
    <p>The purpose of this is to extract the text from an element, stripping any tags that wrap text inside it. E.g., <code>extract_all_text(ElementTree.fromstring('&lt;a&gt;B &lt;c&gt;D&lt;/c&gt;&lt;/a&gt;'))</code> should return <code>B D</code>. However, I'm getting a strange error when I use this method on elements from files containing line breaks. The error looks like this:</p>
    <pre><code>File "/home/Intredasting/foo.py", line 74, in bar
    description = extract_all_text(root.find('description')).strip()
File "/home/Intredasting/foo.py", line 62, in extract_all_text
    return "".join(element.itertext())
TypeError: sequence item 0: expected str instance, list found
</code></pre>
    <p>If I run <code>ElementTree.dump(root.find('description'))</code>, which shows the XML element that I am trying to parse, I get this:</p>
    <pre><code>&lt;description&gt;
 Foo &lt;a href="http://example.com"&gt;bar&lt;/a&gt;.
 &lt;/description&gt;
</code></pre>
    <p>If I remove the line breaks by editing the file so that the element looks like this:</p>
    <pre><code>&lt;description&gt;Foo &lt;a href="http://example.com"&gt;bar&lt;/a&gt;.&lt;/description&gt;
</code></pre>
    <p>then the method works perfectly and I get <code>Foo bar.</code>. Why does this happen? How can I get the method to work with line breaks?</p>
    <p><strong>EDIT</strong>:</p>
    <p>You can see the specific file I am using here (I whittled it down to a simple version, but it still causes the error): <a href="http://www.filedropper.com/example_1" rel="nofollow">http://www.filedropper.com/example_1</a></p>
    <p>To test this file, try</p>
    <pre><code>$ python3
&gt;&gt;&gt; import xml.etree.ElementTree as ET
&gt;&gt;&gt; tree = ET.parse('/path/to/example.xml')
&gt;&gt;&gt; desc = tree.getroot().find('description')
&gt;&gt;&gt; print("".join(desc.itertext()))
</code></pre>
    <p>(This should yield the error.)</p>
    <p><strong>ANOTHER EDIT</strong>:</p>
    <p>This code provides additional insight into what is happening (run it in addition to the code above):</p>
    <pre><code>&gt;&gt;&gt; for text in desc.itertext(): print(text)
['\n', ' Foo ']
bar
['.', '\n', ' ']
</code></pre>
    <p>So <code>itertext()</code> yields lists of strings rather than plain strings for the pieces of text that contain line breaks, which is why <code>"".join(...)</code> fails. Of course, I can get around this issue by simply joining those lists together into a string. But I feel like this is either a bug in ElementTree, or something's funky with the input file, or my version of Python is screwed up.</p>
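    <p>The workaround mentioned at the end of the question (joining any list-valued pieces into strings before the final join) could be sketched roughly as follows. This is a minimal sketch, assuming <code>itertext()</code> may yield either plain strings or lists of strings, as in the output shown in the question. The <code>isinstance</code> check is an illustrative assumption, not documented ElementTree behavior, and it does not explain why the lists appear in the first place. The sample XML string is reconstructed from the question, so its exact whitespace is a guess.</p>

```python
import xml.etree.ElementTree as ET


def extract_all_text(element):
    """Return all text inside element, with the wrapping tags stripped.

    Assumption: on the affected setup itertext() can yield lists of
    strings instead of plain strings, so each piece is normalized first.
    """
    parts = []
    for piece in element.itertext():
        if isinstance(piece, list):
            # Collapse a list-valued piece such as ['\n', ' Foo '] into one string.
            parts.append("".join(piece))
        else:
            parts.append(piece)
    return "".join(parts)


# Sample element modeled on the question; exact whitespace is assumed.
desc = ET.fromstring(
    '<description>\n Foo <a href="http://example.com">bar</a>.\n </description>'
)
description = extract_all_text(desc).strip()
print(description)
```

    <p>On an interpreter where <code>itertext()</code> already yields plain strings, the <code>isinstance</code> branch is never taken and the function behaves like the original one-liner.</p>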