Displaying XML using CSS: How to handle &nbsp?
<p>I'm dealing with a lot of .xml files (millions: an XML-formatted dump of Wikipedia), and they're a lot less readable than I imagined. </p> <p>For the time being, I've written a .css file to display them in a readable manner in a browser, and written a script to plug a reference to this .css into all the files. </p> <p>(I know there are other solutions, like XSLT, but all the information I found made it seem document-level, which didn't suit. I'm really trying not to expand the size of these files if possible.)</p> <p>The .css works fine for <em>some</em> of the files, but many contain entities like <em>&amp;nbsp</em> and I get errors like: </p> <p>"XML Parsing Error: undefined entity" with a nice little illustration pointing to <em>&amp;nbsp</em> or its kin within a quote.</p> <p>There is an articles.dtd file, which seems like it should connect the dots (keyword -&gt; Unicode) for the browser. It is referenced in each file like: </p> <pre><code> &lt;!DOCTYPE article SYSTEM "../article.dtd"&gt; </code></pre> <p>and contains a lot of entries like: </p> <pre><code>&lt;!ENTITY nbsp "&amp;#160;"&gt; &lt;!-- no-break space = non-breaking space, U+00A0 ISOnum --&gt; </code></pre> <p>but either I'm entirely misunderstanding what this file is for, or it's not working correctly. </p> <p>In any case, how can I make these documents display? Either by: </p> <ul> <li> displaying the entities as plain text (like "&amp;nbsp") </li> <li> removing the entities altogether (by any means other than just a linear search/removal of them in the actual files) </li> <li> interpreting the entities as Unicode, as they were intended</li> </ul> <p>Naturally, the last option is preferable; ideally, by referencing some sort of external file that maps entities to Unicode (if that's not what the articles.dtd file is for...). </p> <p>EDIT: I'm not working with a powerful machine here; extracting the .rar archives took days. Any sort of edit to each file would take a very long time.</p>
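To make the third option concrete outside the browser, here is a minimal sketch in Python of "interpreting the entities as Unicode" from an external mapping, without editing the files. It is not a fix for the browser error itself. The assumptions are loud ones: `SAMPLE` is a hypothetical stand-in for one article file, and Python's built-in `html.entities.name2codepoint` table stands in for the declarations in the DTD, which may well define names this table lacks. In CPython the `parser.entity` lookup seems to be consulted only when the document carries a DOCTYPE that points to an external DTD, which these files do.

```python
import html.entities
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for one article file; note the DOCTYPE
# pointing at an external DTD that the parser never actually loads.
SAMPLE = (
    '<!DOCTYPE article SYSTEM "../article.dtd">\n'
    '<article><p>non&nbsp;breaking</p></article>'
)


def parse_with_named_entities(xml_text):
    """Parse XML text, resolving named entities from a Python-side table."""
    parser = ET.XMLParser()
    # Assumption: the HTML entity table approximates the DTD's entity set.
    parser.entity.update(
        (name, chr(codepoint))
        for name, codepoint in html.entities.name2codepoint.items()
    )
    parser.feed(xml_text)
    return parser.close()


root = parse_with_named_entities(SAMPLE)
print(repr(root.find("p").text))
```

Because the mapping lives in the parser rather than in the documents, nothing is written back to disk, which matters given the slow machine described above.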