
plurals
  1. PO
    primarykey
    data
    text
    <p>You could use a modified version of the code below, which catches the start and end of each entity. It takes a few seconds to run because the parser has to fetch the declarations of all the HTML Latin-1 entities. When you get an entity whose name does not start with <code>%</code> (that is, a general entity rather than a parameter entity), you could replace the inserted character in your <code>acc</code> buffer. Pay attention to predefined entities like <code>&amp;amp;</code>.</p>
    <p>You could also use a SAX filter to do the job automatically. See the answer at <a href="https://stackoverflow.com/a/5524862/452614">https://stackoverflow.com/a/5524862/452614</a>. I might update my answer to provide a complete solution.</p>
    <pre><code>import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.*;
import org.xml.sax.ext.DefaultHandler2;

// DefaultHandler2 implements both ContentHandler and LexicalHandler,
// so one object can receive element, character and entity events.
class MyHandler extends DefaultHandler2 {
    // Accumulates the text seen since the last element boundary.
    private StringBuilder acc;

    public MyHandler() {
        acc = new StringBuilder();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        System.out.printf("startElement. uri:%s, localName:%s, qName:%s\n", uri, localName, qName);
        acc.setLength(0);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        System.out.printf("endElement. uri:%s, localName:%s, qName:%s\n", uri, localName, qName);
        System.out.printf("Characters accumulated: %s\n", acc.toString());
        acc.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        acc.append(ch, start, length);
        System.out.printf("characters. [%s]\n", new String(ch, start, length));
    }

    // LexicalHandler callbacks: parameter entity names start with '%'.
    @Override
    public void startEntity(String name) throws SAXException {
        System.out.printf("startEntity: %s\n", name);
    }

    @Override
    public void endEntity(String name) throws SAXException {
        System.out.printf("endEntity: %s\n", name);
    }
}

public class SAXTest1 {
    public static void main(String args[]) throws SAXException, ParserConfigurationException, UnsupportedEncodingException {
        String s = "&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;\n"
                + "&lt;!DOCTYPE author [\n"
                + "&lt;!ELEMENT author (#PCDATA)&gt;\n"
                + "&lt;!ENTITY % HTMLlat1 PUBLIC \"-//W3C//ENTITIES Latin 1 for XHTML//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent\"&gt;\n"
                + "%HTMLlat1;\n"
                + "]&gt;\n"
                + "&lt;author&gt;G&amp;uuml;nther Heinemann&lt;/author&gt;";
        System.out.println(s);
        InputStream stream = new ByteArrayInputStream(s.getBytes("UTF-8"));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader xmlReader = factory.newSAXParser().getXMLReader();

        DefaultHandler2 handler = new MyHandler();
        xmlReader.setContentHandler(handler);
        // Entity boundaries are only reported to a registered LexicalHandler.
        xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);

        try {
            xmlReader.parse(new InputSource(stream));
        } catch (IOException e) {
            System.err.println("I/O error: " + e.getMessage());
        } catch (SAXException e) {
            System.err.println("Parsing error: " + e.getMessage());
        }
    }
}
</code></pre>
    <p>Program execution:</p>
    <pre><code>$ java SAXTest1
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!DOCTYPE author [
&lt;!ELEMENT author (#PCDATA)&gt;
&lt;!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent"&gt;
%HTMLlat1;
]&gt;
&lt;author&gt;G&amp;uuml;nther Heinemann&lt;/author&gt;
startEntity: %HTMLlat1
endEntity: %HTMLlat1
startElement. uri:, localName:, qName:author
characters. [G]
startEntity: uuml
endEntity: uuml
characters. [ünther Heinemann]
endElement. uri:, localName:, qName:author
Characters accumulated: Günther Heinemann
</code></pre>
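    <p>The replacement in the <code>acc</code> buffer described above could be sketched as follows. This is only an illustration under stated assumptions, not a complete solution: the class names <code>EntityRestoringHandler</code> and <code>EntityRestorer</code> are hypothetical, each general entity is assumed to expand to exactly one character (true for the Latin-1 entities), nested entities are ignored, and <code>uuml</code> is declared in the internal DTD subset so that nothing has to be fetched over the network. Because the trace above shows the expanded character arriving in the <code>characters</code> call that follows <code>endEntity</code>, the sketch handles both orderings.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: rebuilds the text with entity references restored.
// Assumption: every general entity expands to exactly one char.
class EntityRestoringHandler extends DefaultHandler2 {
    private final StringBuilder acc = new StringBuilder();
    private int lengthAtEntityStart = -1; // acc length when the entity began
    private int charsToSkip = 0;          // expanded chars still to be dropped

    private static boolean isGeneralEntity(String name) {
        // '%' marks a parameter entity, "[dtd]" the external DTD subset.
        return !name.startsWith("%") && !name.equals("[dtd]");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Drop expanded chars that were reported after endEntity.
        int skip = Math.min(charsToSkip, length);
        charsToSkip -= skip;
        acc.append(ch, start + skip, length - skip);
    }

    @Override
    public void startEntity(String name) {
        if (isGeneralEntity(name)) {
            lengthAtEntityStart = acc.length();
        }
    }

    @Override
    public void endEntity(String name) {
        if (!isGeneralEntity(name) || lengthAtEntityStart < 0) {
            return;
        }
        if (acc.length() > lengthAtEntityStart) {
            // The expansion was reported inside the entity: remove it.
            acc.setLength(lengthAtEntityStart);
        } else {
            // The expansion will lead the next characters() call: skip it.
            charsToSkip += 1;
        }
        acc.append('&').append(name).append(';');
        lengthAtEntityStart = -1;
    }

    String text() {
        return acc.toString();
    }
}

class EntityRestorer {
    static String restoreEntities(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EntityRestoringHandler handler = new EntityRestoringHandler();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        // uuml is declared locally here instead of loading xhtml-lat1.ent.
        String xml = "<!DOCTYPE author [\n"
                + "<!ENTITY uuml \"&#252;\">\n"
                + "]>\n"
                + "<author>G&uuml;nther Heinemann</author>";
        System.out.println(restoreEntities(xml));
    }
}
```

    <p>With the parser bundled in the JDK, the <code>main</code> method is expected to print the text with the reference intact, <code>G&amp;uuml;nther Heinemann</code>. Predefined entities such as <code>&amp;amp;</code> are typically not reported through <code>startEntity</code>, so they would still need separate handling.</p>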