  1. POJava - Html special chars
    <p>I want to run some XPath queries on an HTML file. Here is my code:</p>
    <pre><code>public static void main(String args[]) {
    try {
        /** We load the HTML file we want to parse */
        BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream("html_doyoubuzz.html"), "UTF-8"));

        /** We clean the HTML file */
        TagNode tagNode = new HtmlCleaner().clean(br);
        Document doc2 = new DomSerializer(new CleanerProperties()).createDOM(tagNode);

        /******************************
         *       XPath Requests       *
         ******************************/
        XPath xpath = XPathFactory.newInstance().newXPath();
        Object dates_experience = xpath.evaluate(
                "/html/body/div[3]/div/div/div[2]/div/div/div[2]/div[4]/div/div[3]/h4/span[2]",
                doc2, XPathConstants.NODESET);
        NodeList nodes = (NodeList) dates_experience;

        String s;
        for (int i = 0; i &lt; nodes.getLength(); i++) {
            s = org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(nodes.item(i).getTextContent());
            System.out.println(s);
        }
    } catch (Exception e) { // Catch exception if any
        e.printStackTrace();
    }
}
</code></pre>
    <p>My HTML file is encoded in UTF-8 (as declared in its meta tag). My problem is the output. I get this:</p>
    <pre><code>d?cembre 2010 - d?cembre 2010)
f?vrier 2010 - juin 2010)
juillet 2009 - septembre 2009)
juin 2009 - juin 2009)
juillet 2008 - ao?t 2008)
</code></pre>
    <p>instead of this, which is my desired output:</p>
    <pre><code>décembre 2010 - décembre 2010)
février 2010 - juin 2010)
juillet 2009 - septembre 2009)
juin 2009 - juin 2009)
juillet 2008 - août 2008)
</code></pre>
    <p>Do you have any idea how to solve this problem?</p>
    <p>Thanks.</p>
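One possible cause, which the post itself does not confirm, is that the text is decoded correctly but `System.out` encodes it with a charset that cannot represent accented characters, so each one is replaced by `?`. The sketch below reproduces that substitution with a `PrintStream` and shows a UTF-8 wrapper as a possible workaround. The class name `ConsoleEncodingSketch` and the helper `printedBytes` are hypothetical names used for illustration only, and whether the UTF-8 output displays correctly still depends on the console being set to UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingSketch {

    /**
     * Hypothetical helper: prints text through a PrintStream that uses the
     * given charset and returns the raw bytes that were written.
     */
    static byte[] printedBytes(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "décembre 2010" as a correctly decoded Java string
        String s = "d\u00e9cembre 2010";

        // A charset that cannot represent 'é' substitutes '?' when encoding.
        byte[] ascii = printedBytes(s, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));

        // Assumed workaround: wrap System.out so that it encodes as UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(s);
    }
}
```

If this is indeed the cause, replacing `System.out.println(s)` in the loop with a call on a UTF-8 `PrintStream` like `utf8Out` would be one way to test it.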