  1. How to handle non-UTF8 html page in Java?
    <p>My task is to retrieve HTML strings from URLs using Java.</p>
    <p>I know how to use HttpURLConnection and InputStream to get the string.</p>
    <p>However, I have an encoding problem with some pages.</p>
    <p>If a page uses an encoding other than UTF-8 (e.g., GB2312), the string I get is just arbitrary characters or question marks.</p>
    <p>Can anyone please tell me how to solve this problem?</p>
    <p>Thanks</p>
    <p>Below is my code to download the HTML from a URL.</p>
    <pre><code>private String downloadHtml(String urlString) {
    URL url = null;
    InputStream inStr = null;
    StringBuffer buffer = new StringBuffer();

    try {
        url = new URL(urlString);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection(); // Cast shouldn't fail
        HttpURLConnection.setFollowRedirects(true);
        // allow both GZip and Deflate (ZLib) encodings
        //conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
        String encoding = conn.getContentEncoding();
        inStr = null;

        // create the appropriate stream wrapper based on
        // the encoding type
        if (encoding != null &amp;&amp; encoding.equalsIgnoreCase("gzip")) {
            inStr = new GZIPInputStream(conn.getInputStream());
        } else if (encoding != null &amp;&amp; encoding.equalsIgnoreCase("deflate")) {
            inStr = new InflaterInputStream(conn.getInputStream(),
                    new Inflater(true));
        } else {
            inStr = conn.getInputStream();
        }

        int ptr = 0;

        // The charset is hard-coded here, so only GB2312 pages decode correctly
        InputStreamReader inStrReader = new InputStreamReader(inStr, Charset.forName("GB2312"));

        while ((ptr = inStrReader.read()) != -1) {
            buffer.append((char) ptr);
        }
        inStrReader.close();

        conn.disconnect();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (inStr != null) {
            try {
                inStr.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

    return buffer.toString();
}
</code></pre>
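One possible direction, offered as a minimal sketch rather than a definitive fix: instead of hard-coding a charset, read the response body as raw bytes first, then pick the charset from the `Content-Type` header (available via `conn.getContentType()`), fall back to a `charset=` declaration found near the top of the HTML, and finally default to UTF-8. The class name `CharsetSniffer`, the helper names `findCharset` and `decode`, and the 1024-byte sniffing window are assumptions made for illustration; none of them appear in the question. Real pages can declare their encoding in other ways (for example a byte-order mark), which this sketch does not handle.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class CharsetSniffer {

    // Matches "charset=GB2312", "charset = 'utf-8'", or <meta charset="...">.
    private static final Pattern CHARSET_PATTERN =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the first charset named in the text, or null if none is named or supported. */
    public static Charset findCharset(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET_PATTERN.matcher(text);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Covers both illegal and unsupported charset names.
                return null;
            }
        }
        return null;
    }

    /** Decodes a response body using the header charset, then a meta declaration, then UTF-8. */
    public static String decode(byte[] body, String contentType) {
        Charset cs = findCharset(contentType);
        if (cs == null) {
            // ISO-8859-1 maps every byte to exactly one char, so this peek never fails
            // and leaves the ASCII text of a meta tag readable.
            int n = Math.min(body.length, 1024);
            String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
            cs = findCharset(head);
        }
        if (cs == null) {
            cs = StandardCharsets.UTF_8; // assumed default when nothing is declared
        }
        return new String(body, cs);
    }
}
```

In `downloadHtml`, this would mean collecting the bytes from `inStr` (for example with `InputStream.readAllBytes()` on Java 9 or later) and returning `CharsetSniffer.decode(bytes, conn.getContentType())` in place of the `InputStreamReader` loop.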