Non-English characters are decoded incorrectly on Android with HtmlCleaner
<p>I'm using <code>HtmlCleaner</code> to scrape an <code>ISO-8859-1</code> encoded web site in Android.</p>
<p>I've implemented this in an external <code>jar</code> file that I import into my Android app.</p>
<p>When I run the unit tests in Eclipse, it handles Norwegian letters (<code>æ,ø,å</code>) correctly (I can verify that in the debugger), but in the Android app these characters look like inverted question marks.</p>
<p>If I attach the debugger to my Android app, I can see that these letters are wrong in the exact same places where they were correct when running the unit tests from Eclipse, so it's not a display/render/view issue in the Android app.</p>
<p>When I copy the text from the debuggers I get these results:</p>
<p><strong>Java Process (Unit Test)</strong>: &laquo;Blårek&raquo;, &laquo;Benny&raquo;</p>
<p><strong>Android Process (In emulator)</strong>: &laquo;Bl�rek&raquo;, &laquo;Benny&raquo;</p>
<p>I would expect these Strings to be equal, but notice how the "å" is replaced by the inverted question mark in Android.</p>
<p>I have tried running <code>htmlCleaner.getProperties().setRecognizeUnicodeChars(true)</code> without any luck. Also, I found no way of forcing UTF-8 or ISO-8859-1 encoding in HtmlCleaner, but I'm not sure if that would have made a difference.</p>
<p>Here is the code I run:</p>
<pre><code>HtmlCleaner htmlCleaner = new HtmlCleaner();

// connect to url and get root TagNode from HtmlCleaner
InputStream is = new URL( url ).openConnection().getInputStream();
TagNode rootNode = htmlCleaner.clean( is );

// navigate through some TagNodes, getting the ContentNode
ContentNode cn = rootNode...

// This String contains the incorrectly decoded characters on Android.
// Good in Oracle JVM though..
String value = cn.toString().trim();
</code></pre>
<p>Does anyone know what could cause the decoding behavior to be different on Android? I guess the main difference between the two environments is that the Android app uses Android's java.io stack while my unit tests use Sun/Oracle's stack.</p>
<p>Thanks,<br> Geir</p>
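<p>One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the raw bytes are decoded with the platform default charset, the result depends on the runtime. Android's default charset is UTF-8, and a lone ISO-8859-1 byte such as <code>0xE5</code> ("å") is malformed UTF-8, so it becomes the replacement character U+FFFD. The sketch below reproduces that effect using only the Java standard library; the class name <code>CharsetDemo</code> and the helper <code>decode</code> are hypothetical names introduced for illustration.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Hypothetical helper: reads the whole stream and decodes its bytes
    // with the charset passed in, instead of the platform default.
    static String decode(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Blårek" as an ISO-8859-1 encoded page would deliver it.
        byte[] latin1 = "Bl\u00e5rek".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset the bytes were written in.
        String right = decode(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);

        // Decoding the same bytes as UTF-8 (assumed here to mimic the Android default).
        String wrong = decode(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 round trip intact: " + right.equals("Bl\u00e5rek"));
        System.out.println("UTF-8 decode contains U+FFFD: " + (wrong.indexOf('\uFFFD') >= 0));
    }
}
```

<p>If this is indeed the cause, wrapping the connection's stream in an <code>InputStreamReader</code> with an explicit <code>ISO-8859-1</code> charset before handing it to HtmlCleaner might make both environments behave the same. Whether the HtmlCleaner version in use offers a <code>clean</code> overload that accepts a <code>Reader</code> or a charset name should be checked against its API documentation.</p>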