
How to determine if a String contains invalidly encoded characters
<p><strong>Usage scenario</strong></p> <p>We have implemented a web service that our web front-end developers use internally (via a PHP API) to display product data. On the website the user enters something (i.e. a query string). Internally, the website then calls the service via the API.</p> <p><strong>Note: We use Restlet, not Tomcat.</strong></p>
<p><strong>Original Problem</strong></p> <p>Firefox 3.0.10 appears to respect the encoding selected in the browser and encodes the URL according to that encoding. This results in different query strings for ISO-8859-1 and UTF-8.</p> <p>Our website forwards the user's input without converting it (which it should do), so it may call the service via the API with a query string that contains German umlauts.</p> <p>For example, for a query part like</p> <pre><code>...v=abcädef
</code></pre> <p>if "ISO-8859-1" is selected, the query part that is sent looks like</p> <pre><code>...v=abc%E4def
</code></pre> <p>but if "UTF-8" is selected, it looks like</p> <pre><code>...v=abc%C3%A4def
</code></pre>
<p><strong>Desired Solution</strong></p> <p>Because we implemented the service and therefore control it, we want to check on the <strong>server side</strong> whether the call contains non-UTF-8 characters and, if so, respond with a 4xx HTTP status.</p>
<p><strong>Current Solution In Detail</strong></p> <p>For each character (<code>string.substring(i, i+1)</code>), check whether both of the following hold:</p> <ol> <li><code>character.getBytes()[0]</code> equals 63, the byte for <code>'?'</code></li> <li><code>Character.getType(character.charAt(0))</code> returns <code>OTHER_SYMBOL</code></li> </ol>
<p><strong>Code</strong></p> <pre><code>protected List&lt;String&gt; getNonUnicodeCharacters( String s ) {
    final List&lt;String&gt; result = new ArrayList&lt;String&gt;();
    for ( int i = 0 , n = s.length() ; i &lt; n ; i++ ) {
        final String character = s.substring( i , i + 1 );
        // check 2: is the character's Unicode category OTHER_SYMBOL?
        final boolean isOtherSymbol =
            ( int ) Character.OTHER_SYMBOL == Character.getType( character.charAt( 0 ) );
        // check 1: getBytes() uses the platform default charset; is the first byte '?' (63)?
        final boolean isNonUnicode = isOtherSymbol &amp;&amp; character.getBytes()[ 0 ] == ( byte ) 63;
        if ( isNonUnicode )
            result.add( character );
    }
    return result;
}
</code></pre>
<p><strong>Question</strong></p> <p>Will this catch all invalid (non-UTF-8-encoded) characters? Does anyone have a better (simpler) solution?</p> <p><strong>Note:</strong> I checked <code>URLDecoder</code> with the following code:</p> <pre><code>final String[] test = new String[]{ "v=abc%E4def", "v=abc%C3%A4def" };
for ( int i = 0 , n = test.length ; i &lt; n ; i++ ) {
    System.out.println( java.net.URLDecoder.decode( test[i], "UTF-8" ) );
    System.out.println( java.net.URLDecoder.decode( test[i], "ISO-8859-1" ) );
}
</code></pre> <p>This prints:</p> <pre><code>v=abc?def
v=abcädef
v=abcädef
v=abcädef
</code></pre> <p>and it does <strong>not</strong> throw an <code>IllegalArgumentException</code>. <em>sigh</em></p>
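<p>One possible approach, sketched under assumptions rather than offered as a definitive implementation. In the JDK, <code>URLDecoder.decode</code> turns the escapes into bytes and builds a <code>String</code> from them, and <code>new String(bytes, charset)</code> replaces malformed input with U+FFFD instead of failing; only malformed escapes such as <code>%G1</code> or a trailing <code>%</code> raise an <code>IllegalArgumentException</code>. A stricter check percent-decodes the raw query into bytes yourself and runs them through a <code>CharsetDecoder</code> configured with <code>CodingErrorAction.REPORT</code>. The sketch assumes you can obtain the raw, still-encoded query string from Restlet (how to do that is not shown). The class <code>StrictUtf8QueryDecoder</code>, its methods and its exception type are hypothetical names, and rejecting unescaped non-ASCII characters is a design choice of this sketch.</p>

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

/**
 * Hypothetical helper (not part of the JDK or Restlet): percent-decodes a raw,
 * still-encoded query string and rejects it unless the bytes are valid UTF-8.
 */
public final class StrictUtf8QueryDecoder {

    /** Unchecked, so the caller can translate it into a 4xx response. */
    public static final class InvalidEncodingException extends IllegalArgumentException {
        public InvalidEncodingException(String message) {
            super(message);
        }
    }

    public static String decode(String raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '%') {
                // A percent sign must be followed by exactly two hex digits.
                if (i + 2 >= raw.length()) {
                    throw new InvalidEncodingException("Truncated escape at index " + i);
                }
                int hi = hexValue(raw.charAt(i + 1));
                int lo = hexValue(raw.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new InvalidEncodingException("Malformed escape at index " + i);
                }
                bytes.write((hi << 4) | lo);
                i += 3;
            } else if (c == '+') {
                // Form encoding (as handled by URLDecoder) maps '+' to a space.
                bytes.write(' ');
                i++;
            } else if (c < 0x80) {
                bytes.write(c);
                i++;
            } else {
                // Assumption: a raw query should contain only ASCII; anything else is rejected.
                throw new InvalidEncodingException("Unescaped non-ASCII character at index " + i);
            }
        }
        // Unlike new String(bytes, "UTF-8"), this decoder reports malformed input
        // instead of silently substituting U+FFFD.
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes.toByteArray())).toString();
        } catch (CharacterCodingException e) {
            throw new InvalidEncodingException("Not valid UTF-8: " + e);
        }
    }

    /** Convenience check: true if decode() would succeed. */
    public static boolean isValid(String raw) {
        try {
            decode(raw);
            return true;
        } catch (InvalidEncodingException e) {
            return false;
        }
    }

    // Accepts only ASCII hex digits; returns -1 for anything else.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        String[] tests = { "v=abc%C3%A4def", "v=abc%E4def" };
        for (String t : tests) {
            System.out.println(t + (isValid(t) ? " -> accepted" : " -> rejected (would send 4xx)"));
        }
    }
}
```

<p>With the two query parts from the question, <code>v=abc%C3%A4def</code> should be accepted and <code>v=abc%E4def</code> rejected: the byte <code>0xE4</code> starts a three-byte UTF-8 sequence but is followed by <code>d</code>, which is not a continuation byte. The service could then catch <code>InvalidEncodingException</code> and answer with a 4xx status such as 400 Bad Request.</p>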