Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>I asked the same question,</p> <p><a href="https://stackoverflow.com/questions/1233076/handling-character-encoding-in-uri-on-tomcat">Handling Character Encoding in URI on Tomcat</a></p> <p>I recently found a solution and it works pretty well for me. You might want give it a try. Here is what you need to do,</p> <ol> <li>Leave your URI encoding as Latin-1. On Tomcat, add URIEncoding="ISO-8859-1" to the Connector in server.xml.</li> <li>If you have to manually URL decode, use Latin1 as charset also.</li> <li>Use the fixEncoding() function to fix up encodings.</li> </ol> <p>For example, to get a parameter from query string,</p> <pre><code> String name = fixEncoding(request.getParameter("name")); </code></pre> <p>You can do this always. String with correct encoding is not changed.</p> <p>The code is attached. Good luck!</p> <pre><code> public static String fixEncoding(String latin1) { try { byte[] bytes = latin1.getBytes("ISO-8859-1"); if (!validUTF8(bytes)) return latin1; return new String(bytes, "UTF-8"); } catch (UnsupportedEncodingException e) { // Impossible, throw unchecked throw new IllegalStateException("No Latin1 or UTF-8: " + e.getMessage()); } } public static boolean validUTF8(byte[] input) { int i = 0; // Check for BOM if (input.length &gt;= 3 &amp;&amp; (input[0] &amp; 0xFF) == 0xEF &amp;&amp; (input[1] &amp; 0xFF) == 0xBB &amp; (input[2] &amp; 0xFF) == 0xBF) { i = 3; } int end; for (int j = input.length; i &lt; j; ++i) { int octet = input[i]; if ((octet &amp; 0x80) == 0) { continue; // ASCII } // Check for UTF-8 leading byte if ((octet &amp; 0xE0) == 0xC0) { end = i + 1; } else if ((octet &amp; 0xF0) == 0xE0) { end = i + 2; } else if ((octet &amp; 0xF8) == 0xF0) { end = i + 3; } else { // Java only supports BMP so 3 is max return false; } while (i &lt; end) { i++; octet = input[i]; if ((octet &amp; 0xC0) != 0x80) { // Not a valid trailing byte return false; } } } return true; } </code></pre> <p>EDIT: Your approach doesn't work for various reasons. When there are encoding errors, you can't count on what you are getting from Tomcat. Sometimes you get � or ?. Other times, you wouldn't get anything, getParameter() returns null. Say you can check for "?", what happens your query string contains valid "?" ?</p> <p>Besides, you shouldn't reject any request. This is not your user's fault. As I mentioned in my original question, browser may encode URL in either UTF-8 or Latin-1. User has no control. You need to accept both. Changing your servlet to Latin-1 will preserve all the characters, even if they are wrong, to give us a chance to fix it up or to throw it away.</p> <p>The solution I posted here is not perfect but it's the best one we found so far. </p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload