How can I open files containing accents in Java?
<p>(<em>Edited for clarification and to add some code</em>)</p>
<p>Hello, we need to parse data sent by users all over the world. Our Linux systems have a default locale of <code>en_US.UTF-8</code>. However, we often receive files with diacritical marks in their names, such as <code>special_á_ã_è_characters.doc</code>. The OS handles these files fine, and <code>strace</code> shows the OS passing the correct file name to the Java program, yet Java mangles the names and throws a "file not found" <code>IOException</code> when trying to open them.</p>
<p>This simple program illustrates the issue:</p>
<pre><code>import java.io.*;
import java.text.*;

public class load_i18n {
    public static void main( String [] args ) {
        File actual = new File(".");
        for( File f : actual.listFiles()){
            System.out.println( f.getName() );
        }
    }
}
</code></pre>
<p>Running this program under the default US English locale, in a directory containing the file <code>special_á_ã_è_characters.doc</code>, gives:</p>
<p>special_�_�_�_characters.doc</p>
<p>Setting the language via <code>export LANG=es_ES@UTF-8</code> prints the file name correctly, but that is an unacceptable solution because the entire system then runs in Spanish. Explicitly setting the <code>Locale</code> inside the program, as the code below does, has no effect either. I've also modified the program to (a) attempt to open each file and (b) print the name both as text and as a byte array when a file fails to open:</p>
<pre><code>import java.io.*;
import java.util.Locale;
import java.text.*;

public class load_i18n {
    public static void main( String [] args ) {
        // Stream to read file
        FileInputStream fin;
        Locale locale = new Locale("es", "ES");
        Locale.setDefault(locale);
        File actual = new File(".");
        System.out.println(Locale.getDefault());
        for( File f : actual.listFiles()){
            try {
                fin = new FileInputStream (f.getName());
            }
            catch (IOException e){
                System.err.println ("Can't open the file " + f.getName() + ". Printing as byte array.");
                byte[] textArray = f.getName().getBytes();
                for(byte b: textArray){
                    System.err.print(b + " ");
                }
                System.err.println();
                System.exit(-1);
            }
            System.out.println( f.getName() );
        }
    }
}
</code></pre>
<p>This produces the following output:</p>
<pre><code>es_ES
load_i18n.class
Can't open the file special_�_�_�_characters.doc. Printing as byte array.
115 112 101 99 105 97 108 95 -17 -65 -67 95 -17 -65 -67 95 -17 -65 -67 95 99 104 97 114 97 99 116 101 114 115 46 100 111 99
</code></pre>
<p>This shows that the issue is NOT just a console-display problem: the name is already wrong inside Java, as the byte array shows (each <code>-17 -65 -67</code> triplet is the UTF-8 encoding of U+FFFD, the Unicode replacement character). In fact, with <code>LANG=en_US.UTF-8</code> some utilities, such as bash's <code>echo</code>, do display the name correctly, although <code>ls</code> does not:</p>
<pre><code>[mjuric@arrhchadm30 tmp]$ echo $LANG
en_US.UTF-8
[mjuric@arrhchadm30 tmp]$ echo *
load_i18n.class special_á_ã_è_characters.doc
[mjuric@arrhchadm30 tmp]$ ls
load_i18n.class special_?_?_?_characters.doc
[mjuric@arrhchadm30 tmp]$
</code></pre>
<p>Is it possible to modify this code so that, when run under Linux with <code>LANG=en_US.UTF-8</code>, it reads the file name in a form that can be opened successfully?</p>