    <p>Many tools can do this; try a web search for "detect encoding". Here are some of the tools I found:</p>
    <ul>
    <li><p>The <strong>International Components for Unicode</strong> (ICU) are a great place to start. See especially their page on <a href="http://userguide.icu-project.org/conversion/detection" rel="nofollow noreferrer">Character Set Detection</a>.</p></li>
    <li><p><strong>Chardet</strong> is a Python module that guesses the encoding of a file. See chardet.feedparser.org</p></li>
    <li><p>The *nix command-line tool <strong>file</strong> detects file types, but might also detect encodings if one is mentioned in the file (e.g. if there's a MIME-type notation in the file). See <code>man file</code>.</p></li>
    <li><p>The Perl modules <strong>Encode::Detect</strong> and <strong>Encode::Guess</strong>.</p></li>
    <li><p>Someone asked a similar question on Stack Overflow. Search for the question <strong>PHP: Detect encoding and make everything UTF-8</strong>. That's in the context of fetching files from the net and using PHP, but you could write a command-line PHP script.</p></li>
    </ul>
    <p>Note well what the ICU page says about character set detection: "Character set detection is ..., at best, an imprecise operation using statistics and heuristics...." In my experience the problem domain makes a big difference in how easy or difficult the job is. Don't forget that the octets in a file can be of ambiguous encoding, <em>i.e.</em> sensibly interpreted using multiple different encodings. They can also be of mixed encoding, <em>i.e.</em> different subsets of the octets make sense when interpreted in different encodings. This is why there isn't a single command-line tool I can recommend that always does the job.</p>
    <p>If you have a single file and you just want to get it into a known encoding, my trick is to open the file with a text editor that can import using a bunch of different encodings, such as TextWrangler or OpenOffice.org. First, open the file and let the editor guess the encoding. Take a look at the result. If you aren't satisfied with it, guess an encoding yourself, reopen the file in the editor with that encoding specified, and take another look. Once the text looks right, save it in a known encoding, e.g. UTF-16.</p>
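    <p>The ambiguity described above can be seen with a few lines of Python. This is only a minimal sketch using the standard library, not a replacement for ICU or chardet: the function names <code>plausible_encodings</code> and <code>to_utf8</code> and the candidate list are made up for illustration. It reports every candidate encoding that decodes the octets without error, which is often more than one.</p>

```python
import codecs

# Byte-order marks, checked longest first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def plausible_encodings(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return every candidate encoding that decodes `data` without error.

    The candidate list is an assumption for illustration; choose one that
    fits your problem domain. A clean decode does not prove the guess is
    right, it only means the guess was not ruled out.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return [name]
    matches = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        matches.append(name)
    return matches


def to_utf8(data, source_encoding):
    """Re-encode `data` as UTF-8 once you have settled on a source encoding."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    print(plausible_encodings("café".encode("utf-8")))
    print(plausible_encodings("café".encode("latin-1")))
```

    <p>The UTF-8 octets for "café" also decode cleanly as cp1252 and latin-1 (as mojibake), so all three candidates survive; only a human look at the result, as with the text-editor trick, settles which one is right.</p>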