
    <p>As has been said many times already, PHP files do not have any encoding for bytes above 0x7f. All you can tell is that the bytes 0x00 to 0x7f are ascii.</p> <p>A file that starts with a BOM does not work as intended in PHP (the reason is given further down). So there is no such thing as a PHP file in iso-8859-1 or utf-8. It is plain 8-bit.</p> <p>A PHP file is not iso-8859-x, because those encodings do not contain all possible byte values. As you know, the bytes 0x7f to 0x9f are not defined in iso-8859-1, but any PHP file <em>can</em> contain them.</p> <p>A PHP file is not utf-8 either, because it might contain invalid utf-8 sequences and still be a valid PHP file.</p> <h2>The big picture</h2> <h3>Charset by convention at writing</h3> <p>A PHP file <em>can</em> have an encoding by convention, but this is at the discretion of the programmer, who tells the editor that the project is in utf-8, iso-8859-1 or something else.</p> <p>But again, this is only a convention of the programmer. The editor is treating the PHP file as if it were in that encoding. The encoding merely serves to display the file in the editor and to let the programmer edit it.</p> <h3>No charset during compilation</h3> <p>As explained above, the compiler does not need to know the encoding the programmer assumed. The only thing that matters is the <em>byte sequences</em> in the file.</p> <h3>Implicit or explicit charset defined on consumption</h3> <p>PHP generates some data that is sent over the internet to the browser. By the time the browser displays the data, the encoding is definitely defined, but how?</p> <ul> <li>The encoding can be defined in the HTTP header, like this: <code>Content-Type: text/html; charset=utf-8</code></li> <li>It can be defined in the HTML output itself: <code>&lt;meta charset="utf-8"&gt;</code></li> <li>Or, if the charset is not defined explicitly, the browser makes an educated guess based on the byte sequences present in the document (e.g. 
valid utf-8 sequences or a BOM).</li> </ul> <p>Of course it is good practice for a PHP application never to let the browser choose, but there is no requirement that the encoding be defined anywhere.</p> <h2>More details</h2> <p>Normally, the encoding the programmer chooses is the same one that is used at the end of the chain in the browser, and all strings in the PHP files use this same encoding.</p> <p>But this need not be the case, and there are valid reasons why it may not be. Let's look at some examples:</p> <h3>Different languages, different encodings</h3> <p>I have used Joomla since version 1.0. In that version, each language file had its own encoding. The French files were iso-8859-1, while the Arabic files were windows-1256 and the Russian files koi8-r. For those files the encoding mattered, but not for all the other files, which could be treated equally well as utf-8 or iso-8859-1. (Joomla has since switched to utf-8.)</p> <h3>Heterogeneous databases</h3> <p>One of our web applications connects to two different databases: one happens to be in utf-8, the other in windows-1252. This means that not all the strings in this project are in the same encoding. I use utf-8 as much as possible, but I need to translate the encodings back and forth using the <code>mb_*</code> group of functions in PHP.</p> <h3>PHP's conversion functions</h3> <p>The mere presence of the encoding conversion functions <code>mb_convert_encoding</code>, <code>iconv</code>, <code>utf8_encode</code>, etc. suggests that strings of different encodings can be present in the same project.</p> <h2>Good practice</h2> <p>Define your encoding and stick to it! The best choice is utf-8. If strings in other encodings are needed, you can always write something like <code>$s=mb_convert_encoding('Уровень','ucs-2','utf8');</code></p> <p>Once again: <strong>You cannot use a BOM in PHP files</strong>. 
The reason is simple: a BOM consists of two or three bytes (three for utf-8: <code>EF BB BF</code>) that come before the opening tag <code>&lt;?php</code>. They are therefore sent to the browser as output. If one tries to send a <code>header()</code> afterwards, a warning is generated and the header is not sent.</p> <h2>Conclusion</h2> <ul> <li>In general, there is no need to determine the encoding of a PHP file. Only the encoding of the finally rendered HTML is important.</li> <li>It is good practice to edit all files in the same encoding that is used to display the final results. But it really only matters for the language files (if you use any i18n system at all).</li> <li>While in practice all the strings in one file are in the same encoding, nothing prevents an ill-minded programmer from writing strings in different encodings in the same file and still getting a working program.</li> </ul> <p><strong>Finally, encoding in PHP is only a matter of the convention used at writing time and the charset used in the browser to render the page. In between, a PHP file has no specific encoding; it's just plain 8-bit.</strong></p>
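<p>To make the "plain 8-bit" point concrete, here is a minimal sketch. It assumes the <code>mbstring</code> extension is enabled; <code>starts_with_utf8_bom</code> is a hypothetical helper written for this illustration, not a PHP built-in. The strings are built from explicit byte escapes so that the result does not depend on the encoding the file happens to be saved in.</p>
<pre><code>&lt;?php
// The same character under two different conventions.
$utf8   = "\xC3\xA9";  // "é" if the bytes are read as utf-8 (2 bytes)
$latin1 = "\xE9";      // "é" if the byte is read as iso-8859-1 (1 byte)

// PHP itself only sees bytes: strlen() counts bytes, not characters.
var_dump(strlen($utf8));    // int(2)
var_dump(strlen($latin1));  // int(1)

// Whether the bytes are "valid" depends only on the encoding we assume.
var_dump(mb_check_encoding($utf8, 'UTF-8'));    // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8'));  // bool(false)

// Converting from one convention to the other changes the bytes.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8);  // bool(true)

// Hypothetical helper: does the data start with the utf-8 BOM (EF BB BF)?
function starts_with_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

var_dump(starts_with_utf8_bom("\xEF\xBB\xBFhello"));  // bool(true)
var_dump(starts_with_utf8_bom("hello"));              // bool(false)
</code></pre>
<p>A check like the last one could be run over the contents of source files (for example, read with <code>file_get_contents</code>) to find files whose editor silently added a BOM before the opening tag.</p>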