    <p>As its name suggests, <code>strip_tags</code> should remove all HTML tags. The only way we can prove it is by analyzing the source code. The following analysis applies to a <code>strip_tags('...')</code> call, without a second argument for whitelisted tags.</p>
<p>First of all, some theory about HTML tags: a tag starts with a <code>&lt;</code> followed by non-whitespace characters. If this string starts with a <code>?</code>, it <a href="http://www.w3.org/TR/html4/conform.html#h-4.2" rel="noreferrer">should not be parsed</a>. If this string starts with a <code>!--</code>, it is considered a comment and the following text should not be parsed either. A comment is terminated with a <code>--&gt;</code>. Inside such a comment, characters like <code>&lt;</code> and <code>&gt;</code> are allowed. Attributes can occur in tags, and their values may optionally be surrounded by a quote character (<code>'</code> or <code>"</code>). If such a quote exists, it must be closed; otherwise, a <code>&gt;</code> that is encountered does not close the tag.</p>
<p>The code <code>&lt;a href="example&gt;xxx&lt;/a&gt;&lt;a href="second"&gt;text&lt;/a&gt;</code> is interpreted in Firefox as:</p>
<pre><code>&lt;a href="http://example.com%3Exxx%3C/a%3E%3Ca%20href=" second"=""&gt;text&lt;/a&gt; </code></pre>
<p>The PHP function <a href="http://php.net/strip-tags" rel="noreferrer"><code>strip_tags</code></a> is referenced in <a href="http://lxr.php.net/opengrok/xref/PHP_5_3/ext/standard/string.c#4036" rel="noreferrer">line 4036 of ext/standard/string.c</a>. That function calls the <a href="http://lxr.php.net/opengrok/xref/PHP_5_3/ext/standard/string.c#php_strip_tags_ex" rel="noreferrer">internal function php_strip_tags_ex</a>.</p>
<p>Two buffers exist: one for the output, the other for "inside HTML tags". 
A counter named <code>depth</code> holds the number of open angle brackets (<code>&lt;</code>).<br> The variable <code>in_q</code> contains the quote character (<code>'</code> or <code>"</code>) if any, and <code>0</code> otherwise. The last character is stored in the variable <code>lc</code>.</p>
<p>The function has five states, three of which are mentioned in the description above the function. Based on this information and the function body, the following states can be derived:</p>
<ul>
<li>State 0 is the output state (not in any tag)</li>
<li>State 1 means we are inside a normal HTML tag (the tag buffer contains <code>&lt;</code>)</li>
<li>State 2 means we are inside a PHP tag</li>
<li>State 3: we came from the output state and encountered the <code>&lt;</code> and <code>!</code> characters (the tag buffer contains <code>&lt;!</code>)</li>
<li>State 4: inside an HTML comment</li>
</ul>
<p>We just need to make sure that no tag can be inserted, that is, a <code>&lt;</code> followed by a non-whitespace character. <a href="http://lxr.php.net/opengrok/xref/PHP_5_3/ext/standard/string.c#4326" rel="noreferrer">Line 4326</a> handles the case of the <code>&lt;</code> character, as described below:</p>
<ul>
<li>If inside quotes (e.g. <code>&lt;a href="inside quotes"&gt;</code>), the <code>&lt;</code> character is ignored (removed from the output).</li>
<li>If the next character is a whitespace character, <strong><code>&lt;</code> is added to the output buffer</strong>.</li>
<li>If outside an HTML tag, the state becomes <code>1</code> ("inside HTML tag") and the last character <code>lc</code> is set to <code>&lt;</code>.</li>
<li>Otherwise, if inside an HTML tag, the counter named <code>depth</code> is incremented and the character is ignored.</li>
</ul>
<p>If <code>&gt;</code> is encountered while the tag is open (<code>state == 1</code>), <code>in_q</code> becomes <code>0</code> ("not in a quote") and <code>state</code> becomes <code>0</code> ("not in a tag"). 
<strong>The tag buffer is discarded.</strong></p>
<p>Attribute checks (for characters like <code>'</code> and <code>"</code>) are done on the tag buffer, which is discarded. So the conclusion is:</p>
<blockquote> <p>strip_tags without a tag whitelist is safe for inclusion outside tags; no tag will be allowed.</p> </blockquote>
<p>By "outside tags", I mean not in tags, as in <code>&lt;a href="in tag"&gt;outside tag&lt;/a&gt;</code>. The text may still contain <code>&lt;</code> and <code>&gt;</code>, as in <code>&gt;&lt; a&gt;&gt;</code>. The result is therefore not valid HTML: <code>&lt;</code>, <code>&gt;</code> and <code>&amp;</code> still need to be escaped, especially the <code>&amp;</code>. That can be done with <a href="http://php.net/htmlspecialchars" rel="noreferrer"><code>htmlspecialchars()</code></a>.</p>
<p>The description for <code>strip_tags</code> without a whitelist argument would be:</p>
<blockquote> <p>Makes sure that no HTML tag exists in the returned string.</p> </blockquote>
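<p>As a minimal sketch of the two steps discussed above (strip the tags, then escape what is left), something like the following could be used. The helper name <code>plain_text_for_html</code> is hypothetical and not part of PHP; the <code>ENT_QUOTES</code> flag and the <code>'UTF-8'</code> charset are assumptions that may need adjusting for a given application.</p>
<pre><code>&lt;?php
// Hypothetical helper (not a PHP built-in): prepares untrusted input
// for output as text outside of tags.
function plain_text_for_html($input)
{
    // Step 1: remove everything strip_tags() treats as a tag
    // (no second argument, so no tags are whitelisted).
    $stripped = strip_tags($input);

    // Step 2: escape the remaining special characters (&amp;, &lt;, &gt; and quotes),
    // because strip_tags() alone does not produce valid HTML.
    // ENT_QUOTES and 'UTF-8' are assumptions for this sketch.
    return htmlspecialchars($stripped, ENT_QUOTES, 'UTF-8');
}

echo plain_text_for_html('&lt;b&gt;Tom&lt;/b&gt; &amp; Jerry'), "\n";
echo plain_text_for_html('1 &lt; 2'), "\n";
</code></pre>
<p>With this sketch, the input <code>&lt;b&gt;Tom&lt;/b&gt; &amp; Jerry</code> should come out as <code>Tom &amp;amp; Jerry</code>: the tags are removed in the first step and the ampersand is escaped in the second.</p>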