<p>Assuming you are using valid XHTML, it's simple to parse the HTML and make sure tags are handled properly: track which tags have been opened so far, and close them again "on your way out".</p>
<pre><code>&lt;?php
header('Content-type: text/plain; charset=utf-8');

function printTruncated($maxLength, $html, $isUtf8=true)
{
    $printedLength = 0;
    $position = 0;
    $tags = array();

    // For UTF-8, we need to count multibyte sequences as one character.
    $re = $isUtf8
        ? '{&lt;/?([a-z]+)[^&gt;]*&gt;|&amp;#?[a-zA-Z0-9]+;|[\x80-\xFF][\x80-\xBF]*}'
        : '{&lt;/?([a-z]+)[^&gt;]*&gt;|&amp;#?[a-zA-Z0-9]+;}';

    while ($printedLength &lt; $maxLength &amp;&amp; preg_match($re, $html, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag.
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + strlen($str) &gt; $maxLength)
        {
            print(substr($str, 0, $maxLength - $printedLength));
            $printedLength = $maxLength;
            break;
        }

        print($str);
        $printedLength += strlen($str);
        if ($printedLength &gt;= $maxLength) break;

        if ($tag[0] == '&amp;' || ord($tag) &gt;= 0x80)
        {
            // Pass the entity or UTF-8 multibyte sequence through unchanged.
            print($tag);
            $printedLength++;
        }
        else
        {
            // Handle the tag.
            $tagName = $match[1][0];
            if ($tag[1] == '/')
            {
                // This is a closing tag.
                $openingTag = array_pop($tags);
                assert($openingTag == $tagName); // Check that tags are properly nested.

                print($tag);
            }
            else if ($tag[strlen($tag) - 2] == '/')
            {
                // Self-closing tag.
                print($tag);
            }
            else
            {
                // Opening tag.
                print($tag);
                $tags[] = $tagName;
            }
        }

        // Continue after the tag.
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text.
    if ($printedLength &lt; $maxLength &amp;&amp; $position &lt; strlen($html))
        print(substr($html, $position, $maxLength - $printedLength));

    // Close any open tags.
    while (!empty($tags))
        printf('&lt;/%s&gt;', array_pop($tags));
}


printTruncated(10, '&lt;b&gt;&amp;lt;Hello&amp;gt;&lt;/b&gt; &lt;img src="world.png" alt="" /&gt; world!'); print("\n");

printTruncated(10, '&lt;table&gt;&lt;tr&gt;&lt;td&gt;Heck, &lt;/td&gt;&lt;td&gt;throw&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;in a&lt;/td&gt;&lt;td&gt;table&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;'); print("\n");

printTruncated(10, "&lt;em&gt;&lt;b&gt;Hello&lt;/b&gt;&amp;#20;w\xC3\xB8rld!&lt;/em&gt;"); print("\n");
</code></pre>
<p><strong>Encoding note</strong>: The above code assumes the XHTML is <a href="http://en.wikipedia.org/wiki/UTF-8" rel="noreferrer">UTF-8</a> encoded. ASCII-compatible single-byte encodings (such as <a href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1" rel="noreferrer">Latin-1</a>) are also supported; just pass <code>false</code> as the third argument. Other multibyte encodings are not supported, though you may hack in support by using <code>mb_convert_encoding</code> to convert to UTF-8 before calling the function, then converting back again in every <code>print</code> statement.</p>
<p>(You should <em>always</em> be using UTF-8, though.)</p>
<p><strong>Edit</strong>: Updated to handle character entities and UTF-8. Fixed a bug where the function would print one character too many if that character was a character entity.</p>
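<p>One possible way to do the <code>mb_convert_encoding</code> hack mentioned in the encoding note, sketched under the assumption that the mbstring extension is installed. Instead of converting back in every <code>print</code> statement, this variant captures the printer's output with output buffering (<code>ob_start</code>/<code>ob_get_clean</code>) and converts it back once. <code>printConverted</code> is a hypothetical helper name, and Shift_JIS (<code>'SJIS'</code>) is only an example encoding.</p>
<pre><code>&lt;?php
// Minimal sketch, assuming the mbstring extension is installed.
// printConverted is a hypothetical helper, not part of PHP or of printTruncated.
function printConverted(callable $printer, string $html, string $encoding): void
{
    // Convert the input from the original encoding to UTF-8.
    $utf8Html = mb_convert_encoding($html, 'UTF-8', $encoding);

    // Capture everything the UTF-8 printer prints, instead of
    // changing each of its print() calls.
    ob_start();
    $printer($utf8Html);
    $utf8Output = ob_get_clean();

    // Convert the captured output back to the original encoding once.
    print(mb_convert_encoding($utf8Output, $encoding, 'UTF-8'));
}

// Demo with a printer that echoes its input unchanged: the round trip
// through UTF-8 should reproduce the original Shift_JIS bytes.
$sjis = mb_convert_encoding('&lt;b&gt;日本語&lt;/b&gt;', 'SJIS', 'UTF-8');
ob_start();
printConverted(function (string $h): void { print($h); }, $sjis, 'SJIS');
var_dump(ob_get_clean() === $sjis); // bool(true)

// With printTruncated() defined, a truncating call could look like this
// ($sjisHtml is a placeholder for your own Shift_JIS input):
// printConverted(function (string $h): void { printTruncated(10, $h); }, $sjisHtml, 'SJIS');
</code></pre>
<p>Output buffering leaves <code>printTruncated</code> itself untouched; the cost is that the truncated output is held in memory until it is converted, which is small since at most <code>$maxLength</code> characters of text are printed.</p>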