<p>Here's the solution:</p>
<pre><code>&lt;?php
// here's the pattern:
$pattern = '/&lt;(\w+)(\s+(\w+)\s*\=\s*(\'|")(.*?)\\4\s*)*\s*(\/&gt;|&gt;)/';

// a string to parse:
$string = 'Hello, try clicking &lt;a href="#paragraph"&gt;here&lt;/a&gt;
&lt;br/&gt;and check out.&lt;hr /&gt;
&lt;h2&gt;title&lt;/h2&gt;
&lt;a name ="paragraph" rel= "I\'m an anchor"&gt;&lt;/a&gt;
Fine, &lt;span title=\'highlight the "punch"\'&gt;thanks&lt;span&gt;.
&lt;div class = "clear"&gt;&lt;/div&gt;
&lt;br&gt;';

// let's get the occurrences:
preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);

// print the result:
print_r($matches[0]);
?&gt;
</code></pre>
<p>To test it thoroughly, I included self-closing tags in the string, such as:</p>
<ol>
<li>&lt;hr /&gt;</li>
<li>&lt;br/&gt;</li>
<li>&lt;br&gt;</li>
</ol>
<p>I also included tags with:</p>
<ol>
<li>one attribute</li>
<li>more than one attribute</li>
<li>attributes whose value is enclosed in either <strong>single quotes</strong> or <strong>double quotes</strong></li>
<li>attributes containing single quotes when the delimiter is a double quote, and vice versa</li>
<li>"unpretty" attributes with a space before the "=" symbol, after it, or both before and after it.</li>
</ol>
<p>If you find something that does not work in the proof of concept above, I am happy to analyze the code and improve my skills.</p>
<p><strong>&lt;EDIT&gt;</strong> I forgot that the question asked to exclude self-closing tags. In that case the pattern is simpler and becomes:</p>
<pre><code>$pattern = '/&lt;(\w+)(\s+(\w+)\s*\=\s*(\'|")(.*?)\\4\s*)*\s*&gt;/';
</code></pre>
<p>The user @ridgerunner noticed that the pattern does not allow <strong>unquoted attributes</strong> or <strong>attributes with no value</strong>. In that case, a small adjustment gives the following pattern:</p>
<pre><code>$pattern = '/&lt;(\w+)(\s+(\w+)(\s*\=\s*(\'|"|)(.*?)\\5\s*)?)*\s*&gt;/';
</code></pre>
<p><strong>&lt;/EDIT&gt;</strong></p>
<h1>Understanding the pattern</h1>
<p>For anyone interested in learning more about the first pattern, here is a breakdown:</p>
<ol>
<li>the first sub-expression (\w+) matches the tag name</li>
<li>the second sub-expression contains the pattern of an attribute. It is composed of:
<ol>
<li>one or more whitespace characters \s+</li>
<li>the name of the attribute (\w+)</li>
<li>zero or more whitespace characters \s* (blanks are optional here)</li>
<li>the "=" symbol</li>
<li>again, zero or more whitespace characters</li>
<li>the delimiter of the attribute value, a single or double quote ('|"). In the pattern, the single quote is escaped because it coincides with the PHP string delimiter. This sub-expression is captured with parentheses so that it can be referenced again to match the closing delimiter of the attribute, which is why it is so important.</li>
<li>the value of the attribute, matched by <em>almost</em> anything: (.*?); the question mark after the asterisk makes the quantifier <strong>lazy (non-greedy)</strong>, so the RegExp engine consumes as few characters as possible and stops at the first position where the rest of the pattern, here the closing delimiter, can match</li>
<li>here comes the fun: the \4 part is a <strong>backreference</strong>, which refers to a sub-expression captured earlier in the pattern. In this case it refers to the fourth sub-expression, which is the opening delimiter of the attribute value, so the value must be closed by the same kind of quote that opened it</li>
<li>zero or more whitespace characters \s*</li>
<li>the attribute sub-expression ends here, followed by an asterisk that allows zero or more occurrences of it.</li>
</ol></li>
<li>Then, since a tag may have whitespace before the "&gt;" symbol, zero or more whitespace characters are matched with the \s* subpattern.</li>
<li>The tag to match may end with a simple "&gt;" symbol or with an XHTML-style closure, which puts a slash before it: (\/&gt;|&gt;). The slash is, of course, escaped since it coincides with the regular expression delimiter.</li>
</ol>
<p>Small tip: when running this code in a browser, look at the generated page source rather than the rendered output, since I did not escape HTML special characters in the printed result.</p>
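<p>As a smaller illustration of the two mechanisms the attribute sub-expression relies on, the sketch below isolates it from the full tag pattern. The sample strings and variable names are made up for this example and are not part of the original answer. Because the sub-expression is used on its own here, the quote delimiter is group 2 rather than group 4.</p>
<pre><code>&lt;?php
// Hypothetical sample input, chosen only for illustration.
$tag = '&lt;a title="it\'s here" rel=\'say "hi"\'&gt;';

// Attribute sub-pattern on its own: name (group 1), "=", opening quote
// (group 2), lazy value (group 3), then \2 to require the same quote again.
$attribute = '/(\w+)\s*=\s*(\'|")(.*?)\2/';

preg_match_all($attribute, $tag, $found, PREG_SET_ORDER);

foreach ($found as $match) {
    // $match[1] is the attribute name, $match[3] its value.
    echo $match[1], ' =&gt; ', $match[3], PHP_EOL;
}

// Greedy versus lazy on the same input: .* runs to the last matching
// quote, while .*? stops at the first one.
preg_match('/(\'|")(.*)\1/', 'a="x" b="y"', $greedy);
preg_match('/(\'|")(.*?)\1/', 'a="x" b="y"', $lazy);

echo $greedy[2], PHP_EOL;
echo $lazy[2], PHP_EOL;
?&gt;
</code></pre>
<p>With these inputs, the loop should report the value it's here for title and say "hi" for rel: the backreference lets the other kind of quote appear inside each value. The greedy capture swallows x" b="y, while the lazy one stops at x.</p>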