StackOverflow2013

Can you provide some examples of why it is hard to parse XML and HTML with a regex?
<p>One mistake I see people making <a href="https://stackoverflow.com/questions/699708/variable-order-regex-syntax">over</a> and <a href="https://stackoverflow.com/questions/457015/regular-expression-help">over again</a> is trying to parse XML or HTML with a regex. Here are a few of the reasons parsing XML and HTML is hard:</p>
<p>People want to treat a file as a sequence of lines, but a single tag can span several lines, so this is valid:</p>
<pre><code>&lt;tag
attr="5"
/&gt;
</code></pre>
<p>People want to treat &lt; or &lt;tag as the start of a tag, but stuff like this exists in the wild:</p>
<pre><code>&lt;img src="imgtag.gif" alt="&lt;img&gt;" /&gt;
</code></pre>
<p>People often want to match starting tags to ending tags, but XML and HTML allow tags to contain themselves (which traditional regexes cannot handle at all):</p>
<pre><code>&lt;span id="outer"&gt;&lt;span id="inner"&gt;foo&lt;/span&gt;&lt;/span&gt;
</code></pre>
<p>People often want to match against the content of a document (such as the famous "find all phone numbers on a given page" problem), but the data may be marked up (even if it appears to be normal when viewed):</p>
<pre><code>&lt;span class="phonenum"&gt;(&lt;span class="area code"&gt;703&lt;/span&gt;) &lt;span class="prefix"&gt;348&lt;/span&gt;-&lt;span class="linenum"&gt;3020&lt;/span&gt;&lt;/span&gt;
</code></pre>
<p>Comments may contain poorly formatted or incomplete tags:</p>
<pre><code>&lt;a href="foo"&gt;foo&lt;/a&gt;
&lt;!-- FIXME:
    &lt;a href="
--&gt;
&lt;a href="bar"&gt;bar&lt;/a&gt;
</code></pre>
<p>What other gotchas are you aware of?</p>
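<p>To make two of the cases above concrete, here is a minimal Python sketch that runs the <code>alt="&lt;img&gt;"</code> and nested-span examples through a naive regex and through the standard library's <code>html.parser</code>. The patterns <code>NAIVE_TAG</code> and <code>NAIVE_SPAN</code> are only my guess at what people typically write, and <code>TagCollector</code> and <code>parse_tags</code> are hypothetical helper names, not part of any library.</p>

```python
import re
from html.parser import HTMLParser

# Assumed "typical" attempt: everything from '<' to the next '>' is one tag.
NAIVE_TAG = re.compile(r"<[^>]+>")

# Assumed "typical" attempt: pair a start tag with the nearest end tag.
NAIVE_SPAN = re.compile(r"<span[^>]*>(.*?)</span>")


class TagCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs, depth) for each start tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs), self.depth))
        self.depth += 1

    def handle_endtag(self, tag):
        # Self-closing tags such as <img ... /> also arrive here, because
        # the default handle_startendtag calls both handlers.
        self.depth -= 1


def parse_tags(markup):
    """Feed markup to the parser and return the recorded start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


img = '<img src="imgtag.gif" alt="<img>" />'
nested = '<span id="outer"><span id="inner">foo</span></span>'

# The regex stops at the '>' inside the alt attribute and truncates the tag.
print(NAIVE_TAG.findall(img))

# The parser knows about quoting, so it sees one img tag with alt == "<img>".
print(parse_tags(img))

# The regex pairs the outer start tag with the inner end tag.
print(NAIVE_SPAN.search(nested).group(1))

# The parser reports both spans along with their nesting depth (0, then 1).
print(parse_tags(nested))
```

<p>The point is not that these particular patterns cannot be patched, but that each patch ends up re-implementing another piece of a real parser.</p>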
