The fundamental reason why regex and HTML don't mix? The theory behind it?
<p>To start with, I cannot do anything but refer to what I believe is the most famous SO post ever:</p>
<p><a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags">RegEx match open tags except XHTML self-contained tags</a></p>
<p>Now, is it even a question for StackOverflow? I don't know, but I'll try...</p>
<p>I'll speak from a personal point of view. While I've never had to do that, I know that the day I have to parse HTML, I will certainly not go with regexes; I'll try and find an HTML parsing library. Fine.</p>
<p>But I don't know why.</p>
<p>At one point, I decided to do CSS validation in Java. I knew "by the guts" that regexes wouldn't cut it, so I used <a href="https://github.com/sirthias/parboiled/wiki" rel="nofollow noreferrer">Parboiled</a>.</p>
<p>And I don't know why.</p>
<p>The "why" troubles me. I am no newbie with regexes at all. I just can't put a clear line between what regex engines can, and cannot do.</p>
<p>My question is the following: what is this clear line? What fundamental characteristic of an input must exist so that it is mathematically demonstrated that any regex engine cannot reliably determine success and failure?</p>
<p>Can you give a simple, theoretical input which would spell failure as to a regex engine's ability to give a reliable "match/no match" answer? If yes, what is the defining characteristic of such an input?</p>
<p><strong>EDIT</strong> For the sake of this discussion, I'll add a task suggested by a post on SO (which I can't find the link to at the moment, sorry) which is simpler than HTML, but for which I won't use regexes: shell command line parsing.</p>
<p>As far as the shell is concerned, those are equivalent:</p>
<pre><code>alias ll="ls -l"
alias ll=ls\ -l
alias l"l"=ls' -'l
"alia"s l"l= "ls\ -l
</code></pre>
<p>Shell quoting mechanisms are so numerous that I'll just create a Parboiled grammar in this case... But this is "out of my guts". Because I find it easier probably... But that doesn't prove that this is not feasible with regexes.</p>
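<p>To make the shell example concrete, here is a minimal sketch that runs the four command lines above through Python's standard <code>shlex.split</code>. This is an assumption on my part, not the Parboiled grammar mentioned above: <code>shlex</code> only approximates POSIX shell quoting and is not a real shell. It does show the quote and backslash tracking that a tokenizer has to carry from one character to the next.</p>

```python
import shlex

# The four command lines from the question, kept verbatim.
# Raw strings are used so that Python does not interpret the backslashes.
COMMANDS = [
    r'alias ll="ls -l"',
    r'alias ll=ls\ -l',
    r"""alias l"l"=ls' -'l""",
    r'''"alia"s l"l= "ls\ -l''',
]


def tokenize(command: str) -> list[str]:
    """Split a command line into words using shlex's POSIX-style quoting rules."""
    return shlex.split(command)  # posix=True is the default for shlex.split


# One token list per command line, in the same order as COMMANDS.
TOKENS = [tokenize(command) for command in COMMANDS]

if __name__ == "__main__":
    for command, tokens in zip(COMMANDS, TOKENS):
        print(f"{command!r:32} -> {tokens}")
```

<p>Under these rules the first three lines produce identical word lists. The fourth, as written here, keeps the quoted space after <code>=</code>, so its second word differs by one leading space in the alias body; the command it names is still <code>ls -l</code>.</p>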