StackOverflow2013

plurals

POWhy it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
primarykey
Id
6751105
data
AcceptedAnswerId
6751339
AnswerCount
8
ClosedDate
CommentCount
6
CommunityOwnedDate
CreationDate
2011-07-19T17:06:13.250
FavoriteCount
51
LastActivityDate
2018-06-22T18:15:40.257
LastEditDate
2017-05-23T11:54:25.417
LastEditorUserId
-1
OwnerUserId
146792
ParentId
0
PostTypeId
1
Score
92
ViewCount
20744
LastEditorDisplayName
text
Body
Not a day passes on SO without a question being asked about parsing (X)HTML or XML with regular expressions. While it's relatively easy to come up with <a href="https://stackoverflow.com/q/701166/146792">examples that demonstrate the non-viability of regexes for this task</a> or with a <a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454">collection of expressions</a> to represent the concept, I still could not find on SO a formal explanation, given in layman's terms, of why this is not possible. The only formal explanations I could find so far on this site are probably extremely accurate, but also quite cryptic to the self-taught programmer: <blockquote> the flaw here is that HTML is a Chomsky Type 2 grammar (context free grammar) and RegEx is a Chomsky Type 3 grammar (regular expression) </blockquote> or: <blockquote> Regular expressions can only match regular languages but HTML is a context-free language. </blockquote> or: <blockquote> A finite automaton (which is the data structure underlying a regular expression) does not have memory apart from the state it's in, and if you have arbitrarily deep nesting, you need an arbitrarily large automaton, which collides with the notion of a finite automaton. </blockquote> or: <blockquote> The Pumping lemma for regular languages is the reason why you can't do that. </blockquote> [To be fair: the majority of the above explanations link to Wikipedia pages, but these are not much easier to understand than the answers themselves.] So my question is: could somebody please provide a translation in layman's terms of the formal explanations given above of why it is not possible to use regex for parsing (X)HTML/XML?
EDIT: After reading the first answer I thought that I should clarify: I am looking for a "translation" that also briefly explains the concepts it tries to translate: at the end of an answer, the reader should have a rough idea - for example - of what "regular language" and "context-free grammar" mean...
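To make the finite-automaton quote above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a proof and not a real HTML parser: the tag name `b`, the pattern `TWO_LEVELS`, and the functions `regex_balanced` and `counter_balanced` are all hypothetical names chosen for this example. The pattern is hard-wired for at most two levels of nesting, so it rejects a perfectly balanced input that goes three levels deep. The second checker keeps a depth counter, which is exactly the extra "memory" the quote says a finite automaton lacks, and so it copes with any depth.

```python
import re

# Hypothetical pattern: accepts sequences of <b>...</b> nested at most two levels deep.
# Supporting a third level would require writing a longer pattern, and so on forever.
TWO_LEVELS = re.compile(r"(?:<b>(?:<b></b>)*</b>)*")


def regex_balanced(text: str) -> bool:
    """Return True if the whole text matches the fixed-depth pattern."""
    return TWO_LEVELS.fullmatch(text) is not None


def counter_balanced(text: str) -> bool:
    """Return True if every <b> has a matching </b>, at any nesting depth."""
    depth = 0  # unbounded memory: grows with the nesting depth of the input
    for tag in re.findall(r"</?b>", text):
        if tag == "<b>":
            depth += 1
        else:
            depth -= 1
            if depth < 0:  # a closing tag appeared before its opening tag
                return False
    return depth == 0


if __name__ == "__main__":
    for sample in ("<b><b></b></b>", "<b><b><b></b></b></b>", "<b></b></b>"):
        print(sample, regex_balanced(sample), counter_balanced(sample))
```

Extending the pattern to three levels only moves the problem to four: any fixed pattern of this kind covers a fixed maximum depth, whereas the counter has no such limit.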
Tags
<regex><language-agnostic>
Title
Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USmac
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
3. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
3. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POWhy it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POWhy it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POWhy it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
