Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>In the spirit of Tom Christiansen's lexer solution, here's a link to Robert Cameron's seemingly forgotten 1998 article, <em>REX: XML Shallow Parsing with Regular Expressions.</em></p> <p><a href="http://www.cs.sfu.ca/~cameron/REX.html">http://www.cs.sfu.ca/~cameron/REX.html</a></p> <blockquote> <p>Abstract</p> <p>The syntax of XML is simple enough that it is possible to parse an XML document into a list of its markup and text items using a single regular expression. Such a shallow parse of an XML document can be very useful for the construction of a variety of lightweight XML processing tools. However, complex regular expressions can be difficult to construct and even more difficult to read. Using a form of literate programming for regular expressions, this paper documents a set of XML shallow parsing expressions that can be used a basis for simple, correct, efficient, robust and language-independent XML shallow parsing. Complete shallow parser implementations of less than 50 lines each in Perl, JavaScript and Lex/Flex are also given.</p> </blockquote> <p>If you enjoy reading about regular expressions, Cameron's paper is fascinating. His writing is concise, thorough, and very detailed. He's not simply showing you how to construct the REX regular expression but also an approach for building up any complex regex from smaller parts.</p> <p>I've been using the REX regular expression on and off for 10 years to solve the sort of problem the initial poster asked about (how do I match this particular tag but not some other very similar tag?). I've found the regex he developed to be completely reliable.</p> <p>REX is particularly useful when you're focusing on lexical details of a document -- for example, when transforming one kind of text document (e.g., plain text, XML, SGML, HTML) into another, where the document may not be valid, well formed, or even parsable for most of the transformation. It lets you target islands of markup anywhere within a document without disturbing the rest of the document.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload