<h2>☝ Getting Started with /Regexes/</h2> <p>Regular expressions are a form of <em>declarative programming</em>. If you are used to imperative, functional, or object-oriented programming, then they are a very different way of thinking. It’s a rules-based approach with subtle backtracking issues. I daresay a background in Prolog might actually do you some good with these, which certainly isn’t something I commonly advise.</p> <p>Normally I would just have people play around with the <code>grep</code> command from their shell, then advance to using regexes for searching and replacing in their editor.</p> <p>But I’m guessing you aren’t coming from a Unix background, because if you were, you would have come across regexes all over, from the very most basic <code>grep</code> command to pattern-matching in the <code>vi</code> or <code>emacs</code> editors. You can look at the <code>grep</code> manpage by typing </p> <pre><code>% man grep </code></pre> <p>on your <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=grep&amp;apropos=0&amp;sektion=0&amp;manpath=OpenBSD+Current&amp;arch=i386&amp;format=html" rel="nofollow noreferrer">BSD</a>, <a href="http://linux.die.net/man/1/grep" rel="nofollow noreferrer">Linux</a>, <a href="http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man1/grep.1.html" rel="nofollow noreferrer">Apple</a>, or <a href="http://manpages.unixforum.co.uk/man-pages/unix/solaris-10-11_06/1/grep-man-page.html" rel="nofollow noreferrer">Sun</a> systems — just to name a few.</p> <p>                                     ☹ <em>¡ʇɟoƨoɹɔᴉƜ ʇnoqɐ əɯ ʞƨɐ ʇ ̦uop əƨɐəld ʇƨnɾ</em>  ☹</p> <hr> <h2>☟ (?: Book Learnin’? )</h2> <p>If you ran into regular expressions at school or university, it was probably in the context of automata theory. They come up when discussing <a href="http://en.wikipedia.org/wiki/Regular_language" rel="nofollow noreferrer">regular languages</a>. 
If you have suffered through such classes, you may remember that regular expressions are the <em>user-friendly face</em> to messy finite automata. What they probably did <em>not</em> teach you, however, is that outside of the ivory tower, the regular expressions people actually use in the real world are far, far beyond "regular" in the rarefied, theoretical, and highly <em>irregular</em> sense of that otherwise commonplace word. This means that the modern regular expressions — call them patterns if you prefer — can do much more than the traditional regular expressions taught in computer science classes. There just isn’t any REGULAR left in modern regular expressions outside the classroom, but this is a good thing.</p> <p>I say “modern”, but in fact regular expressions haven’t been regular since Ken Thompson first put back references into his backtracking NFA, back when he was famously proving NFA–DFA equivalence. So unless you actually are using a DFA engine, it might be best to just forget any book-learnin’ nonsense about REGULARness of regexes. It just doesn’t apply to the way we really use them every day in the real world.</p> <p><em>Modern</em> regular expressions allow for much more than just back references though, as you will find once you delve into them. They’re their own wonderful world, even if that world is a bit surreal at times. They can let you substitute for pages and pages of code in just one line. They can also make you lose hair over their crazy behavior. Sometimes they make your computer seem like it’s hung, because it’s actually working very hard in a race between it and the heat-death of the universe in some awful O(2ⁿ) algorithm, or even worse. It can easily be much worse, actually. That’s what having this sort of power in your hands can do. There are no training wheels and no slow lane. 
Regexes are a power tool <em>par excellence</em>.</p> <hr> <h2>/☕✷⅋⋙$⚣™‹ª∞¶⌘̤℈⁑‽#♬˘$π❧/</h2> <p>Just one more thing before I give you a big list of helpful references. As I’ve <a href="https://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491">already said today elsewhere</a>, regexes do not have to be ugly, and they do not have to be hard. <strong>REMEMBER: If you create ugly regexes, it is only a reflection on <em>you</em>, not on <em>them</em>.</strong></p> <p>That’s absolutely <strong>no</strong> excuse for creating regexes that are hard to read. Oh, there’s plenty like that out there all right, but they shouldn’t be and they needn’t be. Even though regexes are (for the most part) a form of declarative programming, all the software engineering techniques that one uses in other forms of programming   ̲s̲t̲i̲l̲l̲ ̲a̲p̲p̲l̲y̲ ̲h̲e̲r̲e̲!</p> <p>A regex should never look like a dense row of punctuation that’s impossible to decipher. <em>Any</em> language would be a disaster if you removed all the alphabetical identifiers, removed all whitespace and indentation, removed all comments, and removed every last trace of top-down programming. So of course they look like cr@p if you do that. Don’t <em>do</em> that!</p> <p>So use <em>all</em> of those basic tools, including aesthetically pleasing code layout, careful problem decomposition, named subroutines, decoupling the declaration from the execution (including ordering!), unit testing, plus all the rest, whenever you’re creating regexes. These are all critical steps in <a href="https://stackoverflow.com/questions/764247/why-are-regular-expressions-so-controversial/4053506#4053506">making your patterns <em>maintainable</em></a>.</p> <p>It’s one thing to write <code>/(.)\1/</code>, but quite another to write something like <code>mǁ☕⅋⚣⁑™∞¶⌘℈‽#♬❧ǁ</code>. Those are regexes from the Dark Ages: don’t just reject them: burn them at the stake! 
It’s <em>programming</em>, after all, not line-noise or golf! </p> <hr> <h2>☞ Regex References</h2> <ol> <li><p>The <a href="http://en.wikipedia.org/wiki/Regular_expression" rel="nofollow noreferrer">Wikipedia page</a> on regular expressions is a decent enough overview.</p></li> <li><p>IBM has a <a href="http://www.ibm.com/developerworks/aix/library/au-speakingunix9/index.html" rel="nofollow noreferrer">nice introduction</a> to regexes in their <em>Speaking Unix</em> series. </p></li> <li><p>Russ Cox has a very nice list of <a href="http://swtch.com/~rsc/regexp/" rel="nofollow noreferrer">classic regular expressions references</a>. You might want to check out the original <a href="http://perldoc.perl.org/perlre.html#Version-8-Regular-Expressions" rel="nofollow noreferrer">Version 8 regular expressions</a>, here found in a Perl manpage, but these were the original, most basic patterns that everybody grew up with back in olden days.</p></li> <li><p><a href="http://regex.info/" rel="nofollow noreferrer"> <em>Mastering Regular Expressions</em> </a> from O’Reilly, by Jeffrey Friedl.</p></li> <li><p><a href="http://www.regular-expressions.info/" rel="nofollow noreferrer">Jan Goyvaerts’s <em>regular-expressions.info</em> site</a> and his <a href="http://oreilly.com/catalog/9780596520694" rel="nofollow noreferrer"> <em>Regular Expressions Cookbook</em></a>, also from O’Reilly.</p></li> <li><p>I’m a native speaker of Perl, so let me say a few words about it. Chapter 5 of the <a href="http://oreilly.com/catalog/9780596003135" rel="nofollow noreferrer"><em>Perl Cookbook</em></a> and Chapter 6 of <a href="http://oreilly.com/catalog/9780596000271" rel="nofollow noreferrer"><em>Programming Perl</em></a>, both somewhat embarrassingly by <a href="http://en.wikipedia.org/wiki/Tom_Christiansen" rel="nofollow noreferrer">yours truly</a> <em>et alios</em>, also from O’Reilly, are devoted to regular expressions in Perl. 
Perl was the language that originated most regex features found in modern regular expressions, and it continues to lead the pack. Perl’s Unicode support for regexes is especially rich and remarkably simple to use — in comparison with other languages’. You can download all the code examples from those two books from the O’Reilly site, or see the next item. The <a href="http://perldoc.perl.org/" rel="nofollow noreferrer">perldoc.perl.org site</a> has quite a bit on pattern matching, including the <a href="http://perldoc.perl.org/perlre.html" rel="nofollow noreferrer">perlre</a> and <a href="http://perldoc.perl.org/perluniprops.html" rel="nofollow noreferrer">perluniprops</a> manpages, just to take a couple of starting points.</p></li> <li><p>Apropos the <em>Perl Cookbook</em>, the <a href="http://pleac.sourceforge.net/" rel="nofollow noreferrer">PLEAC</a> project has reïmplemented the <em>Perl Cookbook</em> code in a dizzying number of diverse languages, including ada, common lisp, groovy, guile, haskell, java, merd, ocaml, php, pike, python, rexx, ruby, and tcl. If you look at what each language does for its equivalent of <em>PCB</em>’s regex chapter, you will learn a <em>tremendously huge amount</em> about how that language deals with regular expressions. It’s a marvellous resource and quite an eye-opener, even if some of the solutions are, um, suboptimal.</p></li> <li><p><a href="http://www.regular-expressions.info/javabook.html" rel="nofollow noreferrer"><em>Java Regular Expressions</em></a> by Mehran Habibi from Apress. It’s certainly better than trying to figure anything out by reading <a href="http://download.java.net/jdk7/docs/api/java/util/regex/Pattern.html" rel="nofollow noreferrer">Sun’s documentation on the Pattern class</a>. Java is probably the worst possible language for learning regexes in; it is very clumsy and often completely stupid. 
I speak from painful personal experience, not from ignorance, and <a href="http://www.google.com/codesearch/p?hl=en#4vlodpF_ctc/nexus/nexus-utils/src/main/java/org/sonatype/nexus/util/Inflector.java&amp;q=%66%75%63%6b%69%6e%67%20lang%3ajava&amp;sa=N&amp;cd=225&amp;ct=rc" rel="nofollow noreferrer">I am hardly alone</a> in this appraisal. If you have to use a JVM language, I recommend <a href="http://groovy.codehaus.org/" rel="nofollow noreferrer">Groovy</a> or perhaps <a href="http://www.scala-lang.org/api/current/scala/util/matching/Regex.html" rel="nofollow noreferrer">Scala</a>. Unfortunately, both are based on the standard Java pattern matching classes, so share their inadequacies.</p></li> <li><p>If you need Unicode and you’re using Java or C⁺⁺ instead of Perl, then I recommend looking into the <a href="http://site.icu-project.org/" rel="nofollow noreferrer">ICU library</a>. They handle Unicode in Java much better than Sun does, but it still feels too much like assembler for my tastes. Perl and Java appear to have the best support for Unicode and multiple encodings. Java is still kinda warty, but other languages often have this even worse. Be warned that languages with regexes bolted on the side are always clumsier to use them in than those with regexes built in. </p></li> <li><p>If you’re using C, then I would probably skip over the <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=regex&amp;apropos=0&amp;sektion=0&amp;manpath=OpenBSD+Current&amp;arch=i386&amp;format=html" rel="nofollow noreferrer">system-supplied regex library</a> and jump right into <a href="http://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions" rel="nofollow noreferrer">PCRE by Phil Hazel</a>. A bonus is that PCRE <em>can</em> be built to handle Unicode reasonably well. It is also the basic regex library used by several other languages and tools, including PHP.</p></li> </ol>
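<p>To make the earlier advice about readable patterns concrete, here is a minimal sketch in Python rather than Perl (a choice made only for illustration; the names <code>TERSE</code>, <code>DOUBLED</code>, and <code>first_doubled</code> are hypothetical, not from any book or library). It takes the <code>/(.)\1/</code> pattern mentioned above and lays it out with whitespace, comments, and a named group, using the <code>re.VERBOSE</code> flag from Python’s standard <code>re</code> module.</p>

```python
import re

# Terse form from the text: any character followed by itself.
TERSE = re.compile(r"(.)\1")

# The same pattern, laid out for human readers.
# re.VERBOSE ignores unescaped whitespace in the pattern and
# allows "#" comments, so the regex can document itself.
DOUBLED = re.compile(
    r"""
    (?P<char>.)   # capture any single character (newline excluded)
    (?P=char)     # then require that same character again
    """,
    re.VERBOSE,
)


def first_doubled(text):
    """Return the first doubled character in text, or None if there is none."""
    match = DOUBLED.search(text)
    return match.group("char") if match else None
```

<p>Both compiled patterns should match the same strings; the second simply names its capture and says what each piece is for. Larger patterns benefit far more from this treatment than a two-token example can show.</p>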