Java regex dies on stack overflow: need a better version
<p>I'm working on <a href="http://github.com/cletus/jmd" rel="noreferrer">JMD (Java MarkDown)</a>, a Java port of <a href="http://code.google.com/p/markdownsharp/" rel="noreferrer">MarkDownSharp</a>, but I'm having an issue with one regex in particular. For the file <a href="http://github.com/cletus/jmd/blob/master/src/test/resources/functionality/Markdown_Documentation_Syntax.text" rel="noreferrer">Markdown_Documentation_Syntax.text</a> this regular expression dies:</p>
<pre><code>private static final String BLOCK_TAGS_1 =
    "p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del";

private static final String BLOCKS_NESTED_PATTERN = String.format("" +
    "(" +                 // save in $1
    "^" +                 // start of line (with MULTILINE)
    "&lt;(%s)" +          // start tag = $2
    "\\b" +               // word break
    "(.*\\n)*?" +         // any number of lines, minimally matching
    "&lt;/\\2&gt;" +      // the matching end tag
    "[ \\t]*" +           // trailing spaces/tabs
    "(?=\\n+|\\Z)" +      // followed by a newline or end of input
    ")", BLOCK_TAGS_1);
</code></pre>
<p>which translates to:</p>
<pre><code>(^&lt;(p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del)\b(.*\n)*?&lt;/\2&gt;[ \t]*(?=\n+|\Z))
</code></pre>
<p>This pattern looks for accepted block tags anchored to the start of a line, followed by any number of lines, and terminated by a matching end tag that is itself followed by a newline or the end of the string. This generates:</p>
<pre><code>java.lang.StackOverflowError
    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4357)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
    at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4357)
    ...
</code></pre>
<p>This can be dealt with by increasing the stack space for Java (defaults to 128k/400k for oss/ss IIRC), but the above expression is slow anyway.</p>
<p>So I'm looking for a regex guru who can do better (or at least explain the performance problem with this pattern). The C# version is a little slow but works fine. PHP seems to have no issues with this either.</p>
<p><strong>Edit:</strong> This is on JDK6u17 running on Windows 7 64 Ultimate.</p>
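<p>For illustration only, here is a minimal sketch of one possible rewrite; it is not a verified fix for JMD. The repeating <code>Curly</code>/<code>GroupHead</code>/<code>LazyLoop</code>/<code>GroupTail</code> frames in the trace suggest that every iteration of the quantified group <code>(.*\n)*?</code> adds stack frames, so a long block can exhaust the stack. The sketch swaps that group for a lazy single-character loop in DOTALL mode, which, as far as I know, <code>java.util.regex</code> evaluates iteratively. The names <code>BlockTagSketch</code>, <code>BLOCKS_NESTED_SKETCH</code> and <code>firstBlock</code> are hypothetical. The rewrite is not strictly equivalent: capture group $3 disappears, at least one newline is required before the end tag, and <code>.</code> now also crosses <code>\r</code>.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockTagSketch {
    // Same tag list as BLOCK_TAGS_1 in the question.
    static final String BLOCK_TAGS_1 =
        "p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del";

    // Hypothetical rewrite (an assumption, not the original pattern):
    // "(?s:.*?)\\n" stands in for "(.*\\n)*?" so no capturing group is repeated.
    static final Pattern BLOCKS_NESTED_SKETCH = Pattern.compile(
        "(" +                        // save in $1
        "^" +                        // start of line (MULTILINE)
        "<(" + BLOCK_TAGS_1 + ")" +  // start tag = $2
        "\\b" +                      // word break
        "(?s:.*?)\\n" +              // lazily consume text up to a newline
        "</\\2>" +                   // matching end tag at the start of a line
        "[ \\t]*" +                  // trailing spaces/tabs
        "(?=\\n+|\\Z)" +             // followed by a newline or end of input
        ")",
        Pattern.MULTILINE);

    // Returns the first matched block, or null when nothing matches.
    static String firstBlock(String text) {
        Matcher m = BLOCKS_NESTED_SKETCH.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "intro\n\n<div class=\"x\">\nhello\n</div>\n\nafter\n";
        System.out.println(firstBlock(sample));
    }
}
```

<p>Whether this is fast enough on Markdown_Documentation_Syntax.text would still need to be measured; the sketch only aims to avoid the per-line recursion.</p>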