<p>Your best bet is to skip trying to match the text before and after the part you care about. You must do that if you use the <code>matches()</code> method, which requires the pattern to match the entire input, but you don't if you use the <code>find()</code> method, which looks for the next matching substring anywhere in the input. That's probably what you want instead.</p>
<pre><code>import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("&lt;b&gt;\\s*([^&lt;]*?)\\s*&lt;/b&gt;");
Matcher m = p.matcher("some &lt;b&gt;text&lt;/b&gt;");
String boldText = m.find() ? m.group(1) : null; // "text"
</code></pre>
<p>You can use <code>start()</code> and <code>end()</code> to find where the match begins and ends in the source string (<code>end()</code> is the index just past the last matched character). You can use <code>group(1)</code> to get the contents of the first <code>()</code> capture group, i.e., the text inside the bold tag; <code>group()</code> with no argument returns the entire match. The lazy <code>*?</code> inside the capture group leaves any whitespace before the closing tag to the final <code>\s*</code>, so that whitespace is not captured.</p>
<p>In my experience, using regular expressions to process HTML is very fragile and works well only in the most trivial cases. You might have better luck using a full-blown HTML or XML parser instead, but if this is one of those trivial cases, have at it.</p>
<p><strong>Original Answer:</strong> Here is my original answer, explaining why a <code>.*</code> at the beginning of a pattern performs so badly.</p>
<p>The problem with using <code>.*</code> at the front is that it causes a lot of backtracking in your match.
For example, consider the following:</p>
<pre><code>Pattern p = Pattern.compile("(.*)ab(.*)");
Matcher m = p.matcher("aaabaaa");
m.matches();
</code></pre>
<p>The match will proceed like this:</p>
<ol>
<li>The matcher first tries to pull the whole string, "aaabaaa", into the first <code>.*</code>, then tries to match <code>a</code> and fails because nothing is left.</li>
<li>The matcher backs up so that <code>.*</code> matches "aaabaa", then matches <code>a</code>, but fails to match <code>b</code> because nothing is left.</li>
<li>The matcher backs up to "aaaba", matches <code>a</code>, but fails to match <code>b</code>.</li>
<li>The matcher backs up to "aaab", matches <code>a</code>, but fails to match <code>b</code>.</li>
<li>The matcher backs up to "aaa", then fails to match <code>a</code> because the next character is "b".</li>
<li>The matcher backs up to "aa", matches <code>a</code>, matches <code>b</code>, and then matches "aaa" with the final <code>.*</code>. Success.</li>
</ol>
<p>Whenever possible, avoid a very broad match near the beginning of your pattern. Without knowing your actual problem, though, it is difficult to suggest something better.</p>
<p><strong>Update:</strong> Anirudha suggests using <code>(.*?)ab(.*)</code> as a way to avoid the backtracking. This does cut the backtracking short to some extent, but at the cost of attempting the rest of the pattern again after each character the lazy <code>.*?</code> takes in.
So now, consider the following:</p>
<pre><code>Pattern p = Pattern.compile("(.*?)ab(.*)");
Matcher m = p.matcher("aaabaaa");
m.matches();
</code></pre>
<p>It will proceed like this:</p>
<ol>
<li>The first <code>.*?</code> matches nothing, "", then the matcher matches <code>a</code> but fails to match <code>b</code>.</li>
<li>The first <code>.*?</code> expands to the first letter, "a"; the matcher matches <code>a</code> but fails to match <code>b</code>.</li>
<li>The first <code>.*?</code> expands to the first two letters, "aa"; the matcher matches <code>a</code>, then <code>b</code>, and the final <code>.*</code> slurps up the rest, "aaa". Success.</li>
</ol>
<p>This time the first group never has to give back characters it has already consumed, but every forward step of <code>.*?</code> requires another attempt at the rest of the pattern. Whether that is a net gain depends on the input: the lazy version does well when the first "ab" is near the start of the string, while the greedy version does well when the last "ab" is near the end.</p>
<p>This also changes what the match captures. The <code>.*</code> quantifier is greedy and matches as much as possible, whereas <code>.*?</code> is lazy (reluctant) and matches as little as possible.</p>
<p>For example, consider the string "aaabaaabaaa".</p>
<p>The first pattern, <code>(.*)ab(.*)</code>, captures "aaabaa" in the first group and "aaa" in the second.</p>
<p>The second pattern, <code>(.*?)ab(.*)</code>, captures "aa" in the first group and "aaabaaa" in the second.</p>
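<p>The difference between <code>find()</code> and <code>matches()</code> described at the top of this answer, along with <code>group()</code>, <code>group(1)</code>, <code>start()</code> and <code>end()</code>, can be seen in a small runnable sketch. The class name <code>BoldTagDemo</code>, the helper <code>firstBoldText</code>, and the sample input are illustrative assumptions, not part of any library; the pattern uses a lazy <code>[^&lt;]*?</code> so that whitespace before <code>&lt;/b&gt;</code> stays out of the captured text.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoldTagDemo {
    // Lazy [^<]*? leaves whitespace before </b> to the trailing \s*,
    // so group(1) holds only the trimmed text.
    static final Pattern BOLD = Pattern.compile("<b>\\s*([^<]*?)\\s*</b>");

    // Hypothetical helper: text inside the first <b>...</b>, or null if none.
    static String firstBoldText(String html) {
        Matcher m = BOLD.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "some <b> text </b> here";

        // matches() must cover the whole input; "some " before <b> makes it fail.
        System.out.println(BOLD.matcher(input).matches());

        // find() looks for the next matching substring anywhere in the input.
        Matcher m = BOLD.matcher(input);
        if (m.find()) {
            System.out.println(m.group());                  // entire match
            System.out.println(m.group(1));                 // first capture group
            System.out.println(m.start() + ".." + m.end()); // end() is exclusive
        }
    }
}
```

<p>Run as-is, this prints <code>false</code>, then <code>&lt;b&gt; text &lt;/b&gt;</code>, <code>text</code>, and <code>5..18</code>. It still only suits simple, well-behaved markup.</p>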
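<p>The two step-by-step traces above boil down to one choice: how many characters the first group takes before <code>ab</code> is tried. The sketch below is a deliberately simplified model of that choice, not a description of how <code>java.util.regex</code> is implemented internally; <code>SplitModel</code> and <code>attempts</code> are hypothetical names. It counts how many candidate lengths for the first group are tried before <code>ab</code> fits, going from longest to shortest for the greedy <code>(.*)</code> and from shortest to longest for the lazy <code>(.*?)</code>.</p>

```java
public class SplitModel {
    // Simplified model of (.*)ab(.*) under matches(); this is NOT how
    // java.util.regex works internally. The only real choice is how many
    // characters the first group takes, so we count the candidate lengths tried.
    // Returns the number of attempts until "ab" fits, or -1 if it never does.
    static int attempts(String s, boolean greedy) {
        int n = s.length();
        int tries = 0;
        for (int k = 0; k <= n; k++) {
            // Greedy (.*) starts with the longest group; lazy (.*?) with the shortest.
            int len = greedy ? n - k : k;
            tries++;
            // Once "ab" follows the first group, the final .* takes whatever is left.
            if (s.startsWith("ab", len)) {
                return tries;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "aaabaaa";
        System.out.println("greedy (.*)ab(.*): " + attempts(s, true) + " attempts");
        System.out.println("lazy (.*?)ab(.*): " + attempts(s, false) + " attempts");
    }
}
```

<p>For "aaabaaa" the model reports 6 attempts for the greedy pattern and 3 for the lazy one, the same as the six and three steps listed above. When the only <code>ab</code> sits near the end, as in "aaaaaaaaab", the counts flip to 3 for greedy and 9 for lazy.</p>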
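<p>The greedy and lazy captures for "aaabaaabaaa" can be confirmed with <code>java.util.regex</code> directly. This is a minimal sketch; the class <code>GreedyVsLazy</code> and its <code>split</code> helper are hypothetical names chosen for illustration.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Returns {group(1), group(2)} if the regex matches the whole input, else null.
    static String[] split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String input = "aaabaaabaaa";
        // Greedy (.*) keeps as much as possible; lazy (.*?) as little as possible.
        for (String regex : new String[] {"(.*)ab(.*)", "(.*?)ab(.*)"}) {
            String[] g = split(regex, input);
            System.out.println(regex + ": group 1 = \"" + g[0] + "\", group 2 = \"" + g[1] + "\"");
        }
    }
}
```

<p>It prints <code>(.*)ab(.*): group 1 = "aaabaa", group 2 = "aaa"</code> followed by <code>(.*?)ab(.*): group 1 = "aa", group 2 = "aaabaaa"</code>.</p>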