StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
3664883
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2010-09-08T05:34:45.477
FavoriteCount
0
LastActivityDate
2010-09-08T05:34:45.477
LastEditDate
2017-05-23T12:17:46.257
LastEditorUserId
-1
OwnerUserId
276101
ParentId
3664881
PostTypeId
2
Score
18
ViewCount
0
LastEditorDisplayName
text
Body
<h3>The Big Picture</h3> We will first look at this regex from the overall big picture algorithm, and then take a closer look at the specific implementation details later. The regex is an almost direct translation of the following Java code: <pre><code>static boolean isPalindrome(String s) { if (s.isEmpty()) { return true; } String g2 = null; for (char ch : s.toCharArray()) { String g1 = String.valueOf(ch); // "add" if (g2 != null && s.endsWith(g1 + g2)) { g2 = g1 + g2; } else if (s.endsWith(g1)) { g2 = g1; } else { break; } } return s.equals(g2); // "chk" } </code></pre> This is obviously not the most straightforward/efficient Java code to check for palindromes, but it works, and most fascinatingly, it's almost directly translatable to regex with a near one-to-one mapping. Here's the regex again, reproduced here for convenience, annotated to highlight the striking resemblance: <pre><code>// isEmpty _for-loop_ // ↓ / \ "(?x) | (?:(.) add)+ chk" // \_/ ↑ // g1 loop body ___g2___ // / \ .replace("add", assertEntirety(".*? (\\1 \\2?)")) .replace("chk", assertEntirety("\\2")); // s.equals(g2) </code></pre> Attachment: <a href="http://ideone.com/53JYq" rel="nofollow noreferrer">annotated and expanded version of the source code on ideone.com</a> (Feel free to ignore the details of <code>assertEntirety</code> for now: just think of it as a black box regex mechanism that can make an assertion on the entire string regardless of where we currently are.) So the basic algorithm is that we try to build a suffix, subject to a palindromic constraint, as we scan the string left to right. We then check if we're able to build the complete string in this manner. If we can, then the string is a palindrome. Also, as a special case, the empty string is trivially a palindrome. Once the big picture algorithm is understood, we can examine how the regex pattern implements it. <hr> <h3>What's with all the <code>String.replace</code>?</h3> Regex patterns in Java are ultimately nothing but strings, meaning they can be derived through string manipulations the way any string can. Yes, we can even use regex to generate a regex pattern -- a sort of meta-regexing approach, if you will. Consider this example of initializing an <code>int</code> constant (which ultimately contains nothing but a number): <pre><code>final int X = 604800; final int Y = 60 * 60 * 24 * 7; // now X == Y </code></pre> The number assigned to <code>X</code> is a literal integer: we can clearly see what the number is. This is not the case with <code>Y</code> which uses an expression instead, and yet this formula seems to convey an idea of what this number represents. Even without proper naming of these constants, we nonetheless get the idea that <code>Y</code> probably represents the number of seconds in a week, even if we may not immediately know what the numeric value is. On the other hand, with <code>X</code> we know precisely that number, but we get less of an idea of what it represents. The use of string replacements in the snippet is an analogous situation but for strings regex patterns. Instead of explicitly writing the pattern as one literal string, sometimes systematic and logical derivation (the "formula") of that value from simpler parts can be much more meaningful. This is especially true with regex, where it often matters more that we understand what the pattern does than being able to see what it looks like as a string literal (which isn't much of a looker anyway, what with all the extra backslashes). A portion of the snippet is reproduced here again for convenience: <pre><code>// the "formula" final String PALINDROME = "(?x) | (?:(.) add)+ chk" .replace("add", assertEntirety(".*? (\\1 \\2?)")) .replace("chk", assertEntirety("\\2")); // the "value" System.out.println(PALINDROME); // ____add_____ chk_ // _______/ \____ _______/ \_____ // (?x) | (?:(.) (?<=(?=^.*? (\1 \2?)$).*))+ (?<=(?=^\2$).*) // | \_/ \______/ | // | 1 2 | // |_______________________________| </code></pre> Without a doubt the "formula" is a lot more readable than the eventual string "value" in this case. There are certainly much more sophisticated ways to programmatically generate a regex pattern, and it's certainly possible to write in such a way that obfuscates instead of accentuates its meaning, but mindful usage of even simple string replacements can still do wonder (as hopefully shown in this example). Lesson: Consider programmatic generation of regex patterns. <hr> <h3>How does <code>add</code> work?</h3> The <code>(?:(.) add)+</code> construct, where <code>add</code> is an assertion that does some sort of "counting", has already been thoroughly discussed in the previous two parts. Two features are worth noting, though: <ul> <li>The <code>(.)</code> captures into group 1, allowing backreference later on</li> <li>The assertion is <code>assertEntirety</code> instead of just looking ahead from our current position <ul> <li>We'll discuss this in more detail later; just think of it as a way to assert on the entire string</li> </ul></li> </ul> The pattern applied to <code>assertEntirety</code> in <code>add</code> is the following: <pre><code># prefix _suffix_ # ↓ / \ .*? ( \1 \2? ) # \________/ i.e. a reluctant "whatever" prefix (as short as possible) # group 2 followed by a suffix captured into group 2 </code></pre> Note that group 2 is self-referencing with an optional specifier, a technique already discussed in part 2 of the series. Needless to say group 2 is our "counter" in this pattern: it's a suffix that we will try to grow leftward on every iteration of the "loop". As we iterate on each <code>(.)</code> left to right, we try to prepend that same character (using backreference to <code>\1</code>) to our suffix. Recall again the Java code translation of the above pattern, reproduced here for convenience: <pre><code>if (g2 != null && s.endsWith(g1 + g2)) { // \2? is greedy, we try this first g2 = g1 + g2; } else if (s.endsWith(g1)) { // since \2? is optional, we may also try this g2 = g1; } else { // if there's no matching suffix, we "break" out of the "loop" break; } </code></pre> The fact that <code>\2?</code> is optional means a few things: <ul> <li>It provides a "base case" for the self-reference (the main reason we do this!)</li> <li>Since <code>\2?</code> is part of the suffix pattern (and thus appears later in the overall pattern), the prefix part must be reluctant, hence <code>.*?</code> instead of <code>.*</code>. This allows <code>\2?</code> to exercise its greediness.</li> <li>The "counter" may also "reset" and give the "wrong" result <ul> <li>In part 2 we showed how backtracking <code>?</code> may result in the same kind of problematic resetting <ul> <li>We solved the problem by using possessive quantifier <code>?+</code>, but this is not applicable here</li> </ul></li> </ul></li> </ul> The third point is elaborated further in the next section. Lesson: Carefully analyze the interactions between greedy/reluctant repetitions in parts of a pattern. <h3>Related questions</h3> <ul> <li><a href="https://stackoverflow.com/questions/3075130/difference-between-and-for-regex/3075532#3075532">Difference between <code>.*?</code> and <code>.*</code> for regex</a></li> <li><a href="https://stackoverflow.com/questions/2565924/regular-expression-whos-greedier">Regular expression: who's greedier?</a></li> </ul> <hr> <h3>Why do we need a <code>chk</code> phase?</h3> As alluded in the previous section, the optional and backtrackable <code>\2?</code> means that our suffix can shrink under some circumstances. We will examine such a scenario step by step for this input: <pre><code> x y x y z y x ↑ # Initial state, \2 is "uninitialized" _ (x)y x y z y x ↑ # \1 captured x, \2 couldn't match \1\2 (since \2 is "uninitialized") # but it could match \1 so it captured x ___ x(y)x y z y x ↑ # \1 captured y, \2 matched \1\2 and grew to capture yx _ x y(x)y z y x ↑ # \1 captured x, \2 couldn't match \1\2, # but it could match \1 so it shrunk to capture x (!!!) ___ x y x(y)z y x ↑ # \1 captured y, \2 matched \1\2 and grew to capture yx _____ x y x y(z)y x ↑ # \1 captured z, \2 matched \1\2 and grew to capture zyx _______ x y x y z(y)x ↑ # \1 captured y, \2 matched \1\2 and grew to capture yzyx _________ x y x y z y(x) ↑ # \1 captured x, \2 matched \1\2 and grew to capture xyzyx </code></pre> We can modify our pattern (and the corresponding Java code) to omit the <code>chk</code> phase, and see that indeed this is what happens: <pre><code> // modified pattern without a chk phase; yields false positives! final String PALINDROME_BROKEN = "(?x) | (?:(.) add)+" .replace("add", assertEntirety(".*? (\\1 \\2?)")); String s = "xyxyzyx"; // NOT a palindrome!!! Matcher m = Pattern.compile(PALINDROME_BROKEN).matcher(s); if (m.matches()) { System.out.println(m.group(2)); // prints "xyzyx" } </code></pre> As explained, <code>"xyxyzyx"</code>, which is NOT a palindrome, is falsely reported as one, because we didn't check if the growing suffix eventually became the complete string (which it clearly did not in this case). The <code>chk</code> phase (which is an <code>assertEntirety</code> of the pattern <code>\2</code>) is therefore an absolute necessity in our setup. We need to confirm that indeed we managed to grow our suffix all the way. If this is the case, then we have ourselves a palindrome. Lesson: Carefully analyze any possibly unintended side-effects of optional self-reference matching. <hr> <h3>The Main Course: <code>assertEntirety</code></h3> While it's neat that we can write a Java regex pattern to detect palindromes, everything here except <code>assertEntirety</code> has already been covered in the previous parts of the series. The only new thing here is this mysterious black box, this powerful mechanism that magically allowed us to do what is otherwise "impossible". The <code>assertEntirety</code> construct is based on the following meta-pattern of nested lookarounds: <blockquote> <code>(?<=(?=^pattern$).*)</code> <blockquote> " I can see a place somewhere behind me where I can look ahead and see <code>^pattern$</code> " </blockquote> </blockquote> The name "lookaround" implies the relativity to our current position: we're looking around us, perhaps in front of or behind, from where we're standing. By nesting a lookahead in a lookbehind in this manner, we're able to metaphorically "fly into the skies" and look at the entire picture. Abstracting this meta-pattern into <code>assertEntirety</code> is a bit like writing preprocessing substitution macros. Having nested lookarounds everywhere probably hurts readability and maintainability, so we encapsulate it into <code>assertEntirety</code>, which not only hides the complexity of its inner workings, but also further emphasizes its semantics by giving it an appropriate name. Lesson: Consider abstracting meta-patterns to hide complexity and convey semantics. <hr> <h3>Appendix: On infinite-length lookbehind in Java</h3> Observant readers would notice that <code>assertEntirety</code> contains a <code>.*</code> in a lookbehind, which makes its theoretical maximum length infinite. No, Java does not officially support infinite-length lookbehind. Yes, as it has been adequatedly demonstrated here, it works anyway. Officially it's categorized as a "bug"; but "someone"(*wink*) could also consider it a "hidden feature". It's certainly possible that this "bug" will be "fixed" in the future. Removal of this hidden feature will break this particular solution to the Java regex palindrome problem. <h3>Related questions</h3> <ul> <li><a href="https://stackoverflow.com/questions/1536915/regex-look-behind-without-obvious-maximum-length-in-java">Regex look-behind without obvious maximum length in Java</a></li> </ul>
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow does this Java regex detect palindromes?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USpolygenelubricants
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. CO+1 Wow! Excellent work.
 singulars
 PostPostId
 PO
 UserUserId
 USNate W.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.