StackOverflow2013

Parse string with potentially 2 occurrences of the same string
<p>I'm working on parsing an address string and have found that sometimes the street name contains a word that is also a valid city name. I want to be sure that any second occurrence of a city name is always matched to the last group in the regex and the first group in the regex is treated as optional.</p>
<p>Here is some sample input:</p>
<pre><code>123 SUNNYSIDE AVENUE BROOKLYN
59 MAIDEN LANE MANHATTAN
59 MAIDEN LANE MANHATTAN 10038
39-076 46 STREET SUNNYSIDE
39-076 46 STREET SUNNYSIDE 11104
59 MAIDEN LANE MANHATTAN NY USA
</code></pre>
<p>Ideally the regex groups returned for these would be as follows:</p>
<pre><code>(123 )(SUNNYSIDE)( AVENUE )(BROOKLYN)
(59 MAIDEN LANE )(null)(null)(MANHATTAN)
(59 MAIDEN LANE )(null)(null)(MANHATTAN)
(39-076 46 STREET )(null)(null)(SUNNYSIDE)
(39-076 46 STREET )(null)(null)(SUNNYSIDE)
(59 MAIDEN LANE )(null)(null)(MANHATTAN)
</code></pre>
<p>For the cities, I have a list (dumbed down for this example) in a regex group like this:</p>
<pre><code>(MANHATTAN|BROOKLYN|SUNNYSIDE)
</code></pre>
<p>My starting regex was this:</p>
<pre><code>(.*?)(?:\W*)(MANHATTAN|BROOKLYN|SUNNYSIDE)(?:.*)
</code></pre>
<p>But of course that outputs:</p>
<pre><code>(123)(SUNNYSIDE)
</code></pre>
<p>I'm trying to expand it to support the cases mentioned above, but everything I've tried thus far to match 1 or 2 cities will always match the first city it finds as the last group and ignore the remainder.</p>
<p>There are a lot of special issues with address parsing, but right now I'm focused on solving just this one particular case. Thanks for any help!</p>
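One possible way to get the grouping described above, sketched in Python since the question does not name a regex flavor: make the first city and the text after it an optional group, then require a final city that is not followed by any other city (a negative lookahead). The names CITIES, ADDRESS_RE and split_address are hypothetical and not part of the question; unmatched groups come back as None, which plays the role of null in the desired output. This assumes one address per line.

```python
import re

# Hypothetical city list, taken from the simplified example in the question.
CITIES = ["MANHATTAN", "BROOKLYN", "SUNNYSIDE"]
CITY = "|".join(re.escape(c) for c in CITIES)

ADDRESS_RE = re.compile(
    rf"^(.*?)"                   # group 1: lazy prefix (house number, street)
    rf"(?:\b({CITY})\b(.*?))?"   # groups 2-3: optional earlier city + text after it
    rf"\b({CITY})\b"             # group 4: the city that should be the real one
    rf"(?!.*\b(?:{CITY})\b)"     # no further city name may follow group 4
)

def split_address(line):
    """Return the four groups for one address line, or None if no city is found."""
    m = ADDRESS_RE.match(line)
    return m.groups() if m else None

if __name__ == "__main__":
    for sample in [
        "123 SUNNYSIDE AVENUE BROOKLYN",
        "59 MAIDEN LANE MANHATTAN 10038",
        "39-076 46 STREET SUNNYSIDE 11104",
    ]:
        print(split_address(sample))
```

The lookahead is what forces group 4 onto the last city in the line; when only one city is present, the optional group cannot complete and groups 2 and 3 stay unset. Trailing text such as a ZIP code or "NY USA" is deliberately left unmatched.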
