Java regEx URL matching issue
<p>As usual, thank you in advance.</p>
<p>I am trying to familiarize myself with regex, and I am having trouble matching a URL.</p>
<p>Here is an example URL:</p>
<pre><code>www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html</code></pre>
<p>Here is what my regex breakdown looks like:</p>
<pre><code>[site]/[dir]*?/[year]/[month]/[day]/[storyTitle]?/[id]/htmlpage.html</code></pre>
<p>The <code>[id]</code> is a 22-character string of uppercase letters, lowercase letters, and digits. I do not need to extract it from the URL; I mention it only for clarity.</p>
<p>I need to extract two values from this URL.</p>
<p>First, I need to extract the dir(s). The <code>[dir]</code> is optional, but there can also be any number of them. In other words, it may be absent, or it may be <code>dir1/dir2/dir3</code>, etc. So, building on my first example:</p>
<pre><code>www.examplesite.com/dir1/dir2/dir3/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html</code></pre>
<p>Here I would need to extract <code>dir1/dir2/dir3</code>, where each dir is a single word of all lowercase letters (e.g. sports/mlb/games). A real dir contains no numbers; the digits here are only for illustration.</p>
<p>But in this example of a valid URL:</p>
<pre><code>www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html</code></pre>
<p>there is no <code>[dir]</code>, so I would not extract anything. Thus, the <code>[dir]</code> is optional.</p>
<p>Second, I need to extract the <code>[storyTitle]</code>, which is optional just like the <code>[dir]</code> above; however, if there is a <code>storyTitle</code>, there can be only one.</p>
<p>So, going off my previous examples,</p>
<pre><code>www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html</code></pre>
<p>would be valid, and I would need to extract <code>'title-of-some-story'</code>; story titles are dash-separated strings that are always lowercase. The example below is also valid:</p>
<pre><code>www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html</code></pre>
<p>In the above example, there is no <code>[storyTitle]</code>, which is what makes it optional.</p>
<p>Lastly, just to be thorough, a URL without a <code>[dir]</code> and without a <code>[storyTitle]</code> is also valid. Example:</p>
<pre><code>www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html</code></pre>
<p>is a valid URL. Any input would be helpful. I hope I am clear.</p>
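<p>For reference, one possible way to express the breakdown described in the question is sketched below. It is not a definitive answer. The class name <code>UrlParts</code>, the method <code>extract</code>, and the group names <code>dirs</code> and <code>title</code> are hypothetical. The sketch assumes the host contains no slash, that year/month/day are always 4/2/2 digits, and that a story title may contain digits as well as lowercase letters; none of these is stated in the question.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlParts {
    // Assumed shape: host/[dir/...]/yyyy/mm/dd/[story-title]/<22-char id>/htmlpage.html
    private static final Pattern STORY_URL = Pattern.compile(
          "[^/]+"                                       // host, e.g. www.examplesite.com
        + "(?:/(?<dirs>[a-z]+(?:/[a-z]+)*))?"           // optional dirs: lowercase words joined by '/'
        + "/\\d{4}/\\d{2}/\\d{2}"                       // year/month/day (digit counts are an assumption)
        + "(?:/(?<title>[a-z0-9]+(?:-[a-z0-9]+)*))?"    // optional dash-separated title (digits are an assumption)
        + "/[A-Za-z0-9]{22}"                            // id: matched but not captured
        + "/htmlpage\\.html");

    /**
     * Returns {dirs, title} for a matching URL; either element is null when
     * that part is absent. Returns null when the URL does not match at all.
     */
    static String[] extract(String url) {
        Matcher m = STORY_URL.matcher(url.trim());
        if (!m.matches()) {
            return null;
        }
        // group(name) yields null for an optional group that did not participate
        return new String[] { m.group("dirs"), m.group("title") };
    }

    public static void main(String[] args) {
        String[] parts = extract(
            "www.examplesite.com/sports/mlb/games/2012/06/19/title-of-some-story/"
            + "FAQKZjC3veXSalP9zxFgZP/htmlpage.html");
        System.out.println("dirs  = " + parts[0]);
        System.out.println("title = " + parts[1]);
    }
}
```

<p>Because the dir pattern accepts only lowercase letters, the illustrative <code>dir1/dir2/dir3</code> would not match, while <code>sports/mlb/games</code> would. Making each optional part a whole group that includes its leading slash, rather than putting <code>?</code> on the segment alone, avoids requiring an empty segment (<code>//</code>) when the part is missing.</p>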