<p>Regular-expressions.info has a page on <a href="http://www.regular-expressions.info/duplicatelines.html" rel="noreferrer">Deleting Duplicate Lines From a File</a></p> <p>This basically boils down to searching for this one-liner:</p> <pre><code>^(.*)(\r?\n\1)+$ </code></pre> <p>... and replacing with <code>\1</code>.<br> <sub>Note: the dot must not match newlines</sub></p> <p>Explanation:</p> <blockquote> <p>The <a href="http://www.regular-expressions.info/anchors.html" rel="noreferrer">caret</a> will match only at the start of a line. So the regex engine will only attempt to match the remainder of the regex there. The <a href="http://www.regular-expressions.info/dot.html" rel="noreferrer">dot</a> and <a href="http://www.regular-expressions.info/repeat.html" rel="noreferrer">star</a> combination simply matches an entire line, whatever its contents, if any. The parentheses store the matched line into the first backreference.</p> <p>Next we will match the line separator. I put the <a href="http://www.regular-expressions.info/optional.html" rel="noreferrer">question mark</a> into <code>\r?\n</code> to make this regex work with both Windows (<code>\r\n</code>) and UNIX (<code>\n</code>) text files. So up to this point we matched a line and the following line break.</p> <p>Now we need to check if this combination is followed by a duplicate of that same line. We do this simply with <code>\1</code>. This is the first backreference which holds the line we matched. The backreference will match that very same text.</p> <p>If the backreference fails to match, the regex match and the backreference are discarded, and the regex engine tries again at the start of the next line. If the backreference succeeds, the <a href="http://www.regular-expressions.info/repeat.html" rel="noreferrer">plus symbol</a> in the regular expression will try to match additional copies of the line. 
Finally, the <a href="http://www.regular-expressions.info/anchors.html" rel="noreferrer">dollar symbol</a> forces the regex engine to check if the text matched by the backreference is a complete line. We already know the text matched by the backreference is preceded by a line break (matched by <code>\r?\n</code>). Therefore, we now check if it is also followed by a line break or if it is at the end of the file using the <a href="http://www.regular-expressions.info/anchors.html" rel="noreferrer">dollar sign</a>.</p> <p>The entire match becomes <code>line\nline</code> (or <code>line\nline\nline</code> etc.). Because we are doing a search and replace, the line, its duplicates, and the line breaks in between them, are all deleted from the file. Since we want to keep the original line, but not the duplicates, we use <code>\1</code> as the replacement text to put the original line back in.</p> </blockquote>
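<p>As a rough illustration, here is how the same search and replace might look in Python. This is only a sketch, not something from the linked page: the helper name <code>remove_adjacent_duplicates</code> and the sample text are made up for the example. It assumes Python's <code>re</code> module, where <code>re.MULTILINE</code> lets the anchors match at every line and the dot already refuses to match a newline unless <code>re.DOTALL</code> is set.</p>

```python
import re

# Same pattern as in the answer. re.MULTILINE makes ^ and $ match at line
# boundaries; re.DOTALL is deliberately not set, so the dot stops at "\n".
DUPLICATE_LINES = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)


def remove_adjacent_duplicates(text):
    """Collapse each run of identical consecutive lines into a single line.

    Hypothetical helper name, chosen only for this illustration.
    """
    return DUPLICATE_LINES.sub(r"\1", text)


# Made-up sample input: two runs of duplicates and one later repeat.
sample = "apple\napple\nbanana\ncherry\ncherry\ncherry\napple\n"
print(remove_adjacent_duplicates(sample))
```

<p>Only consecutive duplicates are collapsed, so the trailing <code>apple</code> in the sample survives because it is not adjacent to the first run. If every repeat should go regardless of position, the lines would have to be sorted first.</p>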
 
