PHP Regex Check if two strings share two common characters
<p>I'm just getting to know regular expressions, but after doing quite a bit of reading (and learning quite a lot), I still have not been able to figure out a good solution to this problem.</p>
<p>Let me be clear: I understand that this particular problem might be better solved <em>without</em> regular expressions, but for the sake of brevity let me just say that I need to use regular expressions (trust me, I know there are better ways to solve this).</p>
<p>Here's the problem. I'm given a big file, each line of which is exactly 4 characters long.</p>
<p>This is a regex that defines "valid" lines:</p>
<pre><code>"/^[AB][CD][EF][GH]$/m"
</code></pre>
<p>In English: each line has either A or B at position 0, either C or D at position 1, either E or F at position 2, and either G or H at position 3. I can assume that each line will be exactly 4 characters long.</p>
<p>What I'm trying to do is, given one of those lines, match all other lines that have 2 or more characters in common with it. (Because each position has its own pair of letters, a shared character is always at the same position.)</p>
<p><strong>The example below assumes the following:</strong></p>
<ol>
<li><code>$line</code> is always in a valid format
<li><code>BigFileOfLines.txt</code> contains only valid lines
</ol>
<p><strong>Example:</strong></p>
<pre><code>// Matches all other lines in $subject that share 2 or more characters
// with $line
function findMatchingLines($line, $subject) {
    $regex = "magic regex I'm looking for here";
    $matches = array();
    preg_match_all($regex, $subject, $matches);
    // $matches[0] holds the full-pattern matches (one entry per matching line)
    return $matches[0];
}

// Example usage
$fileContents = file_get_contents("BigFileOfLines.txt");
$matchingLines = findMatchingLines("ACFG", $fileContents);

/*
 * Desired return value (note: this is an example set; there
 * could be more or fewer lines than this)
 *
 * BCEG
 * ADFG
 * BCFG
 * BDFG
 */
</code></pre>
<p>One way I know that <em>will</em> work is to use a regex like the following (this particular regex only works for "ACFG"):</p>
<p><code>"/^(?:AC.{2}|.CF.|.{2}FG|A.F.|A.{2}G|.C.G)$/m"</code></p>
<p>It lists one alternative for each of the 6 ways to pick 2 of the 4 positions. This works, and performance is acceptable. What bothers me about it, though, is that I have to generate it from <code>$line</code>, whereas I'd rather the regex be ignorant of the specific parameter. Also, this solution doesn't scale very well if the code is later modified to match, say, 3 or more characters, or if the length of each line grows from 4 to 16 (16-character lines with a minimum of 3 shared characters would need 560 alternatives).</p>
<p>It just feels like there's something remarkably simple that I'm overlooking. It also seems like this could be a duplicate question, but none of the other questions I've looked at really address this particular problem.</p>
<p>Thanks in advance!</p>
<p><strong>Update:</strong></p>
<p>It seems that the norm with regex answers is for SO users to simply post a regular expression and say "This should work for you."</p>
<p>I think that's kind of a halfway answer. I really want to <em>understand</em> the regular expression, so if you can, please include in your answer a thorough (within reason) explanation of why that regular expression:</p>
<ul>
<li>A. Works
<li>B. Is the most efficient (I feel there are enough assumptions that can be made about the subject string that a fair amount of optimization can be done).
</ul>
<p>Of course, if you give an answer that works, and nobody else posts an answer <em>with</em> an explanation, I'll mark it as the answer :)</p>
<p><strong>Update 2:</strong></p>
<p>Thank you all for the great responses: there was a lot of helpful information, and many of you had valid solutions. I chose the answer I did because, after running performance tests, its average runtime was on par with the other solutions, and overall I found it the best solution.</p>
<p>The reasons I favor this answer:</p>
<ol>
<li>The regular expression given provides excellent scalability for longer lines.
<li>The regular expression looks a lot cleaner, and is easier for mere mortals such as myself to interpret.
</ol>
<p>However, a lot of credit goes to the other answers below as well for being very thorough in explaining why their solutions are the best. If you've come across this question because you're trying to solve the same problem, please give them all a read; they helped me tremendously.</p>
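The alternation regex for "ACFG" can be generated mechanically from `$line` for any line length and any minimum number of shared characters. Below is a minimal PHP sketch, assuming the question's fixed-length lines and `\n` line endings; `buildSharedCharRegex()` and `positionCombinations()` are hypothetical helper names, not built-in PHP functions. It emits one alternative per choice of k positions, so the pattern still grows combinatorially: 6 alternatives for 4-character lines with k = 2, but 560 for 16-character lines with k = 3.

```php
<?php
// Hypothetical helpers (not part of PHP): generate the alternation regex
// for any line length and any minimum number $k of shared positions.

// Return every $k-element combination of positions $start..$n-1, ascending.
function positionCombinations(int $n, int $k, int $start = 0): array
{
    if ($k === 0) {
        return [[]]; // exactly one way to choose nothing
    }
    $result = [];
    for ($i = $start; $i <= $n - $k; $i++) {
        foreach (positionCombinations($n, $k - 1, $i + 1) as $rest) {
            $result[] = array_merge([$i], $rest);
        }
    }
    return $result;
}

// One alternative per combination: $line's character at each chosen position,
// "." everywhere else. A line matching any alternative shares >= $k characters.
function buildSharedCharRegex(string $line, int $k): string
{
    $n = strlen($line);
    $alternatives = [];
    foreach (positionCombinations($n, $k) as $positions) {
        $pieces = array_fill(0, $n, '.');
        foreach ($positions as $p) {
            // Escape in case a character is a regex metacharacter.
            $pieces[$p] = preg_quote($line[$p], '/');
        }
        $alternatives[] = implode('', $pieces);
    }
    // ^ and $ with /m anchor each alternative to one whole line of the subject.
    return '/^(?:' . implode('|', $alternatives) . ')$/m';
}

// Example usage
$subject = "BCEG\nADFG\nBDEH\nACFG\n";
preg_match_all(buildSharedCharRegex('ACFG', 2), $subject, $matches);
print_r($matches[0]); // lists BCEG, ADFG and ACFG
```

For "ACFG" and k = 2 this produces the same six alternatives as the hand-written regex in the question, in a different order and with `..` in place of `.{2}`.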
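If counting may be moved out of the regex, a different technique avoids the combinatorial growth: one optional capture group per position records whether that position agrees with `$line`, and PHP counts the groups that participated. This is a hybrid sketch, not the pure-regex solution the question asks for; `findMatchingLinesByCount()` is a hypothetical name, and it assumes PHP 7.4+ (`PREG_UNMATCHED_AS_NULL` needs 7.2, arrow functions need 7.4).

```php
<?php
// Hypothetical hybrid (not a pure-regex answer): the regex marks which
// positions agree with $line, and PHP counts them. Requires PHP 7.4+.
function findMatchingLinesByCount(string $line, string $subject, int $minShared = 2): array
{
    // For "ACFG" this builds /^(?:(A)|.)(?:(C)|.)(?:(F)|.)(?:(G)|.)$/m:
    // each position either captures the matching character or consumes any
    // other character, so every line of the right length matches.
    $parts = [];
    foreach (str_split($line) as $char) {
        $parts[] = '(?:(' . preg_quote($char, '/') . ')|.)';
    }
    $regex = '/^' . implode('', $parts) . '$/m';

    preg_match_all($regex, $subject, $sets, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $matchingLines = [];
    foreach ($sets as $set) {
        // $set[0] is the whole line; groups 1..n are null (or absent)
        // where that position differed from $line.
        $shared = count(array_filter(array_slice($set, 1), fn($g) => $g !== null));
        if ($shared >= $minShared) {
            $matchingLines[] = $set[0];
        }
    }
    return $matchingLines;
}

// Example usage
$subject = "BCEG\nADFG\nBDEH\nACFG\nBDEG\n";
print_r(findMatchingLinesByCount('ACFG', $subject));    // lists BCEG, ADFG and ACFG
print_r(findMatchingLinesByCount('ACFG', $subject, 3)); // lists ADFG and ACFG
```

The pattern has one group per character, so it grows linearly with line length, and raising the threshold from 2 to 3 only changes `$minShared`. As with the question's regex, a line identical to `$line` also matches; filter it out afterwards if "all other lines" must exclude it.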
 
