    <p>More or less as discussed in comments:</p>
<blockquote>
<p>Make copies of the source string and the search string. Eliminate all the control characters from the two copies. Search with the copy of the search string in the copy of the source string. You can do case conversion as well if you need to (or accent removal, or ...). Using a lot of <code>\s*</code> will probably dramatically slow down your regex.</p>
<p>The search string needs to be copied and preprocessed only once. Each source string will need to be copied and preprocessed once too. If the worst comes to the worst, when you know there's a match, you can go back to your original source string and make a new copy of the search string so that you do have something like the <code>\s*</code> between each regular character, and apply the regex from the second (mutilated) copy of the search string to the original source string. Because you know there's a match, the performance should be reasonable, even if the fail-to-match mode would be far too slow.</p>
</blockquote>
<p>Here's a Perl implementation of the ideas discussed.</p>
<pre><code>#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;

my $source = "'Twas (Tweedle-Dee's)\fBirthday\n\n\f\f\nand\ta\tl\tl\this friends were happy\n";
my $search = "(\fTwee\ndle\t-\tDee\r'\rs)\nBi\frth\fday";

print Data::Dumper-&gt;Dump([$source], [qw($source)]);
print Data::Dumper-&gt;Dump([$search], [qw($search)]);

my $c_source = $source;
my $c_search = $search;

$c_source =~ s/ |[[:cntrl:]]//g;  # Or s/\s//g;
$c_search =~ s/ |[[:cntrl:]]//g;  # Or s/\s//g;

print Data::Dumper-&gt;Dump([$c_source], [qw($c_source)]);
print Data::Dumper-&gt;Dump([$c_search], [qw($c_search)]);

if ($c_source =~ m/\Q$c_search\E/)
{
    # Locating the search in the original source...hard work...
    my @a_search = split //, $c_search;
    printf "Lengths: c_search %d; a_search %d\n", length($c_search), scalar(@a_search);
    # Escape regex metacharacters (including |, ^ and $) in each single-character entry
    @a_search = map { s/[][\\.*?+(){}|^\$]/\\$&amp;/g; $_ } @a_search;
    #print Data::Dumper-&gt;Dump([\@a_search], [qw(@a_search)]);
    my $r_search = join "\\s*", @a_search;
    print Data::Dumper-&gt;Dump([$r_search], [qw($r_search)]);
    my $t_source = $source;
    $t_source =~ s/$r_search//g;
    print Data::Dumper-&gt;Dump([$t_source], [qw($t_source)]);
}
</code></pre>
<p>Good clean hieroglyphic fun; clear as mud, no doubt. The first three lines (the shebang, <code>use strict</code> and <code>use warnings</code>) guard against silly mistakes. The <code>Data::Dumper</code> module prints data unambiguously; it is there for debugging. Setting <code>Useqq</code> makes it print strings in double quotes with escapes such as <code>\n</code> and <code>\f</code>, so the control characters are visible.</p>
<p>The variables <code>$source</code> and <code>$search</code> are the source string and the search string. There's a match, despite all the control characters in each of them. Note that there are some regex metacharacters in the mix: the parentheses. These strings are dumped for reference.</p>
<p>The next two lines make copies of the source and search strings. The control characters and spaces are then removed from the copies, using a POSIX character class to specify all control characters. These converted strings are dumped for inspection.</p>
<p>The <code>if</code> statement searches the converted source for the converted search string. The <code>\Q...\E</code> parts suppress the meaning of regex metacharacters in between. If there's a match, then we enter the block of code in braces.</p>
<p>The <code>split</code> operation creates an array of single characters from the converted search string. The <code>printf</code> is a sanity check that the string length and the array length agree. The <code>map</code> operation replaces each regex metacharacter with backslash and the metacharacter, leaving other characters unchanged. 
The <code>join</code> concatenates the entries of the array <code>@a_search</code> (each a single character, or a backslash plus a metacharacter) into a string <code>$r_search</code>, with <code>\s*</code> separating the entries.</p>
<p>The variable <code>$t_source</code> is another copy of the source. The regex in <code>$r_search</code> is applied to <code>$t_source</code> and any matches are replaced with nothing. The result is dumped. The output from this script is:</p>
<pre><code>$source = "'Twas (Tweedle-Dee's)\fBirthday\n\n\f\f\nand\ta\tl\tl\this friends were happy\n";
$search = "(\fTwee\ndle\t-\tDee\r'\rs)\nBi\frth\fday";
$c_source = "'Twas(Tweedle-Dee's)Birthdayandallhisfriendswerehappy";
$c_search = "(Tweedle-Dee's)Birthday";
Lengths: c_search 23; a_search 23
$r_search = "\\(\\s*T\\s*w\\s*e\\s*e\\s*d\\s*l\\s*e\\s*-\\s*D\\s*e\\s*e\\s*'\\s*s\\s*\\)\\s*B\\s*i\\s*r\\s*t\\s*h\\s*d\\s*a\\s*y";
$t_source = "'Twas \n\n\f\f\nand\ta\tl\tl\this friends were happy\n";
</code></pre>
<p>The string <code>$t_source</code> does indeed correspond to <code>$source</code> with <code>(Tweedle-Dee's)\fBirthday</code> removed, which seems to meet the requirements.</p>
<p>Converting this into Ruby is left as an exercise for the masochistic^H^H^H^H^H^H^H^H^H^H^H interested reader.</p>
<p>Clearly, you could simply create the <code>$r_search</code> string, use it as a regex, and apply it directly to (a copy of) <code>$source</code>; it would work. But I'm deeply suspicious that if you applied it to kilobyte-length source strings, the code would run very slowly. I've not done the measurements to prove it though.</p>
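<p>The two-stage idea (a cheap test on stripped copies first, the expensive regex only when a match is known to exist) can be packaged as a subroutine. What follows is a minimal sketch, not part of the script above: the name <code>remove_ignoring_whitespace</code> is hypothetical, it uses the built-in <code>quotemeta</code> instead of the hand-written escaping, and it separates characters with <code>[ [:cntrl:]]*</code> rather than <code>\s*</code> so that the second stage skips exactly the characters the first stage stripped. Those are assumptions of the sketch and may not suit every input.</p>

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (an illustration, not the original script):
# remove every occurrence of $search from $source, ignoring spaces
# and control characters in both strings.
sub remove_ignoring_whitespace {
    my ($source, $search) = @_;

    # Stage 1: cheap test on copies with spaces and control characters removed.
    (my $c_source = $source) =~ s/ |[[:cntrl:]]//g;
    (my $c_search = $search) =~ s/ |[[:cntrl:]]//g;
    return $source if $c_search eq '' || index($c_source, $c_search) < 0;

    # Stage 2: a match exists, so build the expensive regex. Each character
    # is escaped with quotemeta, and entries are joined with a pattern that
    # skips the same characters stage 1 stripped.
    my $r_search = join '[ [:cntrl:]]*', map { quotemeta($_) } split //, $c_search;

    (my $t_source = $source) =~ s/$r_search//g;
    return $t_source;
}

# Usage with the strings from the script above.
my $source = "'Twas (Tweedle-Dee's)\fBirthday\n\n\f\f\nand\ta\tl\tl\this friends were happy\n";
my $search = "(\fTwee\ndle\t-\tDee\r'\rs)\nBi\frth\fday";
my $result = remove_ignoring_whitespace($source, $search);
print $result eq $source ? "no match\n" : "match removed\n";
```

<p>Because stage 1 uses <code>index</code> on plain strings, source strings that do not contain the search text never reach the slow regex, which is the case the performance worry is about.</p>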