StackOverflow2013

plurals

PO
primarykey
Id
15516764
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2013-03-20T06:29:09.577
FavoriteCount
0
LastActivityDate
2013-03-20T06:29:09.577
LastEditDate
LastEditorUserId
0
OwnerUserId
15168
ParentId
15512804
PostTypeId
2
Score
0
ViewCount
0
LastEditorDisplayName
text
Body
More or less as discussed in comments:

<blockquote>
Make copies of the source string and the search string. Eliminate all the control characters from the two copies. Search with the copy of the search string in the copy of the source string. You can do case conversion as well if you need to (or accent removal, or ...). Using a lot of <code>\s*</code> will probably dramatically slow down your regex.

The search string needs to be copied and preprocessed only once. Each source string will need to be copied and preprocessed once too.

If the worst comes to the worst, when you know there's a match, you can go back to your original source string and make a new copy of the search string so that you do have something like the <code>\s*</code> between each regular character, and apply the regex from the second (mutilated) copy of the search string to the original source string. Because you know there's a match, the performance should be reasonable, even if the fail-to-match mode would be far too slow.
</blockquote>

Here's a Perl implementation of the ideas discussed.

<pre><code>#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;
$Data::Dumper::Useqq = 1;

my $source = "'Twas (Tweedle-Dee's)\fBirthday\n\n\f\f\nand\ta\tl\tl\this friends were happy\n";
my $search = "(\fTwee\ndle\t-\tDee\r'\rs)\nBi\frth\fday";

print Data::Dumper->Dump([$source], [qw($source)]);
print Data::Dumper->Dump([$search], [qw($search)]);

my $c_source = $source;
my $c_search = $search;

$c_source =~ s/ |[[:cntrl:]]//g;  # Or s/\s//g;
$c_search =~ s/ |[[:cntrl:]]//g;  # Or s/\s//g;

print Data::Dumper->Dump([$c_source], [qw($c_source)]);
print Data::Dumper->Dump([$c_search], [qw($c_search)]);

if ($c_source =~ m/\Q$c_search\E/)
{
    # Locating the search in the original source...hard work...
    my @a_search = split //, $c_search;
    printf "Lengths: c_search %d; a_search %d\n", length($c_search), scalar(@a_search);
    @a_search = map { s/[][\\.*?+(){}]/\\$&/g; $_ } @a_search;  # Escape regex metacharacters
    #print Data::Dumper->Dump([\@a_search], [qw(@a_search)]);
    my $r_search = join "\\s*", @a_search;
    print Data::Dumper->Dump([$r_search], [qw($r_search)]);
    my $t_source = $source;
    $t_source =~ s/$r_search//g;
    print Data::Dumper->Dump([$t_source], [qw($t_source)]);
}
</code></pre>

Good clean hieroglyphic fun, clear as mud, no doubt.

The first three lines check that there aren't any silly mistakes. The <code>Data::Dumper</code> module prints data unambiguously; it is there for debugging. Setting <code>Useqq</code> makes it print strings in double quotes with escape sequences, so the control characters are visible.

The variables <code>$source</code> and <code>$search</code> are the source string and the search string. There's a match, despite all the control characters in each of them. Note that there are some regex metacharacters in the mix: parentheses are regex metacharacters. These strings are dumped for reference.

The next two lines make copies of the search and source strings. The control characters and spaces are removed from the copies, using a POSIX character class, <code>[[:cntrl:]]</code>, to specify all control characters. These converted strings are dumped for inspection.

The <code>if</code> statement compares the converted source with the converted search. The <code>\Q...\E</code> parts suppress the meaning of regex metacharacters in between. If there's a match, then we enter the block of code in braces.

The <code>split</code> operation creates an array of single characters from the converted search string. The <code>printf</code> is a sanity check that the array has one element per character. The <code>map</code> operation replaces each regex metacharacter in its character class with a backslash and the metacharacter, leaving other characters unchanged. The class does not list <code>^</code>, <code>$</code> or <code>|</code>, so a search string containing those would need them added. The <code>join</code> collects each character or character pair in the array <code>@a_search</code> into a string <code>$r_search</code> with <code>\s*</code> separating the array entries.

The variable <code>$t_source</code> is another copy of the source. The regex in <code>$r_search</code> is applied to <code>$t_source</code> and any matches are replaced with nothing. The result is dumped.

The output from this script is:

<pre><code>$source = "'Twas (Tweedle-Dee's)\fBirthday\n\n\f\f\nand\ta\tl\tl\this friends were happy\n";
$search = "(\fTwee\ndle\t-\tDee\r'\rs)\nBi\frth\fday";
$c_source = "'Twas(Tweedle-Dee's)Birthdayandallhisfriendswerehappy";
$c_search = "(Tweedle-Dee's)Birthday";
Lengths: c_search 23; a_search 23
$r_search = "\\(\\s*T\\s*w\\s*e\\s*e\\s*d\\s*l\\s*e\\s*-\\s*D\\s*e\\s*e\\s*'\\s*s\\s*\\)\\s*B\\s*i\\s*r\\s*t\\s*h\\s*d\\s*a\\s*y";
$t_source = "'Twas \n\n\f\f\nand\ta\tl\tl\this friends were happy\n";
</code></pre>

The string <code>$t_source</code> does indeed correspond to <code>$source</code> with '(Tweedle-Dee's) Birthday' removed, which seems to meet the requirements.

Converting this into Ruby is left as an exercise for the masochistic^H^H^H^H^H^H^H^H^H^H^H interested reader.

Clearly, you could simply create the <code>$r_search</code> string, use it as a regex, and apply it directly to (a copy of) <code>$source</code>; it would work. But I'm deeply suspicious that if you applied it to kilobyte-length source strings, the code would run very slowly. I've not done the measurements to prove it though.
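
For readers attempting the Ruby conversion, here is a minimal sketch of the same two-phase idea. It is not part of the original answer. The method names (<code>strip_controls</code>, <code>loose_pattern</code>, <code>remove_ignoring_controls</code>) are hypothetical, it uses <code>Regexp.escape</code> instead of the hand-written character class, and it has not been benchmarked. Like the Perl script, it strips all control characters in the first phase but only tolerates whitespace (<code>\s*</code>) between characters in the second.

```ruby
# Hypothetical helper names; a sketch of the approach, not a tuned implementation.

# Phase 1 helper: drop spaces and control characters,
# mirroring s/ |[[:cntrl:]]//g in the Perl script.
def strip_controls(str)
  str.gsub(/ |[[:cntrl:]]/, "")
end

# Phase 2 helper: build a regex that allows optional whitespace between
# every character of the (already stripped) search string.
def loose_pattern(stripped_search)
  escaped = stripped_search.each_char.map { |ch| Regexp.escape(ch) }
  Regexp.new(escaped.join('\s*'))
end

# Remove every occurrence of search from source, ignoring spaces and
# control characters. The slow regex runs only after the cheap
# stripped-string comparison has confirmed that a match exists.
def remove_ignoring_controls(source, search)
  c_source = strip_controls(source)
  c_search = strip_controls(search)
  return source unless c_source.include?(c_search)

  source.gsub(loose_pattern(c_search), "")
end

source = "'Twas (Tweedle-Dee's)\fBirthday\n\n\f\f\nand\ta\tl\tl\this friends were happy\n"
search = "(\fTwee\ndle\t-\tDee\r'\rs)\nBi\frth\fday"
p remove_ignoring_controls(source, search)
```

The early <code>return</code> is the point of the design: the plain substring test on the stripped copies is cheap, so the pattern full of <code>\s*</code> is only applied to sources already known to contain a match.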
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow can I do a text search that ignores control characters?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USJonathan Leffler
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. COThanks Jonathan, following some sleep and your very complete response I now understand what you were saying in your original answer. I will run some timings to see the performance impact in ruby and post back here.
 singulars
 PostPostId
 PO
 UserUserId
 USuser2188711
