StackOverflow2013

plurals

PO
primarykey
Id
3141762
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
2010-06-29T14:41:54.973
CreationDate
2010-06-29T14:41:54.973
FavoriteCount
0
LastActivityDate
2010-06-29T14:41:54.973
LastEditDate
LastEditorUserId
0
OwnerUserId
243749
ParentId
3135575
PostTypeId
2
Score
5
ViewCount
0
LastEditorDisplayName
text
Body
Sorry for the length of this. I took the premise of this as a little challenge and came up with a proof of concept in Ruby.

I worked on the assumption that you could supply a number of strings that should match the regular expression (HITS) and a number that should fail to match (MISSES).

I based the code on a naive implementation of a genetic algorithm. See the notes at the bottom for my thoughts on the success, or otherwise, of this approach.

<pre><code>LOOP_COUNT = 100

class Attempt

  # let's try email
  HITS = %w[j@j.com j@j.co.uk gates@microsoft.com sales@microsoft.com sjobs@apple.com sales@apple.com frddy@aol.com thing1@charity.org sales@mybad.org.uk thing.one@drseuss.com]
  MISSES = %w[j@j j@j@.com j.com @domain.com nochance eric@google. eric@google.com. username-at-domain-dot-com linux.org eff.org microsoft.com sjobs.apple.com www.apple.com]

  # odd mixture of numbers and letters, designed to confuse
  # HITS = %w[a123 a999 a600 a545 a100 b001 b847 a928 c203]
  # MISSES = %w[abc def ghi jkl mno pqr stu vwx xyz h234 k987]

  # consonants versus vowels
  # HITS = %w[bcd cdb fgh ghf jkl klj mnp npm qrs srq tvw vwt xzb bzx]
  # MISSES = %w[aei oyu oio euu uio ioe aee ooo]

  # letters < 11 chars and no numbers
  # HITS = %w[aaa aaaa abaa azaz monkey longstring stringlong]
  # MISSES = %w[aa aa1 aa0 b9b 6zz longstringz m_m ff5 666 anotherlongstring]

  MAX_SUCCESSES = HITS.size + MISSES.size

  # Setup the various Regular Expression operators, etc..
  RELEMENTS = %w[. ? * + ( ) \[ \] - | ^ $ \\ : @ / { }]

  %w[A b B d D S s W w z Z].each do |chr|
    RELEMENTS << "\\#{chr}"
  end

  %w[alnum alpha blank cntrl digit lower print punct space upper xdigit].each do |special|
    RELEMENTS << "[:#{special}:]"
  end

  ('a'..'z').each do |chr|
    RELEMENTS << chr
  end

  ('A'..'Z').each do |chr|
    RELEMENTS << chr
  end

  (0..9).each do |chr|
    RELEMENTS << chr.to_s
  end

  START_SIZE = 8

  attr_accessor :operators, :successes

  def initialize(ary = [])
    @operators = ary
    if ary.length < 1
      START_SIZE.times do
        @operators << random_op
      end
    end
    @score = 0
    @decay = 1
    make_regexp
  end

  def make_regexp
    begin
      @regexp = Regexp.new( @operators.join("") )
    rescue
      # "INVALID Regexp"
      @regexp = nil
      @score = -1000
    end
  end

  def random_op
    RELEMENTS[rand(RELEMENTS.size)]
  end

  def decay
    @decay -= 1
  end

  # count one success for every HIT that matches and every MISS that does not
  def test
    @successes = 0
    if @regexp
      HITS.each do |hit|
        result = (hit =~ @regexp)
        if result != nil
          reward
        end
      end
      MISSES.each do |miss|
        result = (miss =~ @regexp)
        if result == nil
          reward
        end
      end
    end
    @score = @successes
    self
  end

  def reward
    @successes += 1
  end

  def cross other
    len = size
    olen = other.size
    split = rand(len)
    ops = []
    @operators.length.times do |num|
      if num < split
        ops << @operators[num]
      else
        ops << other.operators[num + (olen - len)]
      end
    end
    # drop any nils picked up when the index falls outside a shorter partner
    Attempt.new ops.compact
  end

  # apply a random mutation, you don't have to use all of them
  def mutate
    send [:flip, :add_rand, :add_first, :add_last, :sub_rand, :sub_first, :sub_last, :swap][rand(8)]
    make_regexp
    self
  end

  ## mutate methods

  def flip
    @operators[rand(size)] = random_op
  end

  def add_rand
    @operators.insert rand(size), random_op
  end

  def add_first
    @operators.insert 0, random_op
  end

  def add_last
    @operators << random_op
  end

  def sub_rand
    @operators.delete_at rand(size)
  end

  def sub_first
    @operators.delete_at 0
  end

  def sub_last
    # the last valid index is size - 1; delete_at(size) would remove nothing
    @operators.delete_at(size - 1)
  end

  def swap
    # with fewer than two operators the loop below could never pick a different index
    return if size < 2
    to = rand(size)
    begin
      from = rand(size)
    end while to == from
    @operators[to], @operators[from] = @operators[from], @operators[to]
  end

  def regexp_to_s
    @operators.join("")
  end

  def <=> other
    score <=> other.score
  end

  def size
    @operators.length
  end

  def to_s
    "#{regexp_to_s} #{score}"
  end

  def dup
    Attempt.new @operators.dup
  end

  # mark down expressions that grow well beyond the starting size
  def score
    if @score > 0
      ret = case
            when (size > START_SIZE * 2)
              @score - 20
            when size > START_SIZE
              @score - 2
            else
              @score #+ START_SIZE - size
            end
      ret + @decay
    else
      @score + @decay
    end
  end

  def == other
    to_s == other.to_s
  end

  def stats
    puts "Regexp #{@regexp.inspect}"
    puts "Length #{@operators.length}"
    puts "Successes #{@successes}/#{MAX_SUCCESSES}"
    puts "HITS"
    HITS.each do |hit|
      result = (hit =~ @regexp)
      if result == nil
        puts "\tFAIL #{hit}"
      else
        puts "\tOK #{hit} #{result}"
      end
    end
    puts "MISSES"
    MISSES.each do |miss|
      result = (miss =~ @regexp)
      if result == nil
        puts "\tOK #{miss}"
      else
        puts "\tFAIL #{miss} #{result}"
      end
    end
  end

end

$stderr.reopen(File::NULL, "w") # turn off stderr to stop streams of bad regexp messages

# find some seed attempt values
results = []
10000.times do
  a = Attempt.new
  a.test
  if a.score > 0
    # puts "#{a.regexp_to_s} #{a.score}"
    results << a
  end
end

results.sort!.reverse!

puts "SEED ATTEMPTS"
puts results[0..9]

old_result = nil

LOOP_COUNT.times do |i|
  results = results[0..9]
  results.each {|r| r.decay }
  3.times do
    new_results = results.map {|r| r.dup.mutate.test}
    results.concat new_results
    new_results = results.map {|r| r.cross( results[rand(10)] ).test }
    results.concat new_results
  end
  new_results = []
  20.times do
    new_results << Attempt.new.test
  end
  results.concat new_results
  results.sort!.reverse!
  if old_result != results[0].score
    old_result = results[0].score
  end
  puts "#{i} #{results[0]}"
end

puts "\n--------------------------------------------------"
puts "Winner! #{results[0]}"
puts "--------------------------------------------------\n"
results[0].stats
</code></pre>

Lessons learned from playing with this code.

Overall, it appears that running shorter loops several times is most likely to produce a usable result. However, this may be due to my implementation rather than the nature of genetic algorithms.

You may get results that work but still contain parts that are gibberish.

You are going to need a pretty firm grasp of regular expressions to understand how many of the results actually achieve what they do.

Ultimately your time is probably much better spent learning regular expressions than trying to use this code as a shortcut. I realise that the questioner may not have had that motive, and the reason I tried this was because it was an interesting idea.

There are many trade-offs in the results. The more diverse HITS and MISSES you supply, the longer it will take to produce a result and the more loops you will have to run. Having fewer of each will likely produce a result that is either massively specific to your supplied strings or so generic that it wouldn't be useful in a real-world situation.

I have also hard-coded some assumptions, such as marking down expressions which get too long.
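To make the scoring idea concrete, here is a minimal standalone sketch of the fitness check described above: one point for each HIT that matches, one point for each MISS that does not, and a mark-down for long expressions. The `fitness` helper, its `max_length` and `penalty` parameters, and the sample strings are assumptions made for illustration; they are not part of the `Attempt` class, which measures length in operators rather than characters.

```ruby
# Hypothetical helper for illustration; not taken from the Attempt class.
# Scores a candidate pattern against strings that should match (hits)
# and strings that should not match (misses).
def fitness(pattern, hits, misses, max_length: 16, penalty: 2)
  regexp = Regexp.new(pattern)
  matched  = hits.count   { |s| s =~ regexp }  # hits that the pattern accepts
  rejected = misses.count { |s| s !~ regexp }  # misses that the pattern refuses
  score = matched + rejected
  # assumed penalty: mark down patterns longer than max_length characters
  score -= penalty if pattern.length > max_length
  score
rescue RegexpError
  # an invalid pattern gets a large negative score, as in the code above
  -1000
end

hits   = %w[a123 a999 b001]
misses = %w[abc def h23]

puts fitness('\A[a-c]\d{3}\z', hits, misses)  # every hit matches, every miss is rejected
puts fitness('\d', hits, misses)              # too generic: also matches "h23"
puts fitness('(', hits, misses)               # not a valid regular expression
```

A candidate that is too generic scores lower than one that separates the two lists exactly, which is the pressure the genetic loop relies on when it sorts attempts by score.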
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POProgrammatically derive a regular expression from a string
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USJoc
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COVery clever. I've wanted a tool that could generate a regular expression to describe the differences in inputs, but never got much further than Confusion's first answer "return the string". Like you, I'm not sure the tool is really worth it, but it was fun reading all the same. :)
 singulars
 PostPostId
 PO
 UserUserId
 USsarnold
