<p>I grabbed <a href="http://www.gutenberg.org/ebooks/2600" rel="nofollow">the text of War and Peace</a> from Project Gutenberg and wrote the following script. It prints all words in <code>/usr/share/dict/words</code> that also appear in <code>war_and_peace.txt</code>. You can change both paths with:</p>

<pre><code>perl findwords.pl --wordlist=/path/to/wordlist --text=/path/to/text &gt; wordsfound.txt
</code></pre>

<p>On my computer, it takes just over a second to run.</p>

<pre><code>use strict;
use warnings;
use utf8::all;
use Getopt::Long;

my $wordlist = '/usr/share/dict/words';
my $text     = 'war_and_peace.txt';

GetOptions(
    "wordlist=s" =&gt; \$wordlist,
    "text=s"     =&gt; \$text,
);

open my $text_fh, '&lt;', $text
  or die "Cannot open '$text' for reading: $!";

my %is_in_text;
while ( my $line = &lt;$text_fh&gt; ) {
    chomp($line);

    # you will want to customize this line
    my @words = grep { $_ } split /[[:punct:][:space:]]/ =&gt; $line;
    next unless @words;

    # This beasty uses the 'x' repetition operator in list context to
    # assign the value of 1 to all keys (the words)
    @is_in_text{@words} = (1) x @words;
}

open my $wordlist_fh, '&lt;', $wordlist
  or die "Cannot open '$wordlist' for reading: $!";

while ( my $word = &lt;$wordlist_fh&gt; ) {
    chomp($word);
    if ( $is_in_text{$word} ) {
        print "$word\n";
    }
}
</code></pre>

<p>And here's my timing:</p>

<pre><code>• [ovid] $ wc -w war_and_peace.txt
565450 war_and_peace.txt
• [ovid] $ time perl findwords.pl &gt; wordsfound.txt

real 0m1.081s
user 0m1.076s
sys  0m0.000s
• [ovid] $ wc -w wordsfound.txt
15277 wordsfound.txt
</code></pre>
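<p>If the hash-slice assignment in the script looks opaque, here is a minimal standalone sketch of what that one line does. The sample sentence and the <code>%explicit</code> hash are made up for illustration and are not part of the original script.</p>

<pre><code>use strict;
use warnings;

# Hypothetical sample line, standing in for one line of the text file
my $line = "Well, Prince, so Genoa and Lucca are now just family estates.";

# Splitting on single punctuation or space characters yields empty strings
# wherever two delimiters are adjacent (", " for example); grep drops them
my @words = grep { $_ } split /[[:punct:][:space:]]/, $line;

# Hash slice: set the value 1 for every key in @words in one statement
my %is_in_text;
@is_in_text{@words} = (1) x @words;

# The same thing written as an explicit loop
my %explicit;
$explicit{$_} = 1 for @words;

print join( ' ', sort keys %is_in_text ), "\n";
print "both hashes have the same keys\n"
  if join( ',', sort keys %is_in_text ) eq join( ',', sort keys %explicit );
</code></pre>

<p>Note that the lookup is case-sensitive, so <code>Well</code> and <code>well</code> are different keys. Lowercasing both sides with <code>lc</code> is one possible way to customize the matching.</p>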
 
