<p>I would go with the second idea. Here is a simple Perl program that reads a list of words from the first file provided and prints, in tab-separated format, a count of how often each of those words occurs in the second file provided. The words in the first file should be listed one per line.</p>

<pre><code>#!/usr/bin/perl

use strict;
use warnings;

my $word_list_file = shift;
my $process_file = shift;

my %word_counts;

# Open the word list file, read a line at a time, remove the newline,
# add it to the hash of words to track, initialize the count to zero
open(WORDS, $word_list_file) or die "Failed to open list file: $!\n";
while (&lt;WORDS&gt;) {
  chomp;
  # Store words in lowercase for case-insensitive match
  $word_counts{lc($_)} = 0;
}
close(WORDS);

# Read the text file one line at a time, break the text up into words
# based on word boundaries (\b), iterate through each word incrementing
# the word count in the word hash if the word is in the hash
open(FILE, $process_file) or die "Failed to open process file: $!\n";

while (&lt;FILE&gt;) {
  chomp;
  while ( /-$/ ) {
    # If the line ends in a hyphen, remove the hyphen and
    # continue reading lines until we find one that doesn't
    chop;
    my $next_line = &lt;FILE&gt;;
    last unless defined($next_line);
    # Strip the newline so that the next /-$/ check and chop
    # act on the hyphen rather than on the line ending
    chomp($next_line);
    $_ .= $next_line;
  }

  my @words = split /\b/, lc; # Split the lower-cased version of the string
  foreach my $word (@words) {
    $word_counts{$word}++ if exists $word_counts{$word};
  }
}
close(FILE);

# Print each word in the hash in alphabetical order along with the
# number of times encountered, delimited by tabs (\t)
foreach my $word (sort keys %word_counts) {
  print "$word\t$word_counts{$word}\n"
}
</code></pre>

<p>If the file words.txt contains:</p>

<pre><code>linux
frequencies
science
words
</code></pre>

<p>And the file text.txt contains the text of your post, the following command:</p>

<pre><code>perl analyze.pl words.txt text.txt
</code></pre>

<p>will print:</p>

<pre><code>frequencies	3
linux	1
science	1
words	3
</code></pre>

<p>Note that breaking on word boundaries using \b may not work the way you want in all cases. For example, if your text files contain words that are hyphenated across lines, you will need to do something a little more intelligent to match them. In this case you could check whether the last character in a line is a hyphen and, if it is, remove the hyphen and read another line before splitting the line into words.</p>

<p><strong>Edit</strong>: Updated version that matches words case-insensitively and handles words hyphenated across lines.</p>

<p>Note that if the text contains hyphenated words, some of which are broken across lines and some of which are not, this won't find them all, because it only removes hyphens at the end of a line. In this case you may want to remove all hyphens and match words after the hyphens are removed. You can do this by adding the following line right before the split function:</p>

<pre><code>s/-//g;
</code></pre>
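<p>The note about word boundaries can be illustrated with a short sketch. The sample line below is hypothetical and not taken from the original post; it only shows which tokens the same <code>split /\b/</code> call produces when a hyphen sits inside a line.</p>

<pre><code>use strict;
use warnings;

# Hypothetical sample line, used only for illustration
my $line = "Word-frequency counts";

# Same split as in the script: lower-case, then break at word boundaries
my @tokens = split /\b/, lc $line;

# Show each token in brackets so spaces and punctuation are visible
print join(" ", map { "[$_]" } @tokens), "\n";
# prints: [word] [-] [frequency] [ ] [counts]
</code></pre>

<p>Since the hyphen comes out as its own token, an entry such as "word-frequency" in the word list would never match a single token from the text, which is why removing hyphens before the split can help.</p>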
 
