<p>It is a good idea to start your specs/question with a story ("I have a ..."). But such a story (whether true or made up, because you can't disclose the truth) should give</p> <ul> <li>a vivid picture of the situation/problem/task</li> <li>the reason(s) why all the work must be done</li> <li>definitions for uncommon(ly used) terms</li> </ul> <p>So I'd start with: I'm working in a prison and have to scan the emails of the inmates for</p> <ul> <li>names (like "Al Capone") mentioned anywhere in the text; the director wants to read those mails in toto</li> <li>order lines (like "weapon: AK 4711 quantity: 14"); the ordnance officer wants that information to calculate the amount of ammunition and rack space needed</li> <li>paragraphs containing 'family' keywords like "wife", "child", ...; the parson wants to prepare her sermons efficiently</li> </ul> <p>Taken on its own, each of the terms "keyword" (~running text) and "attribute" (~structured text) may be 'clear', but if both are applied to "the X I have to search for", things get mushy. Instead of general ("chunk") and technical ("string") terms, you should use real-world ("line") and specific ("paragraph") words. Samples of your input:</p> <pre><code>From: Robin Hood
To: Scarface

Hi Scarface,

tell Al Capone to send a car to the prison gate on sunday.

For the riot we need:

weapon: AK 4711 quantity: 14
knife: Bowie quantity: 8

Tell my wife in Folsom to send some money to my son in Alcatraz.

Regards
Robin
</code></pre> <p>and your expected output:</p> <pre><code>--- Robin.txt ---
keywords:
  Al Capone: Yes
  Billy the Kid: No
  Scarface: Yes
order lines:
  knife:
    knife: Bowie quantity: 8
  machine gun:
  stinger rocket:
  weapon:
    weapon: AK 4711 quantity: 14
social relations paragraphs:
  Tell my wife in Folsom to send some money to my son in Alcatraz.
</code></pre> <p>Pseudocode should begin at the top level.
If you start with</p> <pre><code>for each file in folder
    load search list
    process current file('s content) using search list
</code></pre> <p>it's obvious that</p> <pre><code>load search list
for each file in folder
    process current file using search list
</code></pre> <p>would be much better: the search list has to be loaded only once instead of once per file.</p> <p>Based on this story, these examples, and the top-level plan, I would try to write proof-of-concept code for a simplified version of the "process current file('s content) using search list" task:</p> <pre><code>given file/text to search in and list of keywords/attributes
print file name
print "keywords:"
for each boolean item
    print boolean item text
    if found anywhere in whole text
        print "Yes"
    else
        print "No"
print "order lines:"
for each line item
    print line item text
    if found anywhere in whole text
        print whole line
print "social relations paragraphs:"
for each paragraph
    for each social relation item
        if found
            print paragraph
            no need to check for other items
</code></pre> <p>A first implementation attempt:</p> <pre><code>use Modern::Perl;
#use English qw(-no_match_vars);
use English;

exit step_00();

sub step_00 {
  # given file/text to search in
  my $whole_text = &lt;&lt;"EOT";
From: Robin Hood
To: Scarface

Hi Scarface,

tell Al Capone to send a car to the prison gate on sunday.

For the riot we need:

weapon: AK 4711 quantity: 14
knife: Bowie quantity: 8

Tell my wife in Folsom to send some money to my son in Alcatraz.

Regards
Robin
EOT
  # print file name
  say "--- Robin.txt ---";
  # print "keywords:"
  say "keywords:";
  # for each boolean item
  for my $bi ("Al Capone", "Billy the Kid", "Scarface") {
    # print boolean item text
    printf " %s: ", $bi;
    # if found anywhere in whole text
    if ($whole_text =~ /$bi/) {
      # print "Yes"
      say "Yes";
    # else
    } else {
      # print "No"
      say "No";
    }
  }
  # print "order lines:"
  say "order lines:";
  # for each line item
  for my $li ("knife", "machine gun", "stinger rocket", "weapon") {
    # print line item text
    # if found anywhere in whole text
    if ($whole_text =~ /^$li.*$/m) {
      # print whole line
      say " ", $MATCH;
    }
  }
  # print "social relations paragraphs:"
  say "social relations paragraphs:";
  # for each paragraph
  for my $para (split /\n\n/, $whole_text) {
    # for each social relation item
    for my $sr ("wife", "son", "husband") {
      # if found
      if ($para =~ /$sr/) {
##    if ($para =~ /\b$sr\b/) {
        # print paragraph
        say $para;
        # no need to check for other items
        last;
      }
    }
  }
  return 0;
}
</code></pre> <p>output:</p> <pre><code>perl 16953439.pl
--- Robin.txt ---
keywords:
 Al Capone: Yes
 Billy the Kid: No
 Scarface: Yes
order lines:
 knife: Bowie quantity: 8
 weapon: AK 4711 quantity: 14
social relations paragraphs:
tell Al Capone to send a car to the prison gate on sunday.
Tell my wife in Folsom to send some money to my son in Alcatraz.
</code></pre> <p>Such (premature) code helps you to</p> <ul> <li>clarify your specs (Should not-found keywords go into the output? Is your search list really flat or should it be structured/grouped?)</li> <li>check your assumptions about how to do things (Should the order line search be done on the array of lines of the whole text?)</li> <li>identify topics for further research/RTFM (e.g. regexes: the "prison" paragraph is reported because /son/ also matches inside "prison"!)</li> <li>plan your next steps (folder loop, reading the input file)</li> </ul> <p>(In addition, people in the know will point out all my bad practices, so you can avoid them from the start.)</p> <p>Good luck!</p>
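<p>On the regex (prison!) topic, here is a minimal sketch of a paragraph search that matches whole words only. The sub name <code>social_paragraphs</code> and the shortened test text are made up for illustration. <code>\b</code> (word boundary) keeps "son" from matching inside "prison", and <code>\Q...\E</code> makes a keyword that contains regex metacharacters match literally. It uses plain <code>strict</code>/<code>warnings</code> instead of Modern::Perl, so it needs only core Perl.</p> <pre><code>use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the paragraphs (separated by blank lines)
# that contain at least one of the given words as a whole word.
sub social_paragraphs {
    my ($text, @words) = @_;
    my @hits;
    for my $para (split /\n\n/, $text) {
        for my $word (@words) {
            # \b...\b: the word must stand alone, so "son" no longer
            # matches inside "prison"; \Q...\E: the word is taken literally.
            if ($para =~ /\b\Q$word\E\b/) {
                push @hits, $para;
                last;    # no need to check the other words
            }
        }
    }
    return @hits;
}

# Test text modeled on the Robin Hood sample (shortened).
my $whole_text = join "\n",
    "Hi Scarface,",
    "",
    "tell Al Capone to send a car to the prison gate on sunday.",
    "",
    "Tell my wife in Folsom to send some money to my son in Alcatraz.";

say for social_paragraphs($whole_text, "wife", "son", "husband");
</code></pre> <p>Run as is, it prints only the "Tell my wife ..." paragraph. Note that the word boundary also changes other matches, which is again a spec question: "wife's" still matches "wife", but "sons" no longer matches "son".</p>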
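<p>For the next steps (folder loop, reading the input file), a possible sketch of the top-level plan follows, again under assumptions. The search list is hard-coded in a hypothetical <code>load_search_list</code>, only files ending in <code>.txt</code> are processed, and <code>process_text</code> covers just the keyword Yes/No part of <code>step_00</code>. All sub names are made up. What the sketch is meant to show is the shape of the top-level plan: the search list is loaded once, outside the loop, and each file's report is returned as a string so it can be checked or printed.</p> <pre><code>use strict;
use warnings;
use File::Spec;
use IO::File;

# Hypothetical stand-in for "load search list": a real version would
# read the list from a file. It is called once, before the folder loop.
sub load_search_list {
    return ["Al Capone", "Billy the Kid", "Scarface"];
}

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    my $fh = IO::File-&gt;new($path, "r") or die "Cannot open $path: $!";
    local $/;    # undefined record separator: readline returns everything
    return scalar readline($fh);
}

# Stand-in for "process current file using search list": only the
# keyword Yes/No part of step_00, returned as a string.
sub process_text {
    my ($name, $text, $keywords) = @_;
    my $report = "--- $name ---\nkeywords:\n";
    for my $kw (@$keywords) {
        # \Q...\E matches the keyword literally, not as a regex
        my $found = $text =~ /\Q$kw\E/ ? "Yes" : "No";
        $report .= "  $kw: $found\n";
    }
    return $report;
}

# Top level: load the search list once, then process every *.txt file.
sub process_folder {
    my ($dir) = @_;
    my $keywords = load_search_list();
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = sort { $a cmp $b } grep { /\.txt\z/ } readdir $dh;
    closedir $dh;
    my %reports;
    for my $name (@names) {
        my $text = slurp(File::Spec-&gt;catfile($dir, $name));
        $reports{$name} = process_text($name, $text, $keywords);
    }
    return \%reports;
}

# Usage: perl scan.pl path/to/folder
if (@ARGV) {
    my $reports = process_folder($ARGV[0]);
    print $reports-&gt;{$_} for sort keys %$reports;
}
</code></pre> <p>Usage (the script name is an assumption): <code>perl scan.pl path/to/folder</code> prints one report per <code>.txt</code> file, sorted by file name. Returning strings instead of printing inside <code>process_text</code> keeps the per-file logic testable without a real folder.</p>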