
  1. POPerl: "noisy logs problem" Create array of regex queries from multiple arrays/hashes
<p><strong>Problem:</strong> I need to pull data from auth logs for approximately 30 locations. The logs are in CSV format. For the analysis to be useful, the log entries must be matched up with each location's hours of operation. The data is stored in directories named for the time period the data covers, e.g., data/june1-june30/. The CSV files are simply named with the location code, e.g., LOC1.csv, LOC2.csv. Here is a sample of a typical log:</p>
<pre><code>2010-06-01, 08:30:00 , 0
2010-06-01, 09:30:00 , 1
2010-06-01, 10:30:00 , 10
2010-06-01, 11:30:00 , 7
2010-06-01, 12:30:00 , 8
2010-06-01, 13:30:00 , 6
2010-06-01, 14:30:00 , 3
2010-06-01, 15:30:00 , 8
2010-06-01, 16:30:00 , 11
</code></pre>
<p>The third field of each entry is the number of successfully authenticated sessions during the time period given by the first two fields. Each log covers a full 24 hours, which is useless for analysis as-is, since the hours of operation differ from location to location. The problem now becomes how to pull only the data that matches the hours of operation; the analysis is only useful if it shows activity for those hours.</p>
<p><strong>Setup - so far</strong> I decided to create a config file using YAML with arrays/hashes for each location.</p>
<p>e.g.,</p>
<pre><code>- branch: headquarters
  abbrev: HQ
  months: [04, 06]
  DOW: [M, T, W, Th]
  hours:
    M: [12, 13, 14, 15, 16, 17, 18]
    T: [12, 13, 14, 15, 16, 17, 18]
    W: [09, 10, 11, 12, 13, 14, 15, 16, 17, 18]
    Th: [12, 13, 14, 15, 16, 17, 18, 19, 20]
</code></pre>
<p>The months designation lists the busiest months, as those are all we care about.</p>
<p><strong>Where I'm at</strong> The code finds the appropriate directories using the months array, then pulls the correct CSV files using the abbrev value. So I have the files I need stored in an array, @files. My question comes down to design. The results must be matched to the appropriate dates for each month: Mondays, Tuesdays, etc.
Do I create month arrays storing the dates for each day of the week? I'm stuck and unsure where to go from here.</p>
<p>To clarify: The code already pulls the correct files and loads them into an array (using globbing and File::Find) for each branch. The question is now about iterating through the @files array for each branch and pulling the info.</p>
<p><strong>EDIT:</strong> As requested, here is some code. This is the part that gets hold of those files for the months indicated in the hash. That's the easy part.</p>
<pre><code>foreach my $branch (@$config) {
    my $name   = $branch-&gt;{'branch'};
    my $months = $branch-&gt;{'months'};
    my $abbrev = $branch-&gt;{'abbrev'};

    # find directories for busy months, load in @dirs
    my @dirs;
    foreach my $month (@$months) {
        my $regex2 = qr(stats_2010-$month.*);
        push @dirs, grep { $_ =~ $regex2 } @stats_dir;
    }

    # find csv files within directories, load in @files
    # (anonymous sub, so each iteration sees its own $abbrev and @files;
    #  $File::Find::name is the full path, so the file can be opened later)
    my @files;
    find(sub { push(@files, $File::Find::name) if $_ =~ /$abbrev\.csv/ }, @dirs);

    # ... process @files for this branch ...
}
</code></pre>
<p><strong>Output:</strong> The output I'm hoping to get is the lines from each file that fall within the hours of operation for that branch. I think they could be written to a separate file for the sake of simplicity, in the same format. What makes it hard is that the dates have to be matched to Mondays, Tuesdays, etc. somehow, because the hours of operation differ by day of the week.</p>
<p>Am I making the problem harder than it needs to be? I've sat with this too long and am hoping for a fresh set of eyes to set me straight. My Perl is OK, but I need some help in the design/algorithm dept. I can figure out how to Perlify it, I think. But feel free to post Perl. I love reading good Perl!</p>
<p>Eventually I will average the activity for the Mondays, Tuesdays, etc. of each month.</p>
<p>Thanks ~</p>
<p>Bubnoff</p>
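<p>One possible way to avoid building per-month arrays of dates is to compute the day of the week directly from each log line's date and look the hour up in the config. The following is only a minimal sketch under assumptions: the sub name <code>open_hours_lines</code> is hypothetical, the day keys <code>Su</code>, <code>F</code> and <code>Sa</code> are guessed spellings (the config above only shows <code>M</code>, <code>T</code>, <code>W</code>, <code>Th</code>), and <code>$branch</code> is assumed to be one entry of the loaded YAML. It uses the core module Time::Piece, whose <code>day_of_week</code> returns 0 for Sunday.</p>

```perl
use strict;
use warnings;
use Time::Piece;

# Keys used under "hours:" in the config, indexed by Time::Piece's
# day_of_week (0 = Sunday). 'Su', 'F' and 'Sa' are assumed spellings.
my @DOW_KEY = qw(Su M T W Th F Sa);

# Return the log lines that fall within the branch's hours of operation.
sub open_hours_lines {
    my ($branch, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        my ($date, $time) = split /\s*,\s*/, $line;
        next unless defined $time && $date =~ /^\d{4}-\d{2}-\d{2}$/;

        # Map the date to the config's day key, e.g. 2010-06-01 => 'T'
        my $dow   = Time::Piece->strptime($date, '%Y-%m-%d')->day_of_week;
        my $hours = $branch->{hours}{ $DOW_KEY[$dow] } or next;  # closed that day

        my ($hour) = $time =~ /^(\d{2}):/ or next;
        # Numeric comparison, so "09" in the config matches hour "09" or 9
        push @kept, $line if grep { $_ == $hour } @$hours;
    }
    return @kept;
}

# Usage with made-up data shaped like the sample log and config above.
my $hq = { abbrev => 'HQ', hours => { T => [ 12 .. 18 ] } };
my @log = (
    '2010-06-01, 08:30:00 , 0',
    '2010-06-01, 12:30:00 , 8',
);
print "$_\n" for open_hours_lines($hq, @log);
```

<p>2010-06-01 fell on a Tuesday, so only the 12:30 line is kept here. The same day key could later serve as the bucket when averaging activity per weekday.</p>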
 
