  1. How to improve Perl script performance?
    <p>I am running the ucm2.pl script to scan a huge directory structure (directory is a network drive mapped to local). I have two perl scripts ucm1.pl and ucm2.pl. I am running ucm2.pl parellely for different arguments and it is called through ucm1.pl.</p> <p>ucm1.pl -</p> <pre><code> #!/usr/bin/perl use strict; use warnings; use Parallel::ForkManager; my $filename ="intfSplitList.txt"; #(this will have list of all the input files. eg intfSplit_0....intfSplit_50) my $lines; my $buffer; open(FILE, $filename) or die "Can't open `$filename': $!"; while (&lt;FILE&gt;) { $lines = $.; } close FILE; print "The number of lines in $filename is $lines \n"; my $pm = Parallel::ForkManager-&gt;new($lines); #(it will set the no. of parallel processes) open (my $fh, '&lt;', "intfSplitList.txt") or die $!; while (my $data = &lt;$fh&gt;) { chomp $data; my $pid = $pm-&gt;start and next; system ("perl ucm2.pl -iinput.txt -f$data"); #(call the ucm2.pl) #(input.txt file will have search keyword and $data will have intfSplit_*.txt files) $pm-&gt;finish; # Terminates the child process } </code></pre> <p>ucm2.pl code-</p> <pre><code>#!/usr/bin/perl use strict; use warnings; use File::Find; use Getopt::Std; #getting the input parameters getopts('i:f:'); our($opt_i, $opt_f); my $searchKeyword = $opt_i; #Search keyword file. my $intfSplit = $opt_f; #split file my $path = "Z:/aims/"; #source directory my $searchString; #search keyword open FH, "&gt;log.txt"; #open the log file to write print FH "$intfSplit ". "started at ".(localtime)."\n"; #write the log file open (FILE,$intfSplit); #open the split file to read while(&lt;FILE&gt;){ my $intf= $_; #setting the interface to intf chomp($intf); my $dir = $path.$intf; chomp($dir); print "$dir \n"; open(INP,$searchKeyword); #open the search keyword file to read while (&lt;INP&gt;){ $searchString =$_; #setting the search keyword to searchString chomp($searchString); print "$searchString \n"; open my $out, "&gt;", "vob$intfSplit.txt" or die $!; #open the vobintfSplit_* file to write #calling subroutine printFile to find and print the path of element find(\&amp;printFile,$dir); #the subroutine will search for the keyword and print the path if keyword is exist in file. sub printFile { my $element = $_; if(-f $element &amp;&amp; $element =~ /\.*$/){ open my $in, "&lt;", $element or die $!; while(&lt;$in&gt;) { if (/\Q$searchString\E/) { my $last_update_time = (stat($element))[9]; my $timestamp = localtime($last_update_time); print $out "$File::Find::name". " $timestamp". " $searchString\n"; last; } } } } } } print FH "$intfSplit ". "ended at ".(localtime)."\n"; #write the log file </code></pre> <p>everything is running fine but its running for very long time for single keyword search also. can anyone please suggest some better way to improve the performance.</p> <p>Thanks in advance!!</p>

    1. What timing and instrumentation have you done? The first step in any effort to improve performance is to measure current performance.
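
A first rough measurement along the lines of this comment could be to time one ucm2.pl run in isolation. A minimal sketch, assuming the file names from the question (intfSplit_0.txt is only an example entry from intfSplitList.txt):

<pre><code>#!/usr/bin/perl
# Time a single ucm2.pl run in isolation (file names assumed from the question).
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
system("perl", "ucm2.pl", "-iinput.txt", "-fintfSplit_0.txt");
printf "one ucm2.pl run took %.2f seconds\n", tv_interval($t0);
</code></pre>

If a single run over one split file already accounts for most of the total wall-clock time, the parallel layer in ucm1.pl is not the problem.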
    2. @MartinSkøtt is entirely correct. You need to figure out which part is actually slow before you can make any meaningful optimisations; otherwise you're just guessing, and while that might succeed, it's unlikely to take you as far as you could go. With parallel problems, the things to examine are generally whether you're using the right amount of parallelism (time to start children vs. processing time vs. available parallel resources on the system) and whether the children are competing for the same resources in a way that reduces their parallelism.
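
On the parallelism point: ucm1.pl forks one child per line of intfSplitList.txt (up to around 50 here), which can easily oversubscribe both the CPUs and the network share. A minimal sketch of capping the worker count instead; the count of 4 is an assumption, and the file names are the ones from the question:

<pre><code>#!/usr/bin/perl
# Sketch: limit the number of concurrent ucm2.pl children to a fixed worker
# count instead of forking one child per input file (4 is an assumed core count).
use strict;
use warnings;
use Parallel::ForkManager;

my $max_workers = 4;
my $pm = Parallel::ForkManager->new($max_workers);

open my $fh, '<', 'intfSplitList.txt' or die $!;
while (my $data = <$fh>) {
    chomp $data;
    $pm->start and next;                               # parent moves on to the next file
    system('perl', 'ucm2.pl', '-iinput.txt', "-f$data");
    $pm->finish;                                       # child exits
}
close $fh;
$pm->wait_all_children;                                # don't exit before the children finish
</code></pre>

Parallel::ForkManager only limits how many children run at once; whether 4 (or 2, or 8) is the right cap depends on how much of each child's time is spent waiting on the network drive, which is exactly what the measurements suggested above would show.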
    3. You are reading your `intfSplitList.txt` twice just to count the lines. That's not a performance killer, but it's unnecessary. I also believe all your processes will overwrite each other's log files, as they all open the same file for writing, not appending. You should use a separate log file for each of them and merge them later; you can do that by adding a timestamp with microseconds to each entry, then concatenating all the files and sorting on the timestamp. There are more things that could be optimised here, but I agree with the others: figure out what's slow. Check out Devel::NYTProf.
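
A minimal sketch of those two clean-ups (a single pass over the list, and a per-child log whose entries carry a microsecond timestamp so the logs can be concatenated and sorted later); the log_&lt;splitfile&gt; naming is only an illustration:

<pre><code>#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();

# In ucm1.pl: one pass over the list gives both the line count and the names.
open my $list, '<', 'intfSplitList.txt' or die $!;
chomp(my @split_files = <$list>);
close $list;
my $lines = @split_files;
print "The number of lines in intfSplitList.txt is $lines\n";

# In ucm2.pl: a log file per split file, each entry prefixed with a
# microsecond timestamp so all logs can later be merged with a simple sort.
my $intfSplit = $split_files[0];    # stands in for the -f argument
open my $log, '>', "log_$intfSplit" or die $!;
printf {$log} "%.6f %s started\n", Time::HiRes::time(), $intfSplit;
close $log;
</code></pre>

For finding the actual hot spots, Devel::NYTProf can be run as perl -d:NYTProf ucm2.pl -iinput.txt -fintfSplit_0.txt (same example arguments as above), with the per-line report then generated by nytprofhtml.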
 
