StackOverflow2013

<p>I tried to include all necessary sanity checks and to minimize disk I/O (assuming your files are big enough that reading them is the time-limiting factor). The files also never have to be read into memory as a whole (assuming they may be even bigger than the available RAM).</p>
<p>However, this was only tried with very basic dummy input, so please test it and report any problems.</p>
<p>First I wrote a script that trims one pair (identified by the f...L file name):</p>
<pre><code>#!/bin/bash
#############
# trim_pair #
#-----------#############################
# given fXL file path, trim fXL and fXR #
#########################################

#---------------#
# sanity checks #
#---------------#

# error function
error(){
    echo >&2 "$@"
    exit 1
}

# argument given?
[[ $# -eq 1 ]] || \
    error "usage: $0 <file>"

LFILE="$1"

# argument format valid?
[[ `basename "$LFILE" | grep -E '^f[[:digit:]]+L$'` ]] || \
    error "invalid file name: $LFILE (has to match /^f[[:digit:]]+L$/)"

RFILE="`echo "$LFILE" | sed 's/L$/R/'`" # is there a better POSIX compliant way?

# files exist?
[[ -e "$LFILE" ]] || \
    error "file does not exist: $LFILE"
[[ -e "$RFILE" ]] || \
    error "file does not exist: $RFILE"

# files readable?
[[ -r "$LFILE" ]] || \
    error "file not readable: $LFILE"
[[ -r "$RFILE" ]] || \
    error "file not readable: $RFILE"

# files writable?
[[ -w "$LFILE" ]] || \
    error "file not writable: $LFILE"
[[ -w "$RFILE" ]] || \
    error "file not writable: $RFILE"

#------------------#
# create tmp files #
# & ensure removal #
#------------------#

# cleanup function
cleanup(){
    [[ -e "$LTMP" ]] && rm -- "$LTMP"
    [[ -e "$RTMP" ]] && rm -- "$RTMP"
}

# cleanup on exit
trap 'cleanup' EXIT

# create tmp files
LTMP=`mktemp --tmpdir` || \
    error "tmp file creation failed"
RTMP=`mktemp --tmpdir` || \
    error "tmp file creation failed"

#----------------------#
# process both files   #
# prepended by their   #
# first and last lines #
#----------------------#

# extract first and last lines without reading the whole files twice
{
    head -q -n1 "$LFILE" "$RFILE" # no need to read the whole files
    tail -q -n1 "$LFILE" "$RFILE" # no need to read the whole files
} | awk -F, '
    NF!=2{
        print "incorrect file format: record "FNR" in file "FILENAME > "/dev/stderr"
        exit 1
    }
    NR==1{                 # read record 1,
        x1=$1              # field 1 of L,
        next               # then read
    }
    NR==2{                 # record 1 of R,
        x1=$1>x1?$1:x1     # field 1 & take the max,
        next               # then
    }
    NR==3{                 # read last record,
        x2=$1              # field 1 of L,
        next               # then
    }
    NR==4{                 # last record of R
        x2=$1>x2?$1:x2     # field 1 & take the max
        next
    }
    FILENAME!="-"&&NR<5{
        print "too few lines in input" > "/dev/stderr"
        exit 1             # abort so the originals are not overwritten
    }
    FNR==1{
        outfile=FILENAME~/L$/?"'"$LTMP"'":"'"$RTMP"'"
    }
    $1>=x1&&$1<=x2{
        print > outfile
    }
' - "$LFILE" "$RFILE" || \
    error "error while trimming"

#-----------------------#
# re-save trimmed files #
# under the same names  #
#-----------------------#

mv -- "$LTMP" "$LFILE" || \
    error "cannot re-save $LFILE"
mv -- "$RTMP" "$RFILE" || \
    error "cannot re-save $RFILE"
</code></pre>
<p>As you can see, the main idea is to prepend the important lines to the input using <code>head</code> and <code>tail</code> before processing everything with <code>awk</code>, as you requested. Note that <code>[[ ... ]]</code> is a bash feature and that <code>head -q</code>, <code>tail -q</code> and <code>mktemp --tmpdir</code> are GNU options, so the script needs bash and the GNU tools.</p>
<p>To call that script for all files in a certain directory, you can use the following script (not as worked out as the one above, but I guess you could come up with something like that yourself):</p>
<pre><code>#!/bin/bash
############
# trim all #
#----------###################################
# find L files in current or given directory #
# and trim the corresponding file pairs      #
##############################################

TRIM_PAIR="trim_pair" # path to the trim script for one pair

if [[ $# -eq 1 ]]
then
    WD="$1"
else
    WD="`pwd`"
fi

find "$WD" \
    -type f \
    -readable \
    -writable \
    -regextype posix-egrep \
    -regex "^$WD/"'f[[:digit:]]+L' \
    -exec "$TRIM_PAIR" "{}" \;
</code></pre>
<p>Note that you must either have the <code>trim_pair</code> script on your <code>PATH</code> or adjust the <code>TRIM_PAIR</code> variable in the <code>trim_all</code> script.</p>
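<p>To see the "prepend the boundary lines" idea in isolation, here is a minimal sketch on made-up data. The helper name <code>trim_preview</code>, the temporary directory and the sample records are assumptions for illustration only. It prints the records that would be kept instead of rewriting the files, and it forces numeric comparison with <code>+0</code>, which the script above does not do. Like the script, it takes the larger value for both bounds and relies on GNU <code>head</code>/<code>tail</code> for <code>-q</code>.</p>

```shell
#!/bin/bash
# Sketch only: trim_preview is a hypothetical helper, not part of trim_pair.
# It prints the records of both files whose first field lies in [x1, x2],
# where x1 and x2 are the larger first-field values found in the two
# first lines and the two last lines, respectively.
trim_preview() {
    {
        head -q -n1 "$1" "$2"   # first line of each file, no headers (GNU -q)
        tail -q -n1 "$1" "$2"   # last line of each file
    } | awk -F, '
        NR==1 { x1 = $1 + 0; next }                    # first line of L
        NR==2 { if ($1 + 0 > x1) x1 = $1 + 0; next }   # first line of R
        NR==3 { x2 = $1 + 0; next }                    # last line of L
        NR==4 { if ($1 + 0 > x2) x2 = $1 + 0; next }   # last line of R
        $1 + 0 >= x1 && $1 + 0 <= x2 { print }         # records from the files
    ' - "$1" "$2"
}

# Dummy input, invented for this example
demo_dir=$(mktemp -d)
printf '%s\n' 1,a 3,b 5,c 7,d > "$demo_dir/f1L"
printf '%s\n' 3,w 4,x 6,y 9,z > "$demo_dir/f1R"

trim_preview "$demo_dir/f1L" "$demo_dir/f1R"

rm -rf -- "$demo_dir"
```

<p>With this input the lower bound is 3 and the upper bound is 9, so only the record <code>1,a</code> is dropped. The four boundary lines arrive on standard input (the <code>-</code> argument) before the files themselves, which is why the bounds are known by the time <code>awk</code> starts reading the real data.</p>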
