<p>This is a sketch in Common Lisp.</p>

<p>Note that for long files there is a penalty for using READ-LINE, because it conses a fresh string for each line. In that case, use one of the READ-LINE derivatives floating around that reuse a line buffer. Also check whether you want the hash table to be case sensitive or not.</p>

<p><strong>second version</strong></p>

<p>A separate string-splitting step is no longer needed, because the line is split here. It is low-level code, in the hope that some speed gains will be possible. It accepts one or more spaces or tabs as the field delimiter.</p>

<pre><code>(defun read-a-line (stream)
  (let ((line (read-line stream nil nil)))
    (flet ((delimiter-p (c)
             (or (char= c #\space) (char= c #\tab))))
      (when line
        ;; s0: end of field 1, s1..s2: bounds of field 2, s3: last delimiter
        (let* ((s0 (position-if     #'delimiter-p line))
               (s1 (position-if-not #'delimiter-p line :start s0))
               (s2 (position-if     #'delimiter-p line :start (1+ s1)))
               (s3 (position-if     #'delimiter-p line :from-end t)))
          (values (subseq line 0 s0)
                  (list (read-from-string line nil nil :start s1 :end s2)
                        (subseq line (1+ s3)))))))))
</code></pre>

<p>The function above returns two values: the key and a list of the remaining two fields.</p>

<pre><code>(defun dbscan (top-5-table stream)
  "get triples from each line and put them in the hash table"
  (loop with aa = nil and bbcc = nil do
    (multiple-value-setq (aa bbcc) (read-a-line stream))
    while aa do
    (setf (gethash aa top-5-table)
          ;; merge the new entry into the sorted list, then keep the five largest
          (let ((l (merge 'list (gethash aa top-5-table) (list bbcc)
                          #'&gt; :key #'first)))
            (or (and (nth 5 l) (subseq l 0 5)) l)))))

(defun dbprint (table output)
  "print the hashtable contents"
  (maphash (lambda (aa value)
             (loop for (bb cc) in value
                   do (format output "~a ~a ~a~%" aa bb cc)))
           table))

(defun dbsum (input &amp;optional (output *standard-output*))
  "scan and sum from a stream"
  (let ((top-5-table (make-hash-table :test #'equal)))
    (dbscan top-5-table input)
    (dbprint top-5-table output)))

(defun fsum (infile outfile)
  "scan and sum a file"
  (with-open-file (input infile :direction :input)
    (with-open-file (output outfile :direction :output :if-exists :supersede)
      (dbsum input output))))
</code></pre>

<p><strong>some test data</strong></p>

<pre><code>(defun create-test-data (&amp;key (file "/tmp/test.data") (n-lines 100000))
  (with-open-file (stream file :direction :output :if-exists :supersede)
    (loop repeat n-lines
          do (format stream "~a ~a ~a~%"
                     (random 1000) (random 100.0) (random 10000)))))
</code></pre>

<p>; (create-test-data)</p>

<pre><code>(defun test ()
  (time (fsum "/tmp/test.data" "/tmp/result.data")))
</code></pre>

<p><strong>third version, LispWorks</strong></p>

<p>Uses some SPLIT-STRING and PARSE-FLOAT functions; otherwise it is generic CL.</p>

<pre><code>(defun fsum (infile outfile)
  (let ((top-5-table (make-hash-table :size 50000000 :test #'equal)))
    (with-open-file (input infile :direction :input)
      (loop for line = (read-line input nil nil)
            while line do
            (destructuring-bind (aa bb cc) (split-string '(#\space #\tab) line)
              (setf bb (parse-float bb))
              (let ((v (gethash aa top-5-table)))
                (unless v
                  (setf (gethash aa top-5-table)
                        (setf v (make-array 6 :fill-pointer 0))))
                (vector-push (cons bb cc) v)
                ;; once a sixth entry arrives, sort descending and drop the smallest
                (when (&gt; (length v) 5)
                  (setf (fill-pointer (sort v #'&gt; :key #'car)) 5))))))
    (with-open-file (output outfile :direction :output :if-exists :supersede)
      (maphash (lambda (aa value)
                 (loop for (bb . cc) across value do
                   (format output "~a ~f ~a~%" aa bb cc)))
               top-5-table))))
</code></pre>
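<p>As a small illustration of the case-sensitivity remark near the top: in standard CL the choice comes down to the :TEST of the hash table. EQUAL compares string keys case-sensitively, while EQUALP ignores case. This is only a sketch; the variable names SENSITIVE and INSENSITIVE are made up for the example and are not part of the code above.</p>

<pre><code>;; Two tables that differ only in their :TEST function.
(let ((sensitive   (make-hash-table :test #'equal))
      (insensitive (make-hash-table :test #'equalp)))
  ;; Store the same mixed-case key in both tables.
  (setf (gethash "Key" sensitive) 1
        (gethash "Key" insensitive) 1)
  ;; Look the key up in lower case.
  (list (gethash "key" sensitive)      ; EQUAL: no match, NIL
        (gethash "key" insensitive)))  ; EQUALP: match, 1
;; returns (NIL 1)
</code></pre>

<p>So changing #'equal to #'equalp in DBSUM or FSUM would fold keys such as "abc" and "ABC" into one entry, if that is what the data calls for.</p>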
 
