<blockquote> <p>If I am using Swank-Clojure (via Leiningen), can I change the maximum heap space the JVM has on startup?</p> </blockquote> <p>You can change the Java heap size by supplying the <code>-Xms</code> (initial heap) and <code>-Xmx</code> (maximum heap) options at startup; see the <a href="http://download-llnw.oracle.com/javase/6/docs/technotes/tools/windows/java.html#options" rel="noreferrer">docs</a>.</p> <p>So something like <code>java -Xms256m -Xmx1024m ...</code> would give a 256MB initial heap that can grow to at most 1GB.</p> <p>I don't use Leiningen/Swank, but I expect that it's possible to change these options there too. If nothing else, there should be a startup script for Java somewhere where you can change the arguments.</p> <blockquote> <p>If I package this application (as I plan to) as an Uberjar, would I be able to ensure my JVM has some kind of minimum heap space?</p> </blockquote> <p>Memory isn't controlled from within a jar file but from the startup script, normally a .sh or .bat file that calls <code>java</code> and supplies the arguments.</p> <blockquote> <p>Can I "sample" from the file, e.g. read only every z-th line?</p> </blockquote> <p><a href="http://download.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html" rel="noreferrer">java.io.RandomAccessFile</a> gives random access to a file by byte offset, which you can build on to sample the contents (for example at evenly spaced positions; since lines vary in length, this won't give exactly every z-th line).</p> <blockquote> <p>Would it be possible to read in only parts of a large (text) file at a time, so I could import and process the data in "chunks", e.g., n lines at a time? If so, how?</p> </blockquote> <p><a href="http://clojuredocs.org/v/2048" rel="noreferrer">line-seq</a> returns a lazy sequence of the lines read from a <code>java.io.BufferedReader</code>, so you can process as many lines at a time as you wish. 
</p> <p>Alternatively, use the Java mechanisms in <a href="http://download-llnw.oracle.com/javase/6/docs/api/java/io/package-summary.html" rel="noreferrer">java.io</a>: <code>BufferedReader.readLine()</code> or <code>FileInputStream.read(byte[] buffer)</code>.</p> <blockquote> <p>Is there some faster way of accessing the file I'd be reading from (potentially rapidly, depending on the implementation), other than simply reading from it a bit at a time?</p> </blockquote> <p>Within Java/Clojure there is <code>BufferedReader</code>, which reads ahead into an internal buffer for you, or you can maintain your own byte buffer and read larger chunks at a time.</p> <p>To make the most of the memory you have, keep the data as primitive as possible: arrays of primitives such as <code>short</code> or <code>int</code> rather than boxed objects such as <code>Integer</code>.</p> <p>For some actual numbers, let's assume you want to graph the contents of a music CD:</p> <ul> <li>A CD has two channels, each with 44,100 samples per second <ul> <li>60 minutes of music is then ~300 million datapoints (2 channels × 44,100 × 3,600 s ≈ 318 million)</li> </ul></li> <li>Represented as 16 bits (2 bytes, a short) per datapoint: 600MB</li> <li>Represented as a primitive int array (4 bytes per datapoint): 1.2GB</li> <li>Represented as an Integer array (32 bytes per datapoint): 10GB</li> </ul> <p>The Integer figure uses the object sizes from <a href="http://devblog.streamy.com/2009/07/24/determine-size-of-java-object-class/" rel="noreferrer">this blog</a>: 16 bytes of overhead per object plus 4 bytes for the primitive int, rounded up to the 8-byte object alignment (24 bytes), plus an 8-byte pointer in the array = 32 bytes per Integer datapoint.</p> <p>Even 600MB of data is a stretch to keep in memory all at once on a "normal" computer, since you will probably be using lots of memory elsewhere too. But switching from primitive to boxed numbers will by itself reduce the number of datapoints you can hold in memory by an order of magnitude.</p> <p>If you were to graph the data from a 60-minute CD on a 1900-pixel-wide "overview" timeline, each pixel would have to represent about two seconds of music (~180,000 datapoints). 
That is far too little resolution to show any level of detail, so you would want some form of subsampling or summary data there.</p> <p>So the solution you describe (process the full dataset one chunk at a time to build the summary for the "overview" timeline, and keep only the small subset needed for the main "detail" window in memory) sounds perfectly reasonable.</p> <p><strong>Update:</strong></p> <p>On fast file reads: <a href="http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly" rel="noreferrer">This article</a> times 13 different ways to read a 100MB file in Java; the <a href="http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly#Fullplot" rel="noreferrer">results</a> vary from 0.5 seconds to 10 minutes(!). In general, reading is fast with a decent buffer size (4KB to 8KB) and (very) slow when reading one byte at a time.</p> <p>The article also has a <a href="http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly#ComparisontoC" rel="noreferrer">comparison to C</a> in case anyone is interested. (Spoiler: the fastest Java reads are within a factor of 2 of a memory-mapped file in C.)</p>
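<p>To check which heap limits a running JVM actually received (for example, after changing a Leiningen or uberjar startup script), a small program can query <code>Runtime</code>. This is a minimal sketch; the class name <code>HeapCheck</code> is hypothetical.</p>

```java
// Hypothetical helper: prints the heap limits the running JVM actually received,
// a quick way to confirm that -Xms/-Xmx from a startup script took effect.
final class HeapCheck {
    // Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the JVM will try to use; normally close to the -Xmx value.
        System.out.println("max heap:     " + toMiB(rt.maxMemory()) + " MiB");
        // Heap currently reserved by the JVM; initially roughly the -Xms value.
        System.out.println("current heap: " + toMiB(rt.totalMemory()) + " MiB");
        // Unused part of the currently reserved heap.
        System.out.println("free heap:    " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

<p>Compiled and run as <code>java -Xms256m -Xmx1024m HeapCheck</code>, the reported maximum should be close to 1024 MiB (often slightly lower, depending on the garbage collector). For an uberjar the same flags go before <code>-jar</code>, e.g. <code>java -Xms256m -Xmx1024m -jar app-standalone.jar</code>, where the jar name is hypothetical.</p>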
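<p>A sketch of sampling with <code>RandomAccessFile</code>, assuming an ASCII or Latin-1 text file (<code>RandomAccessFile.readLine()</code> does not decode UTF-8): it seeks to evenly spaced byte offsets and returns the first complete line after each one. The class and method names are hypothetical.</p>

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: samples about `count` lines spread through a text file by
// seeking to evenly spaced byte offsets, instead of reading the whole file.
final class LineSampler {
    static List<String> sample(String path, int count) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            if (length == 0 || count <= 0) {
                return lines;
            }
            for (int i = 0; i < count; i++) {
                long offset = length * i / count;   // i-th evenly spaced byte position
                raf.seek(offset);
                if (offset > 0) {
                    raf.readLine();                 // probably landed mid-line: skip to the next line start
                }
                String line = raf.readLine();       // null if we ran past the last line
                if (line != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }
}
```

<p>Because it samples by byte position, the result is only approximately even in terms of lines; getting exactly every z-th line requires reading through the file sequentially.</p>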
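<p>Processing "n lines at a time" with <code>BufferedReader.readLine()</code> could look like the following minimal sketch, which hands each chunk to a callback so that only one chunk is held in memory at a time. The class name, method name and UTF-8 encoding are assumptions.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: reads a text file n lines at a time and passes each
// chunk to a callback, so only one chunk is held in memory at once.
final class ChunkedLineReader {
    // Returns the number of chunks delivered (the last one may hold fewer than n lines).
    static int forEachChunk(String path, int n, Consumer<List<String>> handler) throws IOException {
        if (n <= 0) {
            throw new IllegalArgumentException("chunk size must be positive");
        }
        int chunks = 0;
        // Assumes UTF-8 text; Files.newBufferedReader returns a java.io.BufferedReader.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>(n);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == n) {
                    handler.accept(chunk);
                    chunks++;
                    chunk = new ArrayList<>(n);   // fresh list, in case the handler keeps the old one
                }
            }
            if (!chunk.isEmpty()) {               // trailing partial chunk
                handler.accept(chunk);
                chunks++;
            }
        }
        return chunks;
    }
}
```

<p>In Clojure the same idea can be written by wrapping <code>line-seq</code> in <code>partition-all</code>, e.g. <code>(partition-all n (line-seq rdr))</code>, as long as the chunks are consumed while the reader is still open.</p>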
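<p>The memory figures for the CD example can be reproduced with a few lines of arithmetic. This sketch uses the blog's assumptions for the boxed case (16-byte header, 4-byte int, 8-byte alignment, 8-byte references); real object sizes depend on the JVM and its settings. The class and method names are hypothetical.</p>

```java
// Hypothetical helper: reproduces the memory estimate for 60 minutes of CD audio.
final class CdMemoryEstimate {
    static final int SAMPLE_RATE = 44_100;   // samples per second, per channel
    static final int CHANNELS = 2;

    // Total datapoints for the given number of minutes of stereo audio.
    static long dataPoints(int minutes) {
        return (long) minutes * 60 * SAMPLE_RATE * CHANNELS;
    }

    // Size of one boxed value plus its array slot: the object (header + payload)
    // is rounded up to the alignment, then the array's reference is added.
    static int boxedBytes(int header, int payload, int alignment, int reference) {
        int object = (header + payload + alignment - 1) / alignment * alignment;
        return object + reference;
    }

    public static void main(String[] args) {
        long points = dataPoints(60);
        // Integer size under the blog's assumptions: 16-byte header, 4-byte int,
        // 8-byte alignment, 8-byte references.
        int integerBytes = boxedBytes(16, 4, 8, 8);
        System.out.println("datapoints:        " + points);
        System.out.println("short[] (2 B each): " + points * 2 + " bytes");
        System.out.println("int[] (4 B each): " + points * 4 + " bytes");
        System.out.println("Integer[] (" + integerBytes + " B each): " + points * integerBytes + " bytes");
    }
}
```

<p>It prints 317520000 datapoints and 635040000, 1270080000 and 10160640000 bytes, which the answer rounds to ~300 million, 600MB, 1.2GB and 10GB. On 64-bit JVMs with compressed references the header and reference are typically 12 and 4 bytes, so <code>boxedBytes(12, 4, 8, 4)</code> gives 20 bytes: smaller, but still 10 times a <code>short</code>.</p>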
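<p>For the "overview" timeline, one possible summary (not necessarily the one you would end up using) is the minimum and maximum sample per pixel, which can be drawn as a vertical line. This sketch streams the data through its own 8 KB byte buffer with <code>InputStream.read(byte[])</code>, so the full dataset never has to fit in memory. It assumes raw, headerless, single-channel 16-bit little-endian samples (a real WAV file has a header and interleaved channels), and the class and method names are hypothetical.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper: builds min/max summary data for an "overview" strip by
// streaming samples through a fixed 8 KB byte buffer.
// ASSUMPTION: input is raw, headerless, single-channel 16-bit little-endian PCM.
final class OverviewSummary {
    // Returns {min, max}: min[px] and max[px] cover samples
    // [px * samplesPerPixel, (px + 1) * samplesPerPixel). Pixels that receive no
    // samples keep the sentinels Short.MAX_VALUE / Short.MIN_VALUE.
    static short[][] minMaxPerPixel(InputStream in, int samplesPerPixel, int pixels) throws IOException {
        short[] min = new short[pixels];
        short[] max = new short[pixels];
        Arrays.fill(min, Short.MAX_VALUE);
        Arrays.fill(max, Short.MIN_VALUE);

        byte[] buf = new byte[8192];
        long sampleIndex = 0;
        int pendingLow = -1;   // low byte kept when a read ends between a sample's two bytes
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                int b = buf[i] & 0xFF;
                if (pendingLow < 0) {
                    pendingLow = b;   // first (low) byte of a little-endian sample
                    continue;
                }
                short sample = (short) ((b << 8) | pendingLow);
                pendingLow = -1;
                long px = sampleIndex++ / samplesPerPixel;
                if (px < pixels) {    // ignore samples past the last pixel
                    int p = (int) px;
                    if (sample < min[p]) min[p] = sample;
                    if (sample > max[p]) max[p] = sample;
                }
            }
        }
        return new short[][] {min, max};
    }
}
```

<p>For one channel of the 60-minute CD on 1900 pixels, <code>samplesPerPixel</code> would be the ceiling of 158,760,000 / 1900, i.e. 83,558, and the stream could be a <code>FileInputStream</code> on the (hypothetical) raw sample file. Averages or RMS values per pixel are alternative summaries.</p>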