StackOverflow2013


Java: Advice on handling large data volumes. (Part Deux)
<p>Alright. So I have a very large amount of binary data (let's say, 10GB) distributed over a bunch of files (let's say, 5000) of varying lengths.</p>
<p>I am writing a Java application to process this data, and I want a good design for the data access. Typically, the following happens:</p>
<ul>
<li>One way or another, all the data will be read during the course of processing.</li>
<li>Each file is (typically) read sequentially, requiring only a few kilobytes at a time. However, it is often necessary to have, say, the first few kilobytes of <em>each file simultaneously</em>, or the middle few kilobytes of each file simultaneously, etc.</li>
<li>There are times when the application will want random access to a byte or two here and there.</li>
</ul>
<p>Currently I am using the RandomAccessFile class to read into byte arrays (and ByteBuffers). My ultimate goal is to encapsulate the data access in a class that is fast, so that I never have to worry about it again. The basic functionality is that I will ask it to read frames of data from specified files, and I want to minimize the I/O operations given the considerations above.</p>
<p>Examples of typical access:</p>
<ul>
<li>Give me the first 10 kilobytes of all my files!</li>
<li>Give me bytes 0 through 999 of file F, then bytes 1 through 1000, then bytes 2 through 1001, and so on.</li>
<li>Give me a megabyte of data from file F starting at such and such byte!</li>
</ul>
<p>Any suggestions for a good design?</p>
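<p>To make the "read frames of data from specified files" requirement concrete, here is a minimal sketch of what such an encapsulating class could look like. It is only one possible shape, not a recommended or definitive design. The class name <code>FrameReader</code>, the method <code>readFrame</code>, and the <code>maxOpenFiles</code> parameter are hypothetical names invented for illustration. The sketch assumes positional reads through <code>FileChannel</code> instead of <code>RandomAccessFile</code>, and it keeps a small least-recently-used set of open channels so that 5000 files do not all need to stay open at once.</p>

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads frames (offset + length) from many files. */
final class FrameReader implements AutoCloseable {
    private final int maxOpenFiles;
    // Access-ordered map, used here as a small LRU cache of open channels.
    private final LinkedHashMap<Path, FileChannel> channels =
            new LinkedHashMap<>(16, 0.75f, true);

    FrameReader(int maxOpenFiles) {
        if (maxOpenFiles < 1) {
            throw new IllegalArgumentException("maxOpenFiles must be >= 1");
        }
        this.maxOpenFiles = maxOpenFiles;
    }

    /**
     * Returns up to {@code length} bytes of {@code file} starting at
     * {@code offset}. The result is shorter if the file ends first.
     */
    ByteBuffer readFrame(Path file, long offset, int length) throws IOException {
        FileChannel ch = channelFor(file);
        ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = offset;
        while (buf.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            int n = ch.read(buf, pos);
            if (n < 0) {
                break; // end of file reached
            }
            pos += n;
        }
        buf.flip(); // make the bytes just read available to the caller
        return buf;
    }

    /** Returns an open channel for the file, evicting the LRU one if needed. */
    private FileChannel channelFor(Path file) throws IOException {
        FileChannel ch = channels.get(file); // also marks the entry as recently used
        if (ch == null) {
            if (channels.size() >= maxOpenFiles) {
                Iterator<Map.Entry<Path, FileChannel>> it =
                        channels.entrySet().iterator();
                it.next().getValue().close(); // least recently used channel
                it.remove();
            }
            ch = FileChannel.open(file, StandardOpenOption.READ);
            channels.put(file, ch);
        }
        return ch;
    }

    @Override
    public void close() throws IOException {
        for (FileChannel ch : channels.values()) {
            ch.close();
        }
        channels.clear();
    }
}
```

<p>With this shape, "give me the first 10 kilobytes of all my files" becomes a loop calling <code>readFrame(file, 0, 10 * 1024)</code> for each file. The sliding-window pattern (bytes 0 through 999, then 1 through 1000, and so on) would issue one read per step in this sketch; a real design would probably read a larger block once and serve the overlapping windows from memory, which is deliberately left out here.</p>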
