Is there any efficient method to find the first byte of all instances of a particular 4-byte block in a file?
<p>I have files that contain archived binary messages. A small file is around 600 MB and contains nearly 9000 messages. Each message begins with a particular four-byte flag that I know, which marks the first four bytes of the message header (and as such must be captured). The message header is a fixed size for all messages and is followed by a payload whose size is given in the header. Once I've found the start of a particular message header, I know how many bytes remain to the end of the header and can use that to extract the number of bytes in the message. I need to parse this archive file and isolate each message for processing, making sure that I include all bytes from the first byte of the four-byte flag to the end of the specified message length. There is a varying amount of padding between messages.</p>
<p>Due to the size of the file, I don't want to (and probably can't in all cases) read the file into a single array. Therefore, I'm looking at things like <code>RandomAccessFile</code> and <code>FileInputStream</code>. It doesn't seem to be a simple task to scan a file for a particular sequence of bytes and then take every byte from the first byte in that sequence through a known length. <code>RandomAccessFile</code>, especially its <code>read(byte[])</code> and <code>seek()</code> methods, seems like it will allow me to implement a solution.</p>
<p>To give an idea, my current implementation involves a method called <code>findFlag()</code> that takes a start position in the <code>RandomAccessFile</code>. It seeks to that position and reads the four bytes starting there. If it finds the flag, it returns <code>startPos</code>. Otherwise, it calls itself recursively with <code>startPos + 1</code>, repeating until it finds the flag. Since I know the last byte I read as part of the data message, I start seeking there:</p>
<pre><code>file.seek(startPos);
byte[] possibleFlag = new byte[4];
file.read(possibleFlag, 0, possibleFlag.length);
if (Arrays.equals(ByteUtils.intToBytes(Message.FLAG), possibleFlag)) {
    return startPos;
} else {
    return findFlag(startPos + 1);
}
</code></pre>
<p>Am I overlooking something, either in Java (Java 6 or earlier) or in a well-tested external library (such as an Apache library or similar)? If not, are there better solutions for dealing with binary data in Java, or any approaches that are particularly well-suited to my problem?</p>
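For comparison, here is a minimal sketch of one alternative to seeking and re-reading four bytes at every position: read the file in large chunks and slide a 4-byte window over the stream, so each byte is read once and no recursion is needed. The class name `FlagScanner`, the method `findFlagOffsets`, and the 64 KB buffer size are hypothetical choices for illustration. The sketch also assumes the flag is stored big-endian; the byte order produced by `ByteUtils.intToBytes` is not stated above.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library mentioned in the question.
class FlagScanner {

    /**
     * Scans the stream once and returns the offset of the first byte of
     * every occurrence of the 4-byte flag (assumed big-endian).
     */
    static List<Long> findFlagOffsets(InputStream in, int flag) throws IOException {
        List<Long> offsets = new ArrayList<Long>();
        byte[] buffer = new byte[64 * 1024]; // arbitrary chunk size
        int window = 0;      // the last four bytes seen, packed into an int
        long bytesSeen = 0;  // total number of bytes consumed so far
        int n;
        while ((n = in.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                // Shift the oldest byte out and the newest byte in.
                window = (window << 8) | (buffer[i] & 0xFF);
                bytesSeen++;
                // Only compare once a full four bytes have been read.
                if (bytesSeen >= 4 && window == flag) {
                    offsets.add(Long.valueOf(bytesSeen - 4));
                }
            }
        }
        return offsets;
    }

    /** Convenience overload that opens and closes the file (Java 6 style). */
    static List<Long> findFlagOffsets(String path, int flag) throws IOException {
        InputStream in = new FileInputStream(path);
        try {
            return findFlagOffsets(in, flag);
        } finally {
            in.close();
        }
    }
}
```

Because the window carries over between chunks, a flag that straddles a buffer boundary is still found. This sketch reports every occurrence of the byte pattern, including any that happen to appear inside a payload. For the archive format described above, it would probably be safer to stop at the first hit, read the header, and skip ahead by the stated message length before resuming the scan.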