<p>Though we ask the <code>AVAudioRecorder</code> to record in MPEG4-AAC format, it always produces a .caf (Core Audio Format) file. This is just a wrapper format, however, and the actual audio data it contains is in AAC format.</p>
<p>In the end, appending files came down to manipulating the .caf files byte-by-byte. The spec for Core Audio Format files is <a href="http://developer.apple.com/library/mac/#documentation/MusicAudio/Reference/CAFSpec/CAF_spec/CAF_spec.html" rel="nofollow">here</a>. Digesting this document and processing the files accordingly was a little off-putting at first, but it turns out the spec is very clear and complete, so it wasn't too onerous.</p>
<p>As the spec explains, .caf files consist of chunks, each beginning with a four-byte name. For AAC files, there's always a <code>desc</code> chunk and a <code>kuki</code> chunk. As we know our two original files are in the same format, we can copy these chunks unchanged to the output file.</p>
<p>There's also a <code>pakt</code> chunk and a <code>data</code> chunk. We can't guarantee which order these will be in within the input files. There may or may not be a <code>free</code> chunk - but this just contains 0x00 padding bytes, so we needn't copy this to the output file.</p>
<p>To combine the <code>pakt</code> chunks, we need to examine the chunk headers and produce a new <code>pakt</code> chunk whose <code>mNumberPackets</code> and <code>mNumberValidFrames</code> fields are the sums of those in the input files. The <code>mPrimingFrames</code> and <code>mRemainderFrames</code> are always zero - these are only relevant for streaming media. The bulk of the <code>pakt</code> chunks (i.e. the actual packet table data) can just be concatenated, and the new chunk's <code>mChunkSize</code> must be updated to match.</p>
<p>Similarly for the <code>data</code> chunks: the bulk of the data can be concatenated and the <code>mChunkSize</code> updated to cover both payloads. Note that each <code>data</code> chunk starts with a 4-byte <code>mEditCount</code> field, which is counted in <code>mChunkSize</code> and should appear only once in the output, so the new size is the sum of the two input sizes minus four.</p>
<p>Be careful when reading data from all the binary numeric fields within these files: the files are big-endian but the iPhone is little-endian.</p>
<p>For extra credit, you might also like to consider deleting segments of audio from within a file, or inserting one audio file into the middle of another. This is a little trickier as you have to parse the contents of the <code>pakt</code> chunk. Again it's a case of following the spec: there's a good description of how the packet sizes are stored in variable-length integers, so you'll have to parse these to find how many bytes each packet takes up in the <code>data</code> chunk, and calculate their positions accordingly.</p>
<p>All in all this is rather more hassle than I was hoping for. Maybe there's an open source library that will do all this for you, but I couldn't find one.</p>
<p>However, handling raw files like this is blindingly fast compared to using <code>AVMutableComposition</code> and <code>AVMutableCompositionTrack</code> as in the original question - inserting an hour-long recording into another of the same length takes about two seconds.</p>
<p>Good luck!</p>
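<p>The append procedure described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code the answer was based on: the function names (<code>read_chunks</code>, <code>append_caf</code>, <code>packet_sizes</code>) are hypothetical, and it assumes both files fit in memory, each contains exactly one chunk of each type, the <code>pakt</code> chunks carry no trailing padding, and each packet table entry is a single variable-length integer (as expected for AAC, where only the byte size varies per packet).</p>

```python
import struct

# All CAF fields are big-endian, hence the ">" prefix on every format string.
FILE_HEADER = struct.Struct(">4sHH")   # mFileType, mFileVersion, mFileFlags
CHUNK_HEADER = struct.Struct(">4sq")   # mChunkType, mChunkSize (signed 64-bit)
PAKT_HEADER = struct.Struct(">qqii")   # mNumberPackets, mNumberValidFrames,
                                       # mPrimingFrames, mRemainderFrames
EDIT_COUNT_SIZE = 4                    # UInt32 mEditCount at the start of "data"


def read_chunks(caf: bytes) -> dict:
    """Map each four-byte chunk name to its body (assumes one chunk per name)."""
    file_type, _version, _flags = FILE_HEADER.unpack_from(caf, 0)
    if file_type != b"caff":
        raise ValueError("not a CAF file")
    chunks = {}
    pos = FILE_HEADER.size
    while pos < len(caf):
        kind, size = CHUNK_HEADER.unpack_from(caf, pos)
        pos += CHUNK_HEADER.size
        if size == -1:                 # "data" chunk of unknown size: runs to EOF
            size = len(caf) - pos
        chunks[kind] = caf[pos:pos + size]
        pos += size
    return chunks


def chunk(kind: bytes, body: bytes) -> bytes:
    """Serialise one chunk: 12-byte header followed by the body."""
    return CHUNK_HEADER.pack(kind, len(body)) + body


def append_caf(first: bytes, second: bytes) -> bytes:
    """Return a new CAF file containing the audio of `first` then `second`."""
    a, b = read_chunks(first), read_chunks(second)
    if a[b"desc"] != b[b"desc"] or a[b"kuki"] != b[b"kuki"]:
        raise ValueError("files are not in the same format")
    packets_a, frames_a, _, _ = PAKT_HEADER.unpack_from(a[b"pakt"])
    packets_b, frames_b, _, _ = PAKT_HEADER.unpack_from(b[b"pakt"])
    # Sum the counts; priming/remainder frames are written as zero,
    # following the answer's observation about these recordings.
    pakt = (PAKT_HEADER.pack(packets_a + packets_b, frames_a + frames_b, 0, 0)
            + a[b"pakt"][PAKT_HEADER.size:]
            + b[b"pakt"][PAKT_HEADER.size:])
    # Keep the first file's mEditCount, drop the second's.
    data = a[b"data"] + b[b"data"][EDIT_COUNT_SIZE:]
    # Any "free" chunk is simply not copied.
    return (first[:FILE_HEADER.size]
            + chunk(b"desc", a[b"desc"]) + chunk(b"kuki", a[b"kuki"])
            + chunk(b"pakt", pakt) + chunk(b"data", data))


def packet_sizes(pakt_body: bytes) -> list:
    """Decode packet byte sizes: 7 data bits per byte, high bit = 'more follows'."""
    sizes, value = [], 0
    for byte in pakt_body[PAKT_HEADER.size:]:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            sizes.append(value)
            value = 0
    return sizes
```

<p>Given two recordings loaded as bytes, <code>append_caf(first, second)</code> would return the bytes of the combined file. The running sum of <code>packet_sizes(...)</code> gives each packet's offset within the <code>data</code> payload, which is the starting point for the delete and insert operations mentioned above.</p>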
 
