<p>I'm going to take a stab in the dark here and say that you're not updating your <code>stbl</code> offsets properly. At least I didn't (at first glance) see your Python doing that anywhere.</p> <h1>STSC</h1> <p>Let's start with the location of data. Samples are written into the file in groups called chunks, and the header tells the decoder where each of these chunks lives. The <code>stsc</code> table says how many samples each chunk contains, and the <code>first chunk</code> field says at which chunk a new samples-per-chunk value takes effect. It's a little confusing, but look at my example. This is saying that you have 100 samples per chunk up to the 8th chunk; at the 8th chunk there are 98 samples.</p> <p><img src="https://i.stack.imgur.com/R505d.png" alt="enter image description here"></p> <h1>STCO</h1> <p>You also have to track where those chunks sit in the file. That's the job of the <code>stco</code> table: where in the file is chunk 1, where is chunk 2, and so on.</p> <p><img src="https://i.stack.imgur.com/tMJCZ.png" alt="enter image description here"></p> <p>If you modify any data in <code>mdat</code>, you have to maintain these tables. You can't just chop <code>mdat</code> data out and expect the decoder to know what to do.</p> <p>As if this wasn't enough, you now also have to maintain the sample time table (<code>stts</code>), the sample size table (<code>stsz</code>), and if this were video, the sync sample table (<code>stss</code>).</p> <h1>STTS</h1> <p><code>stts</code> says how long each sample should play for, in units of the timescale. If you're doing audio, the timescale is probably 44100 or 48000 (Hz).</p> <p><img src="https://i.stack.imgur.com/4E2NS.png" alt="enter image description here"></p> <p>If you've lopped off some data, now everything could potentially be out of sync. If all the entries here have the exact same duration, though, you'd be OK.</p> <h1>STSZ</h1> <p><code>stsz</code> says what size each sample is in bytes. 
This is important so the decoder can start at a chunk and then step through each sample by its size.</p> <p><img src="https://i.stack.imgur.com/XadVD.png" alt="enter image description here"></p> <p>Again, if all the sample sizes are exactly the same you'd be OK. Audio tends to be pretty uniform, but video varies a lot (with keyframes and whatnot).</p> <h1>STSS</h1> <p>And last but not least we have the <code>stss</code> table, which says which frames are keyframes. I only have experience with AAC, but every audio frame is considered a keyframe. In that case you can have one entry that describes all the packets.</p> <p><img src="https://i.stack.imgur.com/Evjdd.png" alt="enter image description here"></p> <hr> <p>In relation to your original question, the time display isn't always computed the same way by every player. The most accurate way is to sum up the durations of all the frames in the header and use that as the total time; other players use the metadata in the track headers. I've found it best to just keep all the values consistent, and then players are happy.</p> <p>If you're doing all that and I missed it in the script, then post a sample mp4 and a standalone app and I can try to help you out.</p>
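<p>As a rough illustration of the bookkeeping involved, here's a minimal Python sketch. It is not a real MP4 parser; the table contents are made-up stand-ins for values you'd have read out of <code>stts</code> and <code>stco</code>. It just shows the two calculations above: summing the <code>stts</code> entries to get the total playback time, and shifting every <code>stco</code> offset that sits past a region you cut out of <code>mdat</code>.</p>

```python
def total_duration_seconds(stts_entries, timescale):
    """Sum (sample_count * sample_delta) over all stts entries,
    then convert from timescale units to seconds."""
    total_ticks = sum(count * delta for count, delta in stts_entries)
    return total_ticks / timescale


def shift_stco(stco_offsets, cut_start, cut_length):
    """After removing cut_length bytes from mdat starting at file
    offset cut_start, every chunk offset at or past the cut must
    move back by cut_length; earlier chunks are untouched."""
    return [off - cut_length if off >= cut_start else off
            for off in stco_offsets]


if __name__ == "__main__":
    # 798 AAC frames of 1024 samples each, at a 44100 Hz timescale
    stts = [(798, 1024)]
    print(total_duration_seconds(stts, 44100))  # ~18.53 seconds

    # Hypothetical chunk offsets; cut 50000 bytes starting at 100000
    offsets = [4000, 120000, 240000]
    print(shift_stco(offsets, 100000, 50000))  # [4000, 70000, 190000]
```

<p>The same "shift everything past the cut" idea applies to the other tables too: drop the removed samples' entries from <code>stts</code> and <code>stsz</code> and fix up <code>stsc</code> if a chunk's sample count changed.</p>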