<p>I know this is an old thread, but I feel the answer isn't complete.</p> <p>The first thing to understand is that TSO is the tip of a fairly big iceberg of techniques for boosting network performance.</p> <p>Let's start with the basic network interface. In the simplest case, the OS sends the whole packet to the NIC (network interface card) using PIO (programmed input/output, i.e. one word, normally 32 bits, at a time), exactly as it should appear on the wire, excluding only the frame check sequence.</p> <p>The following are the speed boosts for transmitting data.</p> <p>The first speed boost is DMA (direct memory access), which lets the processor do other work while the hardware copies the packet. However, the OS still has to copy the packet data into memory and generate the headers and checksums.</p> <p>The second boost is to have the hardware generate the checksum for the data portion of the packet. The OS still copies the data into its memory space and places the header in front of it, and since the OS is generating the headers anyway, it may as well always generate the header checksums too. This sounds complicated, but the mechanism is actually quite simple: the hardware is told to start checksumming at position XX and to place the checksum at position YY in the packet buffer.</p> <p>The third boost is scatter/gather. This means the OS doesn't copy the data into its own memory; instead it passes the header and the location of the data portion to the driver and lets the driver collect the data to send it. This requires hardware checksumming, because if the OS had to checksum the packet itself, it would need to copy it into memory first.</p> <p>The fourth (and the highest level of boosting natively supported in Linux) is TSO (TCP segmentation offload). With TSO the OS gives the hardware a header template and a large chunk of data (no more than 64 KB) to split into packets and checksum. This means the OS has to generate far fewer headers, and the per-packet overhead of setting up DMA is greatly reduced as well.
When the packets go on the wire they comply with the normal rules for packets and will be compatible with <strong>ANY</strong> switch or router they pass through.</p> <p>Reception is a different story. Here hardware checksumming is more of a guess than a certainty, so what <strong>SHOULD</strong> happen is that the hardware passes the packet and the checksum to the OS separately and lets the OS decide whether the packet is OK.</p> <p>Scatter/gather is pretty much redundant for receive.</p> <p>As for LRO (large receive offload), there's no easy way for the hardware to know what the packets mean, so LRO is currently a software-only construct: the packets are passed to the OS, which then decides whether to concatenate the data and pass one large chunk to the application or to pass many smaller chunks.</p> <p>A few notes on the network stack.</p> <p>The software should <strong>ALWAYS</strong> produce the ACK packets. The only exception is if your NIC has a TOE (TCP offload engine). I don't know of any OS that natively supports TOE, which means you'd need to hack the OS to make it work.</p> <p>So there's a full and rambling response; I hope it helps someone.</p>
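<p>The "start checksumming at XX, store the result at YY" mechanism from the second transmit boost can be emulated in a few lines of Python. This is a minimal sketch assuming the standard 16-bit one's-complement Internet checksum (RFC 1071) used by TCP and UDP; the function names and the <code>start</code>/<code>stuff</code> parameters are hypothetical stand-ins for XX and YY, not a real driver interface. The sketch sums whatever the OS left in the checksum field; in practice the OS typically pre-seeds it (for TCP, with the pseudo-header sum) or zeroes it.</p>

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data (RFC 1071), without final inversion."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def offload_checksum(buf: bytearray, start: int, stuff: int) -> None:
    """Emulate checksum-offload hardware (hypothetical interface).

    start: offset where checksumming begins ("position XX")
    stuff: offset where the 16-bit result is stored ("position YY")
    Everything from start to the end of buf is summed, including whatever
    the OS left in the checksum field.
    """
    if not (start <= stuff <= len(buf) - 2) or (stuff - start) % 2:
        raise ValueError("checksum field must be word-aligned inside the region")
    csum = ~ones_complement_sum(bytes(buf[start:])) & 0xFFFF
    buf[stuff:stuff + 2] = csum.to_bytes(2, "big")


# Hypothetical packet: 4 header bytes the OS already checksummed,
# then a segment whose checksum field (offset 6) the OS zeroed.
buf = bytearray(b"HDR!" + b"\x12\x34\x00\x00" + b"payload!")
offload_checksum(buf, start=4, stuff=6)
```

<p>Re-summing the region after the checksum has been written gives 0xFFFF, which is the property a receiver relies on when it verifies the packet.</p>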
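<p>Scatter/gather can be pictured as the driver receiving a list of (buffer, offset, length) descriptors instead of one contiguous packet. The Python sketch below is only a software stand-in for the NIC's DMA engine: the descriptor format and the <code>gather</code> function are hypothetical. <code>memoryview</code> slices are used so that neither the header nor the application's data is copied until the single final join, which plays the role of the hardware reading each fragment from memory.</p>

```python
from __future__ import annotations


def gather(descriptors: list[tuple[memoryview, int, int]]) -> bytes:
    """Assemble one frame from (buffer, offset, length) descriptors.

    Slicing a memoryview does not copy; the join at the end stands in
    for the NIC's DMA engine reading each fragment from memory.
    """
    fragments = []
    for buf, off, length in descriptors:
        if off < 0 or length < 0 or off + length > len(buf):
            raise ValueError("descriptor points outside its buffer")
        fragments.append(buf[off:off + length])  # zero-copy view
    return b"".join(fragments)


header = bytearray(b"HDR")              # header built by the OS
user_data = bytearray(b"hello world")   # data left where the application put it
frame = gather([(memoryview(header), 0, 3), (memoryview(user_data), 6, 5)])
# frame == b"HDRworld"
```

<p>Because the OS never assembles the frame itself, it also never touches the payload bytes, which is why this step depends on the hardware doing the checksumming.</p>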
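<p>The TSO split described above (one header template plus a chunk of up to 64 KB, cut into wire-sized packets) can be sketched the same way. This is a simplified model, not a driver or NIC API: the header is reduced to a hypothetical <code>HeaderTemplate</code> holding just ports, a TCP sequence number and an IPv4 ID, the MSS is passed in explicitly, and per-segment checksumming is left out.</p>

```python
from __future__ import annotations

from dataclasses import dataclass, replace

MAX_TSO_CHUNK = 64 * 1024  # the 64 KB upper bound mentioned in the answer


@dataclass(frozen=True)
class HeaderTemplate:
    """Hypothetical, heavily simplified TCP/IPv4 header template."""
    src_port: int
    dst_port: int
    seq: int    # TCP sequence number of the first byte in the chunk
    ip_id: int  # IPv4 identification field of the first packet


@dataclass(frozen=True)
class Segment:
    header: HeaderTemplate
    payload: bytes


def tso_segment(template: HeaderTemplate, chunk: bytes, mss: int) -> list[Segment]:
    """Split one large chunk into MSS-sized segments, as TSO hardware would.

    Each segment gets a copy of the template with the sequence number
    advanced by the bytes already sent and the IP ID incremented.
    Per-segment checksumming is omitted from this sketch.
    """
    if len(chunk) > MAX_TSO_CHUNK:
        raise ValueError("TSO chunk larger than 64 KB")
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for i, offset in enumerate(range(0, len(chunk), mss)):
        header = replace(
            template,
            seq=(template.seq + offset) % 2**32,  # sequence numbers wrap at 2^32
            ip_id=(template.ip_id + i) % 2**16,   # 16-bit IPv4 ID
        )
        segments.append(Segment(header, chunk[offset:offset + mss]))
    return segments
```

<p>For example, a 3000-byte chunk with an MSS of 1460 becomes three packets of 1460, 1460 and 80 bytes, while the OS built only one header.</p>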
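<p>The receive-side decision the answer attributes to the OS, whether to concatenate incoming segments into one large chunk or pass them up individually, can be sketched as a single coalescing pass. The <code>RxSegment</code> type, the flow key, and the merge rule (same flow, exactly contiguous sequence numbers, merged size no larger than <code>max_chunk</code>) are assumptions for illustration; real implementations typically check more conditions, such as TCP flags and options. This sketch also only merges with the most recent chunk, so interleaved flows are not combined.</p>

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RxSegment:
    flow: tuple    # hypothetical flow key, e.g. (src_ip, src_port, dst_ip, dst_port)
    seq: int       # TCP sequence number of the first payload byte
    payload: bytes


def coalesce(segments: list[RxSegment], max_chunk: int = 64 * 1024) -> list[RxSegment]:
    """Merge consecutive, in-order segments of the same flow (LRO-style).

    A segment is appended to the previous chunk only if it belongs to the
    same flow, starts exactly where that chunk ends, and the merged payload
    stays within max_chunk; otherwise it starts a new chunk.
    The input segments are not modified.
    """
    out: list[RxSegment] = []
    for seg in segments:
        if out:
            last = out[-1]
            contiguous = (last.seq + len(last.payload)) % 2**32 == seg.seq
            if (last.flow == seg.flow and contiguous
                    and len(last.payload) + len(seg.payload) <= max_chunk):
                last.payload += seg.payload  # concatenate into the larger chunk
                continue
        out.append(RxSegment(seg.flow, seg.seq, seg.payload))
    return out
```

<p>Passing fewer, larger chunks up the stack is where the saving comes from: per-packet processing happens once per chunk rather than once per segment.</p>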