Integers *are* binary. Or rather, integers are simply integers, and can be represented in any number base. In base 10 you'd write 13, in base 16 you'd write d (or 0xd), and in binary you'd write 1101 (or 0b1101). The Romans would have written XIII. They all represent the same *concept* of the number 'thirteen'. Among humans, we tend to use the base-10 representation, but when you ask a computer to process integers, it uses the binary representation. And it doesn't make a difference. Thirteen plus forty-five yields the same result no matter how I write it: XIII + 0x2D = 13 + 45 = 0xd + 0b101101. Whichever representation you use, the result of an arithmetic operation is the same, which is why we let the CPU use the binary representation for all integer processing.

Some programming languages also give you a "decimal" datatype, but that's generally related to floating-point arithmetic, where not all values can be represented in all bases (1/3 can be represented exactly in base 3, but not in base 2 or 10, for example, and 1/10 can be represented in base 10, but not in base 2).
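As a minimal sketch of that point, the C snippet below writes the value thirteen in three notations and checks that the arithmetic doesn't care (note that 0b binary literals are standard only as of C23, though GCC and Clang have long accepted them as an extension); the last line also shows that 1/10 has no exact binary floating-point representation:

```c
#include <stdio.h>

int main(void) {
    /* The same value, "thirteen", written in three different bases.
       0b... literals are a C23 feature (and a long-standing GCC/Clang extension). */
    int dec = 13;
    int hex = 0xd;
    int bin = 0b1101;
    printf("%d %d %d\n", dec, hex, bin);          /* 13 13 13 */

    /* The notation of the operands has no effect on the result:
       XIII + 0x2D = 13 + 45 = 0xd + 0b101101 = 58. */
    printf("%d %d\n", 13 + 45, 0xd + 0b101101);   /* 58 58 */

    /* 1/10 is exact in base 10 but not in base 2, so a binary double
       only stores the nearest representable value. */
    printf("%.20f\n", 0.1);                       /* 0.10000000000000000555... */
    return 0;
}
```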
However, it is surprisingly difficult to single out any particular operations as "slow", because *it depends*. A modern CPU employs a lot of tricks and optimizations to speed up most operations most of the time, so really, what you need to do to get efficient code is avoid all the special cases. And there are a lot of them, and they generally have more to do with the combination (and order) of instructions than with which instructions are used.

Just to give you an idea of the kind of subtleties we're talking about: floating-point arithmetic can be executed as fast as (or sometimes faster than) integer arithmetic under ideal circumstances, but the latency is longer, meaning that ideal performance is harder to achieve. Branches, which are otherwise almost free, become painful because they inhibit instruction reordering and scheduling, both in the compiler and on the fly in the CPU, which makes it harder to hide this latency. Latency is how long it takes from when an instruction is initiated until its result is ready; most instructions only occupy the CPU for one clock cycle, so even if the result isn't ready the following cycle, the CPU can start another instruction then. Which means that if the result is not immediately needed, high-latency instructions are almost free. But if you need to feed the result into the next instruction, that instruction will have to wait until the result is finished.

Certain instructions are just slow no matter what you do, and will typically stall the relevant parts of the CPU until they have completed (square root is a common example, and integer division may be another; on some CPUs, doubles in general suffer from the same problem). On the other hand, while a floating-point square root will block the FP pipeline, it won't prevent you from executing integer instructions simultaneously.

Sometimes, storing values in variables that could instead be recomputed as needed will be faster, because they can be kept in a register, saving a few cycles. Other times it will be slower, because you run out of registers and the value has to be pushed out to the cache, or even to RAM, making recomputation on every use preferable.

The order in which you traverse memory makes a huge difference. Random (scattered) accesses can take hundreds of cycles to complete, but sequential ones are almost instant. Performing your reads/writes in the right pattern allows the CPU to keep the required data in cache almost all the time, and *usually* "the right pattern" means reading data sequentially and working on chunks of ~64 KB at a time. But sometimes not.

On an x86 CPU, some instructions take up 1 byte, others up to 15. If your code contains a lot of the former, instruction fetch and decoding won't be a bottleneck, but if it's full of the longer instructions, that will probably limit how many instructions the CPU can load each cycle, and then the number it would be able to execute doesn't matter.

There are very few *universal* rules for performance in a modern CPU.
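To make the traversal-order point concrete, here is a minimal, unscientific sketch (timed with clock(), so the numbers are rough, and the exact ratio depends on the machine, the compiler, and the optimization flags). It sums the same 2-D array twice: once row by row, which is sequential in memory since C arrays are row-major, and once column by column, with a large stride between consecutive accesses:

```c
#include <stdio.h>
#include <time.h>

#define N 4096   /* 4096*4096 doubles = 128 MiB, far larger than any cache */

static double grid[N][N];

/* Row-wise: consecutive accesses touch adjacent addresses, so the
   caches and the hardware prefetcher can do their job. */
static double sum_rows(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += grid[i][j];
    return s;
}

/* Column-wise: consecutive accesses are N*sizeof(double) bytes apart,
   so nearly every access misses the cache. */
static double sum_cols(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += grid[i][j];
    return s;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = 1.0;

    clock_t t0 = clock();
    double a = sum_rows();
    clock_t t1 = clock();
    double b = sum_cols();
    clock_t t2 = clock();

    /* Both loops perform the same arithmetic and produce the same sum;
       only the memory access pattern differs. */
    printf("row-wise:    %.3f s (sum %g)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, a);
    printf("column-wise: %.3f s (sum %g)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, b);
    return 0;
}
```

On most machines the column-wise pass comes out several times slower, even though it executes exactly the same arithmetic per element; the difference is purely the memory access pattern.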