Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>The following page may be useful in the case where you need absolute portability of such operations. It discusses software for testing implementations of the IEEE 754 standard, including software for emulating floating point operations. Most information is probably specific to C or C++, however.</p> <p><a href="http://www.math.utah.edu/~beebe/software/ieee/" rel="nofollow noreferrer">http://www.math.utah.edu/~beebe/software/ieee/</a></p> <p><strong>A note on fixed point</strong></p> <p>Binary fixed point numbers can also work well as a substitute for floating point, as is evident from the four basic arithmetic operations:</p> <ul> <li>Addition and subtraction are trivial. They work the same way as integers. Just add or subtract!</li> <li>To multiply two fixed point numbers, multiply the two numbers then shift right the defined number of fractional bits.</li> <li>To divide two fixed point numbers, shift the dividend left the defined number of fractional bits, then divide by the divisor.</li> <li>Chapter four of <a href="http://repository.lib.ncsu.edu/ir/bitstream/1840.16/1599/1/etd.pdf" rel="nofollow noreferrer">this paper</a> has additional guidance on implementing binary fixed point numbers.</li> </ul> <p>Binary fixed point numbers can be implemented on any integer data type such as int, long, and BigInteger, and the non-CLS-compliant types uint and ulong. </p> <p>As suggested in another answer, you can use lookup tables, where each element in the table is a binary fixed point number, to help implement complex functions such as sine, cosine, square root, and so on. If the lookup table is less granular than the fixed point number, it is suggested to round the input by adding one half of the granularity of the lookup table to the input:</p> <pre><code>// Assume each number has a 12 bit fractional part. (1/4096) // Each entry in the lookup table corresponds to a fixed point number // with an 8-bit fractional part (1/256) input+=(1&lt;&lt;3); // Add 2^3 for rounding purposes input&gt;&gt;=4; // Shift right by 4 (to get 8-bit fractional part) // --- clamp or restrict input here -- // Look up value. return lookupTable[input]; </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload