Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>I wrote the program below in order to obtain some empirical results for single-precision.</p> <pre><code>#include &lt;float.h&gt; #include &lt;math.h&gt; #include &lt;stdio.h&gt; long double d1, d2, rel1, rel2; float i1, i2; int main() { float f; for (f = nextafterf(0, 2); f &lt;= 1; f = nextafterf(f, 2)) { long double o = 1.0L - ((long double)f * f); float r1 = (1 - f) * (1 + f); float r2 = 1 - f * f; long double c1 = fabsl(o - r1); long double c2 = fabsl(o - r2); if (c1 &gt; d1) d1 = c1; if (c2 &gt; d2) d2 = c2; if (c1 / o &gt; rel1) rel1 = c1 / o, i1 = f; if (c2 / o &gt; rel2) rel2 = c2 / o, i2 = f; } printf("(1-x)(1+x) abs:%Le relative:%Le\n", d1, rel1); printf("1-x*x abs:%Le relative:%Le\n\n", d2, rel2); printf("input1: %a 1-x:%a 1+x:%a (1-x)(1+x):%a o:%a\n", i1, 1-i1, 1+i1, (1-i1)*(1+i1), (double)(1 - ((long double)i1 * i1))); printf("input2: %a x*x:%a 1-x*x:%a o:%a\n", i2, i2*i2, 1 - i2*i2, (double)(1 - ((long double)i2 * i2))); } </code></pre> <p>A few remarks:</p> <ul> <li>I used 80-bit <code>long double</code> to compute meta-data. This is not enough to precisely represent the error made for all values of <code>x</code>, but I fear the program would become prohibitively slow with higher precision. </li> <li>The reference value <code>o</code> is computed as <code>1.0L - ((long double)f * f)</code>. This is always the nearest <code>long double</code> number to the real result, because <code>(long double)f * f</code> is exact (see, already it seems that the form <code>1 - x*x</code> can sometimes be better :) ).</li> </ul> <p>I obtained the results below:</p> <pre><code>(1-x)(1+x) abs:8.940394e-08 relative:9.447410e-08 1-x*x abs:4.470348e-08 relative:8.631498e-05 input1: 0x1.6a046ep-1 1-x:0x1.2bf724p-2 1+x:0x1.b50238p+0 (1-x)(1+x):0x1.0007bep-1 o:0x1.0007bc6a305ep-1 input2: 0x1.ffe96p-1 x*x:0x1.ffd2cp-1 1-x*x:0x1.6ap-12 o:0x1.69f8007p-12 </code></pre> <p>According to these results, <code>1 - x*x</code> has better absolute accuracy and <code>(1-x)*(1+x)</code> has <strong>much</strong> better relative accuracy. Floating-point is all about relative accuracy (the entire system is designed to allow relatively accurate representation of small and large values), so the latter form is preferred.</p> <p>EDIT: Computing the final error makes more sense, as illustrated in Eric's answer. A subexpression in an expression such as <code>ArcTan(X, Sqrt(1 - X*X))</code> could have been chosen not because of its better accuracy overall but because it was accurate where it mattered most. Adding the lines below to the body of the loop:</p> <pre><code> long double a = atan2l(f, sqrtl(o)); float a1 = atan2f(f, sqrtf(r1)); float a2 = atan2f(f, sqrtf(r2)); long double e1 = fabsl(a - a1); long double e2 = fabsl(a - a2); if (e1 / a &gt; ae1) ae1 = e1 / a, i1 = f; if (e2 / a &gt; ae2) ae2 = e2 / a, i2 = f; </code></pre> <p>It might make as much sense to use <code>atan2l(f, sqrtf(r1))</code> because I do not have the exact same function <code>ArcTan</code> as your system. Anyway, with these caveats, for the complete expression, the maximum relative error on the interval [-1 … 1] is 1.4e-07 for the (1-x)(1+x) version and 5.5e-7 for the 1-x<sup>2</sup> version.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload