代码之家  ›  专栏  ›  技术社区  ›  njuffa

等距角正余弦的快速精确迭代生成

  •  3
  • njuffa  · 技术社区  · 6 年前

    在某些应用中,需要多个角度的正弦和余弦,其中的角度是通过等尺寸增量的重复相加得到的 英克思 达到起始值 基础 . 出于性能原因,而不是调用 sin() , cos() 标准数学库函数(或者可能是非标准的 sincos() 函数)对于每个生成的角度,可以非常方便地计算 罪恶(基础) 余弦(碱) 只有一次,然后通过应用 angle-sum formulas :

    sin(基+增量)=cos(增量)·sin(基)+sin(增量)·cos(基)
    cos(base+incr)=cos(incr)·cos(base)-sin(incr)·sin(base)

    这只需要对比例因子进行一次预计算 罪(罪) COS(IPCR) ,无论执行了多少次迭代。

    这种方法有几个问题。如果增量很小, COS(IPCR) 将是一个接近单位的数字,在有限精度浮点格式计算时,通过隐式减法对消造成精度损失。此外,由于计算是 以数字优势形式排列 sin(基+增量)=sin(基)+调整 ,其中计算出的数量 调整 在数量上明显小于 罪恶(基础) (类似于余弦)。

    由于一个通常应用数十到数百个迭代步骤,这些错误将累积。如何以最有利于保持高精度的方式构造迭代计算?如果 fused multiply-add 操作(fma)可用,通过标准数学函数公开 fma() fmaf() ?

    3 回复  |  直到 6 年前
        1
  •  3
  •   njuffa    6 年前

    应用 half-angle formula 因为正弦允许解决问题中提到的影响精度的两个问题:

    sin(incr/2)=_((1-cos(incr))/2)_
    sin(incr/2)=(1-cos(incr))/2
    2·sin(incr/2)=1-cos(incr)
    1-2·sin(incr/2)=cos(incr)

    将其替换为原始公式将导致此中间表示:

    sin(基+incr)=(1-2··sin·(incr/2)··sin(基)+sin(incr)·cos(基)
    cos(基+incr)=(1-2·sin(incr/2))·cos(基)-sin(incr)·sin(基)

    通过对术语的简单重新排序,可以得到所需的公式形式:

    sin(基+incr)=sin(基)+(sin(incr)·cos(基)-2·sin(incr/2)·sin(基)
    cos(base+incr)=cos(base)—(2··sin·(incr/2)·cos(base)+sin(incr)·sin(base))

    与原始公式一样,这只需要对两个比例因子进行一次预计算,即 2··sin(incr/2) 罪(罪) . 对于小的增量,两者都很小:保持完全的精度。

    对于如何将fma应用于此计算,有两种选择。可以通过取消使用一次调整的方法来最小化操作次数,而使用两次,希望fma操作(一次四舍五入两次操作)减少的舍入误差能够补偿精度损失:

    sin(base+incr)=fma(-2··sin(incr/2),sin(base),fma(sin(incr),cos(base),sin(base)))
    cos(base+incr)=fma(-2·sin(incr/2),cos(base),fma(-sin(incr),sin(base),cos(base)))

    另一种方法是将一个fma应用于改进的公式,尽管目前尚不清楚这两个乘法中的哪一个应映射到fma中的无边界乘法:

    sin(基+incr)=sin(基)+fma(sin(incr),cos(基),-2·sin(incr/2)·sin(基)
    cos(base+incr)=cos(base)-fma(sin(incr),sin(base),2·sin(incr/2)·cos(base)

    下面的脚手架通过生成许多( 基础 , 英克思 )然后,在收集生成的所有正弦和余弦值的错误时,为每个值迭代一组步骤。由此计算出 root-mean square error 对于每个测试用例,分别为正弦和余弦。最后报告了在所有生成的测试用例中观察到的最大rms误差。

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    
    #define NAIVE    (1)
    #define ROBUST   (2)
    #define FAST     (3)
    #define ACCURATE (4)
    #define MODE (ACCURATE)
    
    // Fixes via: Greg Rose, KISS: A Bit Too Simple. http://eprint.iacr.org/2011/007
    static unsigned int z=362436069,w=521288629,jsr=362436069,jcong=123456789;
    #define znew (z=36969*(z&0xffff)+(z>>16))
    #define wnew (w=18000*(w&0xffff)+(w>>16))
    #define MWC  ((znew<<16)+wnew)
    #define SHR3 (jsr^=(jsr<<13),jsr^=(jsr>>17),jsr^=(jsr<<5)) /* 2^32-1 */
    #define CONG (jcong=69069*jcong+13579)                     /* 2^32 */
    #define KISS ((MWC^CONG)+SHR3)
    
    int main (void)
    {
        double sumerrsqS, sumerrsqC, rmsS, rmsC, maxrmsS = 0, maxrmsC = 0;
        double refS, refC, errS, errC;
        float base, incr, s0, c0, s1, c1, tt;
        int count, i;
    
        const int N = 100;  // # rotation steps per test case
        count = 2000000;    // # test cases (a pair of base and increment values)
    
    #if MODE == NAIVE
        printf ("testing: NAIVE (without FMA)\n");
    #elif MODE == FAST
        printf ("testing: FAST (without FMA)\n");
    #elif MODE == ACCURATE
        printf ("testing: ACCURATE (with FMA)\n");
    #elif MODE == ROBUST
        printf ("testing: ROBUST (with FMA)\n");
    #else
    #error unsupported MODE
    #endif // MODE
    
        do {
            /* generate test case */
            base = (float)(KISS * 1.21e-10);      // starting angle, < 30 degrees
            incr = (float)(KISS * 2.43e-10 / N);  // increment, < 60/n degrees
    
            /* set up rotation parameters */
            s1 = sinf (incr);
    #if MODE == NAIVE
            c1 = cosf (incr);
    #else
            tt = sinf (incr * 0.5f);
            c1 = 2.0f * tt * tt;
    #endif // MODE
            sumerrsqS = 0;
            sumerrsqC = 0;
    
            s0 = sinf (base); // initial sine
            c0 = cosf (base); // initial cosine
    
            /* run test case through N rotation steps */
            i = 0;
            do {         
    
                tt = s0; // old sine
    #if MODE == NAIVE
                /* least accurate, 6 FP ops */
                s0 = c1 * tt + s1 * c0; // new sine
                c0 = c1 * c0 - s1 * tt; // new cosine
    #elif MODE == ROBUST
                /* very accurate, 8 FP ops */
                s0 = ( s1 * c0 - c1 * tt) + tt; // new sine
                c0 = (-s1 * tt - c1 * c0) + c0; // new cosine
    #elif MODE == FAST
                /* accurate and fast, 4 FP ops */
                s0 = fmaf (-c1, tt, fmaf ( s1, c0, tt)); // new sine
                c0 = fmaf (-c1, c0, fmaf (-s1, tt, c0)); // new cosine
    #elif MODE == ACCURATE
                /* most accurate, 6 FP ops */
                s0 = tt + fmaf (s1, c0, -c1 * tt); // new sine
                c0 = c0 - fmaf (s1, tt,  c1 * c0); // new cosine
    #endif // MODE
                i++;
    
                refS = sin (fma ((double)i, (double)incr, (double)base));
                refC = cos (fma ((double)i, (double)incr, (double)base));
                errS = ((double)s0 - refS) / refS;
                errC = ((double)c0 - refC) / refC;
                sumerrsqS = fma (errS, errS, sumerrsqS);
                sumerrsqC = fma (errC, errC, sumerrsqC);
            } while (i < N);
    
            rmsS = sqrt (sumerrsqS / N);
            rmsC = sqrt (sumerrsqC / N);
            if (rmsS > maxrmsS) maxrmsS = rmsS;
            if (rmsC > maxrmsC) maxrmsC = rmsC;
    
        } while (--count);
    
        printf ("max rms error sin = % 16.9e\n", maxrmsS);
        printf ("max rms error cos = % 16.9e\n", maxrmsC);
    
        return EXIT_SUCCESS;
    }
    

    测试脚手架的输出表明,最快的基于fma的备选方案优于问题中的朴素方法,而更准确的基于fma的备选方案是所考虑的备选方案中最准确的:

    testing: NAIVE (without FMA)
    max rms error sin =  4.837386842e-006
    max rms error cos =  6.884047862e-006
    
    testing: ROBUST (without FMA)
    max rms error sin =  3.330292645e-006
    max rms error cos =  4.297631502e-006
    
    testing: FAST (with FMA)
    max rms error sin =  3.532624939e-006
    max rms error cos =  4.763623188e-006
    
    testing: ACCURATE (with FMA)
    max rms error sin =  3.330292645e-006
    max rms error cos =  4.104813533e-006
    
        2
  •  1
  •   Matt Timmermans    6 年前

    如果您希望在长时间的迭代计数中获得最大的精度,则可以在从以前的精确值生成增量结果时,以增量方式计算精确值,而不产生累积错误。

    例如,如果预先计算 sin(增量*2^x) cos(增量*2^x) 对于 x=6…三十一 ,例如,然后可以使用角度和公式计算每个 iNCR=64×N 输出前64个值时,一次一位。

    每64个值都会丢弃增量生成的结果,而使用正确的结果,因此错误不会在长时间内累积。

    另外,由于您只需要从任何精确的基中获得64个增量结果,因此可以预先计算直接从基中计算这些结果所需的64个正弦和余弦,而不是以前的结果。

        3
  •  0
  •   wim    6 年前

    可以按以下方式重新排列sin(base+incr)和cos(base+incr)的方程:

    sin(基+增量)=cos(增量)·sin(基)+sin(增量)·cos(基)
    sin(基+incr)=sin(基)+(1-cos(incr))·-sin(基)+sin(incr)·cos(基)
    sin(基+incr)=sin(基)+sin(incr)·(-1/sin(incr)·(1-cos(incr))·sin(基)+cos(基)
    sin(基+增量)=sin(基)+sin(增量)·(-tan(增量/2)·sin(基)+cos(基))

    cos(基+增量)=cos(增量)·cos(基)-sin(增量)·sin(基)
    cos(base+incr)=cos(base)-sin(incr)·(tan(incr/2)·cos(base)+sin(base))

    这里我们用公式 (1-cos(x)/sin(x)=tan(x/2) , 看见 here ,例如。 现在还不明显的是,这会导致比其他方法更准确的结果, 但在实践中,它非常有效,我们稍后将看到这一点。

    同样,这需要两个比例因子的一次性预计算 罪(罪) 谭(英克思/ 2) . 在c语言中,我们可以用4个fma-s编写公式:

            s0 = fmaf ( s1, fmaf (-tt, c1, c0), tt); // new sine
            c0 = fmaf (-s1, fmaf ( c0, c1, tt), c0); // new cosine
    

    完整更新的测试代码在这个答案的末尾。 用 gcc -O3 -Wall -m64 -march=skylake fastsincos.c -lm (GCC版本7.3),结果如下:

    testing: FAST (with FMA)
    max rms error sin =  3.532624939e-06
    max rms error cos =  4.763623188e-06
    
    testing: ACCURATE (with FMA)
    max rms error sin =  3.330292645e-06
    max rms error cos =  4.104813533e-06
    
    testing: FAST_ACC (with FMA)
    max rms error sin =  3.330292645e-06
    max rms error cos =  3.775300478e-06
    

    新的解决方案 FAST_ACC 在这个测试中确实比其他的要准确一点。


    修改测试代码:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    
    #define NAIVE    (1)
    #define ROBUST   (2)
    #define FAST     (3)
    #define ACCURATE (4)
    #define FAST_ACC (5)
    #define MODE (FAST_ACC)
    
    // Fixes via: Greg Rose, KISS: A Bit Too Simple. http://eprint.iacr.org/2011/007
    static unsigned int z=362436069,w=521288629,jsr=362436069,jcong=123456789;
    #define znew (z=36969*(z&0xffff)+(z>>16))
    #define wnew (w=18000*(w&0xffff)+(w>>16))
    #define MWC  ((znew<<16)+wnew)
    #define SHR3 (jsr^=(jsr<<13),jsr^=(jsr>>17),jsr^=(jsr<<5)) /* 2^32-1 */
    #define CONG (jcong=69069*jcong+13579)                     /* 2^32 */
    #define KISS ((MWC^CONG)+SHR3)
    
    int main (void)
    {
        double sumerrsqS, sumerrsqC, rmsS, rmsC, maxrmsS = 0, maxrmsC = 0;
        double refS, refC, errS, errC;
        float base, incr, s0, c0, s1, c1, tt;
        int count, i;
    
        const int N = 100;  // # rotation steps per test case
        count = 2000000;    // # test cases (a pair of base and increment values)
    
    #if MODE == NAIVE
        printf ("testing: NAIVE (without FMA)\n");
    #elif MODE == FAST
        printf ("testing: FAST (without FMA)\n");
    #elif MODE == ACCURATE
        printf ("testing: ACCURATE (with FMA)\n");
    #elif MODE == ROBUST
        printf ("testing: ROBUST (with FMA)\n");
    #elif MODE == FAST_ACC
        printf ("testing: FAST_ACC (with FMA)\n");
    #else
    #error unsupported MODE
    #endif // MODE
    
        do {
            /* generate test case */
            base = (float)(KISS * 1.21e-10);      // starting angle, < 30 degrees
            incr = (float)(KISS * 2.43e-10 / N);  // increment, < 60/n degrees
    
            /* set up rotation parameters */
            s1 = sinf (incr);
    #if MODE == NAIVE
            c1 = cosf (incr);
    #elif MODE == FAST_ACC
            c1 = tanf (incr * 0.5f);
    #else
            tt = sinf (incr * 0.5f);
            c1 = 2.0f * tt * tt;
    #endif // MODE
            sumerrsqS = 0;
            sumerrsqC = 0;
    
            s0 = sinf (base); // initial sine
            c0 = cosf (base); // initial cosine
    
            /* run test case through N rotation steps */
            i = 0;
            do {         
    
                tt = s0; // old sine
    #if MODE == NAIVE
                /* least accurate, 6 FP ops */
                s0 = c1 * tt + s1 * c0; // new sine
                c0 = c1 * c0 - s1 * tt; // new cosine
    #elif MODE == ROBUST
                /* very accurate, 8 FP ops */
                s0 = ( s1 * c0 - c1 * tt) + tt; // new sine
                c0 = (-s1 * tt - c1 * c0) + c0; // new cosine
    #elif MODE == FAST
                /* accurate and fast, 4 FP ops */
                s0 = fmaf (-c1, tt, fmaf ( s1, c0, tt)); // new sine
                c0 = fmaf (-c1, c0, fmaf (-s1, tt, c0)); // new cosine
    #elif MODE == ACCURATE
                /* most accurate, 6 FP ops */
                s0 = tt + fmaf (s1, c0, -c1 * tt); // new sine
                c0 = c0 - fmaf (s1, tt,  c1 * c0); // new cosine
    #elif MODE == FAST_ACC
                /* fast and accurate, 4 FP ops */
                s0 = fmaf ( s1, fmaf (-tt, c1, c0), tt); // new sine
                c0 = fmaf (-s1, fmaf ( c0, c1, tt), c0); // new cosine
    #endif // MODE
                i++;
    
                refS = sin (fma ((double)i, (double)incr, (double)base));
                refC = cos (fma ((double)i, (double)incr, (double)base));
                errS = ((double)s0 - refS) / refS;
                errC = ((double)c0 - refC) / refC;
                sumerrsqS = fma (errS, errS, sumerrsqS);
                sumerrsqC = fma (errC, errC, sumerrsqC);
            } while (i < N);
    
            rmsS = sqrt (sumerrsqS / N);
            rmsC = sqrt (sumerrsqC / N);
            if (rmsS > maxrmsS) maxrmsS = rmsS;
            if (rmsC > maxrmsC) maxrmsC = rmsC;
    
        } while (--count);
    
        printf ("max rms error sin = % 16.9e\n", maxrmsS);
        printf ("max rms error cos = % 16.9e\n", maxrmsC);
    
        return EXIT_SUCCESS;
    }