
Is there a floating-point value of x for which x-x == 0 is false?

ieee-754 floating-accuracy floating-point

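Andrew Walker · 15 years ago

In most cases I understand that a floating-point comparison should be done against a range of values (abs(x-y) < epsilon), but does subtracting a value from itself guarantee that the result is zero?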

// can the assertion be triggered?
float x = //?;
assert( x-x == 0 );

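My guess is that nan/inf might be special cases, but I am more interested in what happens for simple values.

Edit:

I am happy to pick an answer if someone can cite a reference (the IEEE floating-point standard).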

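6 answers | last active 13 years ago

Stephen Canon · 15 years ago

As you hinted, inf - inf is NaN, which is not equal to zero. Similarly, NaN - NaN is NaN. For any finite floating-point number x, however, x - x == 0.0 holds (depending on the rounding mode, the result of x - x may be negative zero, but negative zero compares equal to 0.0 in floating-point arithmetic).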
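The distinction between finite and special values can be checked directly. The following is a minimal sketch, assuming IEEE-754 doubles and the C99 INFINITY and NAN macros from <math.h>; the helper names are hypothetical and not part of the original answer.

```c
#include <assert.h>
#include <math.h>     /* INFINITY, NAN */
#include <stdbool.h>

/* Returns true when x - x compares equal to zero.
   The volatile copy keeps the compiler from folding the subtraction away. */
bool self_difference_is_zero(double x)
{
    volatile double v = x;
    return (v - v) == 0.0;
}

/* Returns true when x - x is NaN (expected for infinities and NaNs).
   A NaN is the only value that compares unequal to itself. */
bool self_difference_is_nan(double x)
{
    volatile double v = x;
    double d = v - v;
    return d != d;
}
```

Calling self_difference_is_zero with any finite argument should return true, while INFINITY and NAN should make self_difference_is_nan return true instead.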

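Edit: giving a clear standards reference is a little tricky, because this is an emergent property of the rules set out in the IEEE-754 standard. Specifically, it follows from the requirement that the operations defined in Clause 5 be correctly rounded. Subtraction is such an operation (Section 5.4.1, "Arithmetic operations"), and the correctly rounded result of x - x is a zero of the appropriate sign (Section 6.3, paragraph 3):

When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 in all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be -0.

So the result of x - x must be +/- 0, and therefore must compare equal to 0.0 (Section 5.11, paragraph 2):

Comparisons shall ignore the sign of zero.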
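The rule that comparisons ignore the sign of zero can be observed without changing the rounding mode. This is a small sketch, assuming IEEE-754 doubles and the C99 signbit macro; the function names are made up for illustration.

```c
#include <assert.h>
#include <math.h>     /* signbit */
#include <stdbool.h>

/* True when +0.0 and -0.0 compare equal even though their sign bits differ. */
bool zeros_compare_equal(void)
{
    volatile double pos = 0.0;
    volatile double neg = -0.0;
    return pos == neg && signbit(neg) && !signbit(pos);
}

/* True for a zero of either sign, which is all that x - x == 0 tests. */
bool is_zero_of_either_sign(double d)
{
    return d == 0.0;
}
```

So even if x - x came out as -0.0 under a round-toward-negative mode, the comparison against 0 in the question would still succeed.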

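Further edit: there is no finite floating-point number x such that x - x == 0 is false. However, that is not what the code you posted checks; it checks whether a certain expression in a C-style language can evaluate to a non-zero value. In particular, on certain platforms, with certain (ill-conceived) compiler optimizations, the two instances of the variable x in that expression might have different values, causing the assertion to fail (especially if x is the result of some computation rather than a constant, representable value). This is a bug in the numerics model on those platforms, but that does not mean it cannot happen.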
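One way to see whether a platform evaluates expressions in a wider format is the C99 FLT_EVAL_METHOD macro from <float.h>. The sketch below is not from the original answer: it reports that macro and shows a common defensive pattern, storing each intermediate result in a volatile double so that both operands have been rounded to double before they are compared. That a volatile store forces the rounding is an assumption about typical compilers, and the function names are hypothetical.

```c
#include <assert.h>
#include <float.h>    /* FLT_EVAL_METHOD (C99) */
#include <stdbool.h>

/* 0: operations are evaluated in the declared type;
   1: float and double are evaluated as double;
   2: everything is evaluated as long double;
   negative: indeterminable. */
int evaluation_method(void)
{
    return (int)FLT_EVAL_METHOD;
}

/* Computes a * b twice, stores both results as double, then subtracts.
   Because both stored values went through the same rounding, the
   difference is expected to be zero for finite products. */
bool stored_products_cancel(double a, double b)
{
    volatile double p = a * b;
    volatile double q = a * b;
    return (p - q) == 0.0;
}
```

A value other than 0 from evaluation_method() is a hint that a computed x may be held in a register with extra precision, which is the situation the answer warns about.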

Mark B · 15 years ago

ypnos · 15 years ago

Yes, apart from the special cases x-x will always be 0. But x*(1/x) is not always 1 ;-)
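The remark about x*(1/x) is easy to reproduce. The sketch below assumes IEEE-754 double arithmetic with the default round-to-nearest mode and no extended-precision intermediates; under those assumptions 49.0 is a commonly cited value for which the round trip misses 1. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when x * (1/x) comes back as exactly 1.0.
   The volatile stores make sure each step is rounded to double. */
bool reciprocal_round_trips(double x)
{
    volatile double inv = 1.0 / x;    /* first rounding */
    volatile double back = x * inv;   /* second rounding */
    return back == 1.0;
}
```

The two roundings can accumulate, whereas x - x has an exactly representable result and therefore needs no rounding at all.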

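WhirlWind · 15 years ago

Problems arise when an addition, subtraction, multiplication, or division that adjusts the exponent and mantissa is performed before the comparison. When the exponents are the same, the mantissas are subtracted, and if they are identical, the result is exactly zero.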
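To make the exponent and mantissa fields concrete, here is a minimal sketch that extracts them from a double. It assumes the IEEE-754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits); the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Biased 11-bit exponent field of a binary64 value. */
unsigned exponent_field(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* well-defined way to read the bits */
    return (unsigned)((bits >> 52) & 0x7FF);
}

/* 52-bit stored fraction (the mantissa without its implicit leading 1). */
uint64_t fraction_field(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits & ((UINT64_C(1) << 52) - 1);
}
```

When two operands have identical fields, subtracting them yields a value whose exponent and fraction fields are both zero, which is how zero is encoded.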

http://grouper.ieee.org/groups/754/

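Oleg · 15 years ago

No, not on any processor that supports IEEE 754-2008 (see the references below).

My short answer to your other question: if (x-y == 0) is exactly as safe as if (x == y), so assert(x-x == 0) is fine, because no arithmetic underflow is produced in x-x or x-y.

In <float.h> you can find some constants from the IEEE floating-point standard. For now only the following are of interest: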

#define DBL_MIN         2.2250738585072014e-308 /* min positive value */
#define DBL_MIN_10_EXP  (-307)                  /* min decimal exponent */
#define DBL_MIN_EXP     (-1021)                 /* min binary exponent */

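A double can also hold values smaller than DBL_MIN. If you do arithmetic with numbers below DBL_MIN, they are not normalized, so you work with them as with integers (operations on the mantissa only), without any "rounding errors".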
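The integer-like behaviour below DBL_MIN can be sketched as follows. The example assumes IEEE-754 doubles with gradual underflow enabled (no flush-to-zero mode such as the one some fast-math options turn on); the function names are hypothetical.

```c
#include <assert.h>
#include <float.h>    /* DBL_MIN, DBL_EPSILON */

/* Smallest positive subnormal double: 2^-1022 * 2^-52 = 2^-1074.
   The product is exactly representable, so no rounding occurs. */
double smallest_subnormal(void)
{
    volatile double tiny = DBL_MIN * DBL_EPSILON;
    return tiny;
}

/* k * 2^-1074 for a small k; exact because k fits in the 52 fraction bits. */
double subnormal_multiple(unsigned k)
{
    return (double)k * smallest_subnormal();
}
```

Sums and differences of such multiples behave like integer arithmetic on k, for example subnormal_multiple(5) - subnormal_multiple(3) equals subnormal_multiple(2).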

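Remark: there are no errors in computer arithmetic operations. These operations are simply not the same as the +, -, * and / operations of ordinary real arithmetic applied to the same numbers. They are deterministic operations on the subset of real numbers that can be stored in the form (mantissa, exponent) with a well-defined number of bits for each. We can call this subset the computer floating-point numbers. The result of a classical floating-point operation is projected back onto the set of computer floating-point numbers. This projection is deterministic and has many useful properties, such as: if x1 >= x2 then x1*y >= x2*y (for y >= 0).

Sorry for the long remark; back to our subject.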


#include <stdio.h>
#include <float.h>
#include <math.h>

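/* Print the bytes of d from most to least significant,
   assuming a little-endian machine. */
void DumpDouble(double d)
{
    unsigned char *b = (unsigned char *)&d;
    size_t i;

    for (i = 1; i <= sizeof(d); i++) {
        printf ("%02X", b[sizeof(d)-i]);
    }
    printf ("\n");
}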

int main()
{
    double x, m, y, z;
    int exp;

    printf ("DBL_MAX=%.16e\n", DBL_MAX);
    printf ("DBL_MAX in binary form: ");
    DumpDouble(DBL_MAX);

    printf ("DBL_MIN=%.16e\n", DBL_MIN);
    printf ("DBL_MIN in binary form: ");
    DumpDouble(DBL_MIN);

    // Breaks the floating point number x into its binary significand
    // (a floating point value between 0.5(included) and 1.0(excluded))
    // and an integral exponent for 2
    x = DBL_MIN;
    m = frexp (x, &exp);
    printf ("DBL_MIN has mantissa=%.16e and exponent=%d\n", m, exp);
    printf ("mantissa of DBL_MIN in binary form: ");
    DumpDouble(m);

    // ldexp() returns the resulting floating point value from
    // multiplying x (the significand) by 2
    // raised to the power of exp (the exponent).
    x = ldexp (0.5, DBL_MIN_EXP);   // -1021
    printf ("the number (x) constructed from mantissa 0.5 and exponent=DBL_MIN_EXP (%d) in binary form: ", DBL_MIN_EXP);
    DumpDouble(x);

    y = ldexp (0.5000000000000001, DBL_MIN_EXP);
    m = frexp (y, &exp);
    printf ("the number (y) constructed from mantissa 0.5000000000000001 and exponent=DBL_MIN_EXP (%d) in binary form: ", DBL_MIN_EXP);
    DumpDouble(y);
    printf ("mantissa of this number saved as double will be displayed by printf(%%.16e) as %.16e and exponent=%d\n", m, exp);

    y = ldexp ((1 + DBL_EPSILON)/2, DBL_MIN_EXP);
    m = frexp (y, &exp);
    printf ("the number (y) constructed from mantissa (1+DBL_EPSILON)/2 and exponent=DBL_MIN_EXP (%d) in binary form: ", DBL_MIN_EXP);
    DumpDouble(y);
    printf ("mantissa of this number saved as double will be displayed by printf(%%.16e) as %.16e and exponent=%d\n", m, exp);

    z = y - x;
    m = frexp (z, &exp);
    printf ("z=y-x in binary form: ");
    DumpDouble(z);
    printf ("z will be displayed by printf(%%.16e) as %.16e\n", z);
    printf ("z has mantissa=%.16e and exponent=%d\n", m, exp);

    if (x == y)
        printf ("\"if (x == y)\" say x == y\n");
    else
        printf ("\"if (x == y)\" say x != y\n");

    if ((x-y) == 0)
        printf ("\"if ((x-y) == 0)\" say \"(x-y) == 0\"\n");
    else
        printf ("\"if ((x-y) == 0)\" say \"(x-y) != 0\"\n");
}

This code produces the following output:

DBL_MAX=1.7976931348623157e+308
DBL_MAX in binary form: 7FEFFFFFFFFFFFFF
DBL_MIN=2.2250738585072014e-308
DBL_MIN in binary form: 0010000000000000
DBL_MIN has mantissa=5.0000000000000000e-001 and exponent=-1021
mantissa of DBL_MIN in binary form: 3FE0000000000000
the number (x) constructed from mantissa 0.5 and exponent=DBL_MIN_EXP (-1021) in binary form: 0010000000000000
the number (y) constructed from mantissa 0.5000000000000001 and exponent=DBL_MIN_EXP (-1021) in binary form: 0010000000000001
mantissa of this number saved as double will be displayed by printf(%.16e) as 5.0000000000000011e-001 and exponent=-1021
the number (y) constructed from mantissa (1+DBL_EPSILON)/2 and exponent=DBL_MIN_EXP (-1021) in binary form: 0010000000000001
mantissa of this number saved as double will be displayed by printf(%.16e) as 5.0000000000000011e-001 and exponent=-1021
z=y-x in binary form: 0000000000000001
z will be displayed by printf(%.16e) as 4.9406564584124654e-324
z has mantissa=5.0000000000000000e-001 and exponent=-1073
"if (x == y)" say x != y
"if ((x-y) == 0)" say "(x-y) != 0"

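So we can see that numbers smaller than DBL_MIN are not normalized (see 0000000000000001). We work with them as with integers, without any "errors". Thus if we assign y=x, then if (x-y == 0) is exactly as safe as if (x == y), and assert(x-x == 0) works fine. In this example, z = 0.5 * 2^(-1073) = 1 * 2^(-1074). This is in fact the smallest positive number that can be stored in a double. All arithmetic on numbers smaller than DBL_MIN works like arithmetic on integers multiplied by 2^(-1074).

So I see no underflow problems here. If anyone has a different processor, it would be interesting to compare our results.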

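Edit

Added links: my experiments show that http://grouper.ieee.org/groups/754/faq.html#underflow is absolutely correct on my Intel Core 2 processor. The way results are computed produces no underflow in "+" and "-" floating-point operations. My results are independent of the Strict (/fp:strict) or Precise (/fp:precise) Microsoft Visual C compiler switches (see http://msdn.microsoft.com/en-us/library/e7s85ffb%28VS.80%29.aspx and http://msdn.microsoft.com/en-us/library/Aa289157 ).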

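One more (probably the last) link and my final remark: I found a good reference, http://en.wikipedia.org/wiki/Subnormal_numbers , which describes the same thing I wrote above. Including denormal or denormalized numbers (now usually called subnormal numbers, for example in IEEE 754-2008) leads to the following statement:

Addition and subtraction of floating-point numbers never underflow; two nearby floating-point numbers always have a representable non-zero difference. Without gradual underflow, the subtraction a-b can underflow and produce zero even though the values are not equal.

So all my results must be correct on any processor that supports IEEE 754-2008.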
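The guarantee quoted above can be checked for the pair of values used in the experiment, DBL_MIN and its immediate successor. This is a minimal sketch assuming IEEE-754 doubles with gradual underflow enabled; the helper names are hypothetical, and the agreement between the two tests is only claimed for finite operands.

```c
#include <assert.h>
#include <float.h>    /* DBL_MIN, DBL_EPSILON */
#include <stdbool.h>

/* The double immediately above DBL_MIN: 2^-1022 + 2^-1074. */
double next_above_dbl_min(void)
{
    volatile double ulp = DBL_MIN * DBL_EPSILON;   /* 2^-1074 */
    return DBL_MIN + ulp;
}

/* True when (x == y) and (x - y == 0) give the same verdict. */
bool equality_tests_agree(double x, double y)
{
    volatile double diff = x - y;
    return (x == y) == (diff == 0.0);
}
```

Without subnormal numbers the difference of these two neighbours would have to round to zero, and the two tests would disagree.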

michalburger1 · 15 years ago

http://www.parashift.com/c++-faq-lite/newbie.html#faq-29.18 . (Not sure whether it applies to your situation, though.)