
Horizontal max or min using AVX? [duplicate]

avx intrinsics c++


Kari · Tech Community · 5 years ago

Currently I have code adapted from this answer, which worked with double-precision:

#include <immintrin.h>

static inline float fast_hMax_ps(__m256 a){
    // Swap the two 128-bit halves so floats from different halves can be compared.
    const __m256 permHalves = _mm256_permute2f128_ps(a, a, 1);
    const __m256 m0 = _mm256_max_ps(permHalves, a); // 4 values against the other 4 ("old half against the new half")

    // Now find the largest of the 4 values within each half:
    const __m256 perm0 = _mm256_permute_ps(m0, 0b01001110); // swap the two 64-bit pairs in each half
    const __m256 m1 = _mm256_max_ps(m0, perm0);

    const __m256 perm1 = _mm256_permute_ps(m1, 0b10110001); // swap adjacent floats
    const __m256 m2 = _mm256_max_ps(perm1, m1);
    // Every lane now holds the maximum, so read lane 0 without a pointer cast.
    return _mm_cvtss_f32(_mm256_castps256_ps128(m2));
}
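
Since the title asks about both max and min, here is a minimal sketch of one common alternative: narrow the 256-bit vector to 128 bits first, then finish the reduction with SSE shuffles. It is not taken from the linked answer. The names `hmax8_sketch`, `hmin8_sketch` and the `AVX_TARGET` macro are hypothetical, the helpers take a pointer to 8 floats rather than an `__m256`, and the `target("avx")` attribute assumes GCC or Clang (other compilers would need their own AVX switch). Results with NaN inputs depend on operand order and are not handled here.

```cpp
#include <immintrin.h>

// Assumption: on GCC/Clang this attribute enables AVX for a single function,
// so the file also builds without -mavx. The CPU must still support AVX.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Hypothetical helper: horizontal max of 8 floats stored at p.
AVX_TARGET static inline float hmax8_sketch(const float* p) {
    const __m256 v  = _mm256_loadu_ps(p);            // unaligned load of 8 floats
    const __m128 lo = _mm256_castps256_ps128(v);     // lower 4 floats
    const __m128 hi = _mm256_extractf128_ps(v, 1);   // upper 4 floats
    __m128 m = _mm_max_ps(lo, hi);                   // 8 -> 4 candidates
    m = _mm_max_ps(m, _mm_movehl_ps(m, m));          // 4 -> 2 candidates (lanes 0 and 1)
    m = _mm_max_ss(m, _mm_shuffle_ps(m, m, 0x55));   // lane 0 = max(lane 0, lane 1)
    return _mm_cvtss_f32(m);
}

// Hypothetical helper: the same reduction with min instead of max.
AVX_TARGET static inline float hmin8_sketch(const float* p) {
    const __m256 v  = _mm256_loadu_ps(p);
    const __m128 lo = _mm256_castps256_ps128(v);
    const __m128 hi = _mm256_extractf128_ps(v, 1);
    __m128 m = _mm_min_ps(lo, hi);
    m = _mm_min_ps(m, _mm_movehl_ps(m, m));
    m = _mm_min_ss(m, _mm_shuffle_ps(m, m, 0x55));
    return _mm_cvtss_f32(m);
}
```

Calling `hmax8_sketch(data)` on a `float data[8]` returns the largest element, and `hmin8_sketch(data)` the smallest. Narrowing to 128 bits early means only the first step touches the full 256-bit register; whether that is actually faster than the in-lane permute version above would need measuring on the target CPU.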

