
Horizontal max or min using AVX? [duplicate]

  • Kari  · Tech Community  · 5 years ago

    The code I currently have is based on this answer that worked with double-precision:

    #include <immintrin.h>

    static inline float fast_hMax_ps(__m256 a){
        // Swap the two 128-bit halves so floats from different halves can be compared.
        const __m256 permHalves = _mm256_permute2f128_ps(a, a, 1);
        // Compare 4 values with 4 other values ("old half against the new half").
        const __m256 m0 = _mm256_max_ps(permHalves, a);

        // Now find the largest of the 4 values within each half.
        const __m256 perm0 = _mm256_permute_ps(m0, _MM_SHUFFLE(1, 0, 3, 2)); // 0b01001110: swap element pairs
        const __m256 m1 = _mm256_max_ps(m0, perm0);

        const __m256 perm1 = _mm256_permute_ps(m1, _MM_SHUFFLE(2, 3, 0, 1)); // 0b10110001: swap neighbours
        const __m256 m2 = _mm256_max_ps(perm1, m1);
        // Every element now holds the maximum, so read element 0 without a pointer cast.
        return _mm_cvtss_f32(_mm256_castps256_ps128(m2));
    }
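
Since the title asks about both max and min, here is a minimal sketch of a commonly used alternative: narrow to 128 bits first, then finish the reduction with SSE shuffles. It is not taken from the linked answer. The names `hmax_ps_sketch`, `hmin_ps_sketch`, `hmax_of_8`, `hmin_of_8` and the `HSKETCH_AVX` macro are hypothetical, and the `target("avx")` attribute assumes GCC or Clang (other compilers would need AVX enabled through their own flags).

```c
#include <immintrin.h>

/* Assumption: GCC/Clang. The attribute lets these functions use AVX
   intrinsics even when the file is not compiled with -mavx. */
#if defined(__GNUC__)
#define HSKETCH_AVX __attribute__((target("avx")))
#else
#define HSKETCH_AVX
#endif

/* Hypothetical helper: horizontal maximum of the 8 floats in v. */
HSKETCH_AVX static inline float hmax_ps_sketch(__m256 v) {
    __m128 lo = _mm256_castps256_ps128(v);       /* elements 0..3 */
    __m128 hi = _mm256_extractf128_ps(v, 1);     /* elements 4..7 */
    __m128 m  = _mm_max_ps(lo, hi);              /* 4 candidates left */
    m = _mm_max_ps(m, _mm_movehl_ps(m, m));      /* 2 candidates, in elements 0 and 1 */
    m = _mm_max_ss(m, _mm_shuffle_ps(m, m, 1));  /* element 0 = overall max */
    return _mm_cvtss_f32(m);
}

/* Hypothetical helper: same reduction with min instead of max. */
HSKETCH_AVX static inline float hmin_ps_sketch(__m256 v) {
    __m128 lo = _mm256_castps256_ps128(v);
    __m128 hi = _mm256_extractf128_ps(v, 1);
    __m128 m  = _mm_min_ps(lo, hi);
    m = _mm_min_ps(m, _mm_movehl_ps(m, m));
    m = _mm_min_ss(m, _mm_shuffle_ps(m, m, 1));
    return _mm_cvtss_f32(m);
}

/* Convenience wrappers that load 8 (possibly unaligned) floats from memory. */
HSKETCH_AVX static inline float hmax_of_8(const float *p) {
    return hmax_ps_sketch(_mm256_loadu_ps(p));
}

HSKETCH_AVX static inline float hmin_of_8(const float *p) {
    return hmin_ps_sketch(_mm256_loadu_ps(p));
}
```

Only the first step touches the 256-bit register, so the remaining shuffles stay inside a single 128-bit lane. Whether this is actually faster than the version above depends on the CPU and should be measured. The code needs an AVX-capable CPU at run time.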
    