代码之家  ›  专栏  ›  技术社区  ›  Lau Ka Fai

模运算符与位与的性能比较

  •  4
  • Lau Ka Fai  · 技术社区  · 6 年前

    我正在确定32位整数是偶数还是奇数。我有两种方法:

    模(%)法

    int r = (i % 2);
    

    按位(&)方法

    int r = (i & 0x1);
    

    这两种方法都能成功地工作。所以我每行运行15000次来测试性能。

    结果:

    模(%)法( source code )

    平均141.5801887ns sd 270.0700275ns

    按位(&)方法( source code )

    平均141.2504ns SD 193.6351007ns

    问题 :

    为什么按位(&)比除法(%)更稳定?

    JVM是否根据和(&)优化模块(%) here ?

    3 回复  |  直到 6 年前
        1
  •  10
  •   St.Antario    6 年前

    @Benchmark
    @Measurement(timeUnit = TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int first() throws IOException {
        return i % 2;
    }
    
    @Benchmark
    @Measurement(timeUnit = TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int second() throws IOException {
        return i & 0x1;
    }
    

    first second -prof perfnorm

    Benchmark                                 Mode  Cnt   Score    Error  Units
    MyBenchmark.first                         avgt   50   2.674 ±  0.028  ns/op
    MyBenchmark.first:CPI                     avgt   10   0.301 ±  0.002   #/op
    MyBenchmark.first:L1-dcache-load-misses   avgt   10   0.001 ±  0.001   #/op
    MyBenchmark.first:L1-dcache-loads         avgt   10  11.011 ±  0.146   #/op
    MyBenchmark.first:L1-dcache-stores        avgt   10   3.011 ±  0.034   #/op
    MyBenchmark.first:L1-icache-load-misses   avgt   10  ≈ 10⁻³            #/op
    MyBenchmark.first:LLC-load-misses         avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.first:LLC-loads               avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.first:LLC-store-misses        avgt   10  ≈ 10⁻⁵            #/op
    MyBenchmark.first:LLC-stores              avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.first:branch-misses           avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.first:branches                avgt   10   4.006 ±  0.054   #/op
    MyBenchmark.first:cycles                  avgt   10   9.322 ±  0.113   #/op
    MyBenchmark.first:dTLB-load-misses        avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.first:dTLB-loads              avgt   10  10.939 ±  0.175   #/op
    MyBenchmark.first:dTLB-store-misses       avgt   10  ≈ 10⁻⁵            #/op
    MyBenchmark.first:dTLB-stores             avgt   10   2.991 ±  0.045   #/op
    MyBenchmark.first:iTLB-load-misses        avgt   10  ≈ 10⁻⁵            #/op
    MyBenchmark.first:iTLB-loads              avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.first:instructions            avgt   10  30.991 ±  0.427   #/op
    MyBenchmark.second                        avgt   50   2.263 ±  0.015  ns/op
    MyBenchmark.second:CPI                    avgt   10   0.320 ±  0.001   #/op
    MyBenchmark.second:L1-dcache-load-misses  avgt   10   0.001 ±  0.001   #/op
    MyBenchmark.second:L1-dcache-loads        avgt   10  11.045 ±  0.152   #/op
    MyBenchmark.second:L1-dcache-stores       avgt   10   3.014 ±  0.032   #/op
    MyBenchmark.second:L1-icache-load-misses  avgt   10  ≈ 10⁻³            #/op
    MyBenchmark.second:LLC-load-misses        avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.second:LLC-loads              avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.second:LLC-store-misses       avgt   10  ≈ 10⁻⁵            #/op
    MyBenchmark.second:LLC-stores             avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.second:branch-misses          avgt   10  ≈ 10⁻⁴            #/op
    MyBenchmark.second:branches               avgt   10   4.014 ±  0.045   #/op
    MyBenchmark.second:cycles                 avgt   10   8.024 ±  0.098   #/op
    MyBenchmark.second:dTLB-load-misses       avgt   10  ≈ 10⁻⁵            #/op
    MyBenchmark.second:dTLB-loads             avgt   10  10.989 ±  0.161   #/op
    MyBenchmark.second:dTLB-store-misses      avgt   10  ≈ 10⁻⁶            #/op
    MyBenchmark.second:dTLB-stores            avgt   10   3.004 ±  0.042   #/op
    MyBenchmark.second:iTLB-load-misses       avgt   10  ≈ 10⁻⁵            #/op
    MyBenchmark.second:iTLB-loads             avgt   10  ≈ 10⁻⁵            #/op
    MyBenchmark.second:instructions           avgt   10  25.076 ±  0.296   #/op
    

    0x00007f91111f8355: mov     0xc(%r10),%r11d   ;*getfield i
    0x00007f91111f8359: mov     %r11d,%edx
    0x00007f91111f835c: and     $0x1,%edx
    0x00007f91111f835f: mov     %edx,%r10d
    0x00007f6bd120a6e2: neg     %r10d
    0x00007f6bd120a6e5: test    %r11d,%r11d
    0x00007f6bd120a6e8: cmovl   %r10d,%edx       
    

    0x00007ff36cbda580: mov     $0x1,%edx
    0x00007ff36cbda585: mov     0x40(%rsp),%r10
    0x00007ff36cbda58a: and     0xc(%r10),%edx  
    
        2
  •  0
  •   steffen    6 年前

    irem     // int remainder (%)
    iand     // bitwise and (&)
    

    irem iand Oracle

        3
  •  -1
  •   meriton    6 年前

    How do I write a correct micro-benchmark in Java?

    public abstract class Benchmark {
    
        final String name;
    
        public Benchmark(String name) {
            this.name = name;
        }
    
        @Override
        public String toString() {
            return name + "\t" + time() + " ns / iteration";
        }
    
        private BigDecimal time() {
            try {
                // automatically detect a reasonable iteration count (and trigger just in time compilation of the code under test)
                int iterations;
                long duration = 0;
                for (iterations = 1; iterations < 1_000_000_000 && duration < 1_000_000_000; iterations *= 2) {
                    long start = System.nanoTime();
                    run(iterations);
                    duration = System.nanoTime() - start;
                    cleanup();
                }
                return new BigDecimal((duration) * 1000 / iterations).movePointLeft(3);
            } catch (Throwable e) {
                throw new RuntimeException(e);
            }
        }
    
        /**
         * Executes the code under test.
         * @param iterations
         *            number of iterations to perform
         * @return any value that requires the entire code to be executed (to
         *         prevent dead code elimination by the just in time compiler)
         * @throws Throwable
         *             if the test could not complete successfully
         */
        protected abstract Object run(int iterations) throws Throwable;
    
        /**
         * Cleans up after a run, setting the stage for the next.
         */
        protected void cleanup() {
            // do nothing
        }
    
        public static void main(String[] args) throws Exception {
            System.out.println(new Benchmark("%") {
                @Override
                protected Object run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int i = 0; i < iterations; i++) {
                        sum += i % 2;
                    }
                    return sum; 
                }
            });
            System.out.println(new Benchmark("&") {
                @Override
                protected Object run(int iterations) throws Throwable {
                    int sum = 0;
                    for (int i = 0; i < iterations; i++) {
                        sum += i & 1;
                    }
                    return sum;
                }
            });
        }
    }
    

    %   0.375 ns / iteration
    &   0.139 ns / iteration
    

    & 1