My system runs Linux on 12 cores, and cores 2-11 are isolated. Cores 0 and 1 are kept at almost 100% utilization by other programs; all the remaining cores are idle.
First test run:
export GOMP_CPU_AFFINITY=2,3,4
export OMP_NUM_THREADS=3
taskset -c $GOMP_CPU_AFFINITY perf stat -d ./test_openmp
Output:
Performance counter stats for './test_openmp':
47,654.74 msec task-clock:u # 2.981 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
115,358 page-faults:u # 2.421 K/sec
159,245,881,934 cycles:u # 3.342 GHz
250,009,309,156 instructions:u # 1.57 insn per cycle
20,002,132,172 branches:u # 419.730 M/sec
117,268 branch-misses:u # 0.00% of all branches
110,002,614,320 L1-dcache-loads:u # 2.308 G/sec
10,796,435,741 L1-dcache-load-misses:u # 9.81% of all L1-dcache accesses
0 LLC-loads:u # 0.000 /sec
0 LLC-load-misses:u # 0.00% of all LL-cache accesses
15.986638336 seconds time elapsed
47.175831000 seconds user
0.414928000 seconds sys
Second test run:
export GOMP_CPU_AFFINITY=1,2,3,4
export OMG_NUM_THREADS=4
taskset -c $GOMP_CPU_AFFINITY perf stat -d ./test_openmp
Output:
pid: 4118342
Performance counter stats for './test_openmp':
48,241.03 msec task-clock:u # 1.072 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
119,879 page-faults:u # 2.485 K/sec
161,605,704,451 cycles:u # 3.350 GHz
250,011,376,400 instructions:u # 1.55 insn per cycle
20,002,726,448 branches:u # 414.641 M/sec
118,657 branch-misses:u # 0.00% of all branches
110,002,938,510 L1-dcache-loads:u # 2.280 G/sec
10,796,444,713 L1-dcache-load-misses:u # 9.81% of all L1-dcache accesses
0 LLC-loads:u # 0.000 /sec
0 LLC-load-misses:u # 0.00% of all LL-cache accesses
45.012033357 seconds time elapsed
47.764469000 seconds user
0.399934000 seconds sys
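My question: in the second run I gave the program one more core (core 1), so why did it take much longer (15.99 s vs. 45.01 s elapsed) and show much lower CPU utilization (2.98 vs. 1.07 CPUs utilized)?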
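The "CPUs utilized" figure that perf stat prints is task-clock divided by elapsed wall-clock time, and the numbers from the two runs reproduce it:

```latex
\text{CPUs utilized} = \frac{\text{task-clock}}{\text{time elapsed}}

\text{Run 1: } \frac{47{,}654.74\ \text{ms}}{15{,}986.64\ \text{ms}} \approx 2.981
\qquad
\text{Run 2: } \frac{48{,}241.03\ \text{ms}}{45{,}012.03\ \text{ms}} \approx 1.072
```

The total CPU time (task-clock, and likewise user time) is almost the same in both runs, about 47.7 s vs. 48.2 s; what differs is how much of that work ran concurrently.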
This is the test code I ran:
#include <iostream>
#include <cstdint>
#include <unistd.h>   // getpid

constexpr int64_t N = 100000;
int m = N;
int n = N;

int main() {
    double* a = new double[N];
    double* c = new double[N];
    double* b = new double[N*N];   // N*N = 1e10 doubles, about 80 GB of virtual memory
    // a, b and c are never initialized in this test
    std::cout << "pid: " << getpid() << std::endl;
    // a[i] = sum over j of b[i + j*N] * c[j]; the iterations over i are split among the OpenMP threads
    #pragma omp parallel for default(none) shared(m,n,a,b,c)
    for (int i=0; i<m; i++) {
        double sum = 0.0;
        for (int j=0; j<n; j++)
            sum += b[i+j*N]*c[j];
        a[i] = sum;
    }
    return 0;
}