
MPI vs. sequential code: problem freeing arrays

  • youpilat13 Ty Petrice  ·  8 years ago

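    I get a strange result when comparing the sequential version and the MPI version of a small code that computes values on a grid. The sequential version looks like this: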

    int main() {
    
       /* Array */
       double **x;
     
       /* Allocation of 2D arrays */
       x = malloc(size_tot_y*sizeof(*x));
    
       for (i=0;i<=size_tot_y-1;i++) {
          x[i] = malloc(size_tot_x*sizeof(**x));
       }
    
       /* Do various computations */
    
       /* End of code */
    
       /* Free all arrays */
       for (i=0;i<=size_tot_y-1;i++) {
          free(x[i]);
       }
       free(x);
    
       return 0;
    
    }
    

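    This sequential version works fine, and all arrays (x, x0, ...) are freed without error. In the MPI version, I allocate the 2D arrays as contiguous blocks: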

     int main() {
        
       /* Array */
       double **x;
       double *xfinal;
        
       /* Allocate size_tot_y rows */
       x = malloc(size_tot_y*sizeof(*x));
    
       /* Allocate 2D Contiguous arrays for x */
       x[0] = malloc(size_tot_x*size_tot_y*sizeof(**x));
    
       /* Loop on rows */
       for (j=1;j<size_tot_y;j++) {
        /* Increment size_tot_y block on x[i] and x0[i] address */
        x[j] = x[0] + j*size_tot_x;
       }
    
           /* Do various computations */
        
           /* End of MPI code */
        
       /* Free all arrays */
       for (i=0;i<=size_tot_y-1;i++) {
          free(x[i]);
       }
       free(x);
    
       return 0;
    
       }
    

    At execution, I get the following error:

    [machine1:04130] *** Process received signal ***
    [machine1:04130] Signal: Segmentation fault (11)
    [machine1:04130] Signal code: Address not mapped (1)
    [machine1:04130] Failing at address: 0x7f179c020838
    [machine1:04131] *** Process received signal ***
    [machine1:04131] Signal: Segmentation fault (11)
    [machine1:04131] Signal code: Address not mapped (1)
    [machine1:04131] Failing at address: 0x7ff0b417c838
    [machine1:04132] *** Process received signal ***
    [machine1:04132] Signal: Segmentation fault (11)
    [machine1:04132] Signal code: Address not mapped (1)
    [machine1:04132] Failing at address: 0x7f8560001838
    [machine1:04133] *** Process received signal ***
    [machine1:04133] Signal: Segmentation fault (11)
    [machine1:04133] Signal code: Address not mapped (1)
    [machine1:04133] Failing at address: 0x7f22f415f838
    [machine1:04134] *** Process received signal ***
    [machine1:04140] *** Process received signal ***
       
              [machine1:04134] Signal: Segmentation fault (11)
              [machine1:04134] Signal code: Address not mapped (1)
              [machine1:04134] Failing at address: 0x7f4e3c0d3838
              [machine1:04142] *** Process received signal ***
              [machine1:04142] Signal: Segmentation fault (11)
              [machine1:04142] Signal code: Address not mapped (1)
              [machine1:04142] Failing at address: 0x7ff0d4064838
              [machine1:04140] Signal: Segmentation fault (11)
              [machine1:04140] Signal code: Address not mapped (1)
              [machine1:04140] Failing at address: 0x7fb2941c3838
              [machine1:04129] *** Process received signal ***
              [machine1:04129] Signal: Segmentation fault (11)
              [machine1:04129] Signal code: Address not mapped (1)
              [machine1:04129] Failing at address: 0x7f9150049838
              [machine1:04142] [machine1:04134] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f4e48e55890]
              [machine1:04134] [machine1:04129] [ 0] [machine1:04130] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x[machine1:04131] [ 0] [machine1:04132] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0([machine1:04140] [ 1] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f91550a8890]
              [machine1:04129] [ 1] f890)[0x7f179f424890]
              [machine1:04130] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7ff0b777e890]
              [machine1:04131] [ 1] [machine1:04133] [ 0] +0xf890)[0x7f8564847890]
              [machine1:04132] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f4e48b17614]
              [machine1:04134] (+0xf890)[0x7fb2979c7890]
              /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f179f0e6614]
              [machine1:04130] [ 2] ./explicitPar[0x401c48]
              /lib/x86_64-linux-gnu/libpthread.so.0[ 2] ./explicitPar[0x401c48]
              [machine1:04134] [ 3] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f8564509614]
              [machine1:04132] (+0xf890/lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f9154d6a614]
              [machine1:04129] [machine1:04140] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7ff0b7440614]
              [machine1:04131] [machine1:04130] [ 3] /lib/x86_64-linux-gnu/libc.so.6(/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x[ 2] ./explicitPar[0x401c48]
              [machine1:04132] [ 3] [ 2] ./explicitPar[0x401c48]
              [machine1:04129] [ 3] [ 2] ./explicitPar[0x401c48]
              [machine1:04131] [ 3] __libc_start_main+0xf5)[0x7f179f08bb45]
              [machine1:04130] [ 4] ./explicitPar[0x400e49]
              [machine1:04130] *** End of error message ***
              f5)[0x7f4e48abcb45]
              [machine1:04134] )[0x7f22f8bb2890]
              [machine1:04133] /lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ff0b73e5b45[ 4] ./explicitPar[0x400e49]
              /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[ 1] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f9154d0fb45]
              [machine1:04129] ]
              [machine1:04131] [ 4] ./explicitPar[0x7f85644aeb45]
              [machine1:04132] /lib/x86_64-linux-gnu/libc.so.6(cfree[ 0] [ 4] ./explicitPar[0x400e49]
              [machine1:04129] *** End of error message ***
              (cfree+0x14)[0x7fb297689614]
              [machine1:04140] [ 2] ./explicitPar[0x401c48[machine1:04134] *** End of error message ***
              [0x400e49]
              [machine1:04131] *** End of error message ***
              [ 4] ./explicitPar[0x400e49]
              [machine1:04132] *** End of error message ***
              +0x14)[0x7f22f8874614]
              [machine1:04133] ]
              [machine1:04140] [ 3] [ 2] ./explicitPar/lib/x86_64-linux-gnu/libc.so.6[0x401c48]
              [machine1:04133] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb29762eb45]
              [machine1:04140] [ 4] (__libc_start_main+0xf5)[0x./explicitPar[0x7f22f8819b45]
              [machine1:04133] 400e49]
              [machine1:04140] *** End of error message ***
              [ 4] ./explicitPar[0x400e49]
              [machine1:04133] *** End of error message ***
              /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7ff0d9907890]
              [machine1:04142] [ 1] --------------------------------------------------------------------------
              mpirun noticed that process rank 1 with PID 0 on node machine1 exited on signal 11 (Segmentation fault).
    

    If I only free the array itself:

       free(x);
       
    

    i.e., I comment out this part:

    /*for (i=0;i<=size_tot_y-1;i++) {
          free(x[i]);      
       }
     */
    

    then I do not get the error above. So the problem comes from the way the arrays are freed in the MPI version of the code.

    1 answer  |  last active 4 years ago
  •   Gilles Gouaillardet    8 years ago

    Array allocation and deallocation must be symmetric.

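    You declare your 2D arrays as double ** , so they are really arrays of pointers to double . In the sequential version, you issue one malloc() for the column of row pointers, and then one malloc() per row. Your rows will not be in contiguous memory, but that is fine.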

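    That approach is generally not valid with MPI, because you might pass your 2D arrays to MPI functions that require a contiguous data layout. So in the MPI version, you issue one malloc() for the column of row pointers (no change so far), and then one and only one malloc() for all the rows. You then fill the first array with pointers into the second one. Since only x[0] was returned by malloc() , it is the only row pointer that may be passed to free() ; calling free() on x[1] , x[2] , ... is what triggers the segmentation fault.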

    So the correct way to deallocate x is:

    free(x[0]);
    free(x);
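
    To make the symmetry explicit, the allocation and deallocation can be wrapped in a pair of helpers. The following is only a minimal sketch, not code from the question: the names alloc_2d , free_2d and demo are hypothetical, and calloc() is used instead of malloc() purely so that the grid starts zeroed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols grid as ONE contiguous data
   block plus a table of row pointers. Returns NULL on failure. */
double **alloc_2d(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;

    /* Allocation 1: the table of row pointers */
    double **x = malloc(rows * sizeof(*x));
    if (x == NULL)
        return NULL;

    /* Allocation 2: all rows in a single zero-initialized block */
    x[0] = calloc(rows * cols, sizeof(**x));
    if (x[0] == NULL) {
        free(x);
        return NULL;
    }

    /* Rows 1..rows-1 point INTO the block; they were not returned by an
       allocator and must never be passed to free() */
    for (size_t j = 1; j < rows; j++)
        x[j] = x[0] + j * cols;

    return x;
}

/* Hypothetical helper: exactly two free() calls for the two allocations */
void free_2d(double **x)
{
    if (x == NULL)
        return;
    free(x[0]);   /* releases the data block (allocation 2) */
    free(x);      /* releases the row-pointer table (allocation 1) */
}

/* Small self-check: returns 0 on success, 1 if allocation failed */
int demo(void)
{
    double **x = alloc_2d(3, 4);
    if (x == NULL)
        return 1;

    x[1][3] = 7.0;
    /* Element (1,3) sits at flat offset 1*4 + 3 from the start of the block */
    assert(&x[1][3] == &x[0][0] + 1 * 4 + 3);

    free_2d(x);
    return 0;
}
```

    With this layout, &x[0][0] is the start of a single buffer of rows*cols doubles, which is the kind of contiguous buffer the answer says some MPI functions require.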
    