
Kernel timeout for large arrays when the X server is running

  • user3116936  · 9 years ago

    I launch a kernel and check for possible errors as follows:

    kernel<<<grid,block>>>(d_Basis, d_repul_aux,nao);
      cout<<"done with the ERIs...."<<endl;
      std::string error = cudaGetErrorString(cudaPeekAtLastError());
      cout<<error<<endl;
    
    HANDLE_ERROR(cudaMemcpy(eris_gpu_cpu_aux.data(),d_repulsion_aux,eris_size*sizeof(double),cudaMemcpyDeviceToHost)); 
    

    where cudaGetErrorString(cudaPeekAtLastError()) is used to error-check the kernel, and I have defined:

    static void HandleError( cudaError_t err,
                             const char *file,
                             int line ) {
      if (err != cudaSuccess) {
        printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
                file, line );
        exit( EXIT_FAILURE );
      }
    }
    
    #define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
    

    When the X server is off, the computation runs as expected; but if I turn the X server on, the kernel hangs and I get the following output:

    done with the ERIs....
    no error
    the launch timed out and was terminated in main.cu at line 1038
    

    Line 1038 of the source code corresponds to:

    HANDLE_ERROR(cudaMemcpy(eris_gpu_cpu_aux.data(),d_repulsion_aux,eris_size*sizeof(double),cudaMemcpyDeviceToHost));

    This means the computation crashes when we copy the results from the device to the host. I am using a GeForce GTX 480 graphics card and CUDA 7.5.
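
    A note on reading this output, offered as a sketch rather than a diagnosis of the program above: kernel launches are asynchronous, so cudaPeekAtLastError() called right after the launch reports only launch-time problems. An error raised while the kernel is running, such as a watchdog timeout, is reported by the next synchronizing call, which here happens to be cudaMemcpy. Calling cudaDeviceSynchronize() after the launch makes the error surface at the kernel instead. The kernel `fill`, the macro `CHECK`, and the array size below are hypothetical stand-ins and not part of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Same idea as HANDLE_ERROR in the question, repeated so the sketch is self-contained.
#define CHECK(call)                                                   \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::printf("%s in %s at line %d\n", cudaGetErrorString(err_),  \
                  __FILE__, __LINE__);                                \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Hypothetical stand-in for the real kernel.
__global__ void fill(double *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2.0 * i;
}

int main() {
  const int n = 1 << 20;  // assumed size, for illustration only
  double *d_out = nullptr;
  CHECK(cudaMalloc(&d_out, n * sizeof(double)));

  const int block = 256;
  const int grid = (n + block - 1) / block;
  fill<<<grid, block>>>(d_out, n);

  // Launch-time errors (invalid grid/block configuration and the like).
  CHECK(cudaGetLastError());
  // Blocks until the kernel finishes; execution errors, including a
  // watchdog timeout, are reported here rather than at a later cudaMemcpy.
  CHECK(cudaDeviceSynchronize());

  CHECK(cudaFree(d_out));
  std::printf("kernel finished without error\n");
  return 0;
}
```

    With this pattern, a timed-out kernel would be reported at the cudaDeviceSynchronize() line, which makes it clearer that the kernel itself, not the copy, is what was terminated.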

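    To work around this, I tried turning off the "Interactive" option in the /etc/X11/xorg.conf file, but the X server does not recognize this option. What can I do to share GPU resources between the X server and the GPGPU application? I insist on this because I am not comfortable writing and/or debugging code in a text-mode environment.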

    1 answer
  •   user3116936    9 years ago

    My previous /etc/X11/xorg.conf file was as follows:

    # nvidia-xconfig: X configuration file generated by nvidia-xconfig
    # nvidia-xconfig:  version 319.21  (buildmeister@swio-display-x86-rhel47-14)  Sun May 12 00:46:48 PDT 2013
    
    
    Section "ServerLayout"
        Identifier     "Layout0"
        Screen      0  "Screen0" 0 0
        InputDevice    "Keyboard0" "CoreKeyboard"
        InputDevice    "Mouse0" "CorePointer"
    EndSection
    
    Section "Files"
    EndSection
    
    Section "InputDevice"
    
        # generated from default
        Identifier     "Mouse0"
        Driver         "mouse"
        Option         "Protocol" "auto"
        Option         "Device" "/dev/psaux"
        Option         "Emulate3Buttons" "no"
        Option         "ZAxisMapping" "4 5"
    EndSection
    
    Section "InputDevice"
    
        # generated from default
        Identifier     "Keyboard0"
        Driver         "kbd"
    EndSection
    
    Section "Monitor"
        Identifier     "Monitor0"
        VendorName     "Unknown"
        ModelName      "Unknown"
        HorizSync       28.0 - 33.0
        VertRefresh     43.0 - 72.0
        Option         "DPMS"
    EndSection
    
    Section "Device"
        Identifier     "Device0"
        Driver         "nvidia"
        VendorName     "NVIDIA Corporation"
    EndSection
    
    Section "Screen"
        Identifier     "Screen0"
        Device         "Device0"
        Monitor        "Monitor0"
        DefaultDepth    24
        SubSection     "Display"
            Depth       24
        EndSubSection
    EndSection
    

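    To solve the problem, the watchdog timeout has to be disabled by adding Option "Interactive" "false" to the "Device" section, as follows: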

    # nvidia-xconfig: X configuration file generated by nvidia-xconfig
    # nvidia-xconfig:  version 319.21  (buildmeister@swio-display-x86-rhel47-14)  Sun May 12 00:46:48 PDT 2013
    
    
    Section "ServerLayout"
        Identifier     "Layout0"
        Screen      0  "Screen0" 0 0
        InputDevice    "Keyboard0" "CoreKeyboard"
        InputDevice    "Mouse0" "CorePointer"
    EndSection
    
    Section "Files"
    EndSection
    
    Section "InputDevice"
    
        # generated from default
        Identifier     "Mouse0"
        Driver         "mouse"
        Option         "Protocol" "auto"
        Option         "Device" "/dev/psaux"
        Option         "Emulate3Buttons" "no"
        Option         "ZAxisMapping" "4 5"
    EndSection
    
    Section "InputDevice"
    
        # generated from default
        Identifier     "Keyboard0"
        Driver         "kbd"
    EndSection
    
    Section "Monitor"
        Identifier     "Monitor0"
        VendorName     "Unknown"
        ModelName      "Unknown"
        HorizSync       28.0 - 33.0
        VertRefresh     43.0 - 72.0
        Option         "DPMS"
    EndSection
    
    Section "Device"
        Identifier     "Device0"
        Driver         "nvidia"
        VendorName     "NVIDIA Corporation"
    ##
    ##  disable watchdog timeouts for long-running CUDA kernels
    ##
        Option "Interactive" "false"
    EndSection
    
    Section "Screen"
        Identifier     "Screen0"
        Device         "Device0"
        Monitor        "Monitor0"
        DefaultDepth    24
        SubSection     "Display"
            Depth       24
        EndSubSection
    EndSection
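
    One way to check whether the change took effect after restarting the X server is to query each device's run-time-limit attribute. This is a minimal sketch that assumes the CUDA runtime API is available; it was not part of the original answer. A value of 0 should indicate that no kernel run time limit is active on that device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  for (int dev = 0; dev < count; ++dev) {
    int timeout_enabled = 0;
    // cudaDevAttrKernelExecTimeout is nonzero when kernels on this device
    // are subject to a run time limit (the display watchdog).
    err = cudaDeviceGetAttribute(&timeout_enabled,
                                 cudaDevAttrKernelExecTimeout, dev);
    if (err != cudaSuccess) {
      std::printf("device %d: %s\n", dev, cudaGetErrorString(err));
      return 1;
    }
    std::printf("device %d: run time limit on kernels: %s\n", dev,
                timeout_enabled ? "yes" : "no");
  }
  return 0;
}
```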