
Kernel times out on large arrays when the X server is running

gpgpu cuda

user3116936 · Technical community · 10 years ago

I launch the kernel and check for possible errors as follows:

kernel<<<grid,block>>>(d_Basis, d_repul_aux,nao);
cout<<"done with the ERIs...."<<endl;
std::string error = cudaGetErrorString(cudaPeekAtLastError());
cout<<error<<endl;

HANDLE_ERROR(cudaMemcpy(eris_gpu_cpu_aux.data(),d_repulsion_aux,eris_size*sizeof(double),cudaMemcpyDeviceToHost));

where cudaGetErrorString(cudaPeekAtLastError()) is used to check the kernel for errors, and I defined:

static void HandleError( cudaError_t err,
                         const char *file,
                         int line ) {
  if (err != cudaSuccess) {
    printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
            file, line );
    exit( EXIT_FAILURE );
  }
}

#define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))

When the X server is off, the computation runs as expected; but if I start the X server, the kernel hangs and I get the following output:

done with the ERIs....
no error
the launch timed out and was terminated in main.cu at line 1038

Line 1038 of the source code corresponds to:

HANDLE_ERROR(cudaMemcpy(eris_gpu_cpu_aux.data(),d_repulsion_aux,eris_size*sizeof(double),cudaMemcpyDeviceToHost));

This suggests that the computation fails when the results are copied from the device to the host. I am using a GeForce GTX 480 graphics card and CUDA 7.5.
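Because kernel launches are asynchronous, cudaPeekAtLastError() called right after the launch can only report launch errors (which is why "no error" is printed); an error raised while the kernel is running, such as a watchdog termination, is returned by the next call that waits for the kernel, here the cudaMemcpy at line 1038. The following is a minimal sketch, not the original code, that synchronizes right after the launch so the timeout is attributed to the kernel. The kernel busyKernel and the helper launchAndCheck are hypothetical stand-ins for the real ERI kernel and launch code.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real ERI kernel: each thread does a
// small loop of floating-point work and writes one result.
__global__ void busyKernel(double *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double acc = 0.0;
        for (int k = 0; k < 1000; ++k) acc += 1.0 / (k + 1.0 + i);
        out[i] = acc;
    }
}

// Hypothetical helper: launches the kernel and returns the first error found.
cudaError_t launchAndCheck(double *d_out, int n) {
    assert(d_out != nullptr && n > 0);
    const int block = 256;
    const int grid = (n + block - 1) / block;
    busyKernel<<<grid, block>>>(d_out, n);

    // Launch-time errors only (invalid configuration, etc.). This is all
    // that can be detected immediately after an asynchronous launch.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Wait for the kernel to finish. If the display watchdog kills it,
    // cudaErrorLaunchTimeout is returned here instead of by the next
    // cudaMemcpy.
    return cudaDeviceSynchronize();
}
```

In the question's code, the equivalent change would be adding HANDLE_ERROR(cudaDeviceSynchronize()); right after the kernel<<<grid,block>>> line. It costs no extra waiting here, because the device-to-host cudaMemcpy that follows would block until the kernel finished anyway.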

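To work around this, I tried turning off the "Interactive" option in the /etc/X11/xorg.conf file, but the X server does not recognize this option. What can I do to share the GPU between the X server and a GPGPU application? I insist on this because I am not comfortable writing and/or debugging code in a text-mode environment.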
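One way to check whether the display watchdog applies to a GPU, before and after editing xorg.conf, is to read the kernelExecTimeoutEnabled field of cudaDeviceProp; this is the value the CUDA deviceQuery sample prints as "Run time limit on kernels". A minimal sketch follows; the helper names watchdogEnabled and printWatchdogStatus are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: stores in *enabled whether a run-time limit
// (display watchdog) applies to kernels on `device`.
// Returns the CUDA error from the property query, if any.
cudaError_t watchdogEnabled(int device, bool *enabled) {
    assert(enabled != nullptr);
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;
    *enabled = prop.kernelExecTimeoutEnabled != 0;
    return cudaSuccess;
}

// Hypothetical helper: prints the watchdog status of every visible device.
void printWatchdogStatus() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        printf("no CUDA device available\n");
        return;
    }
    for (int dev = 0; dev < count; ++dev) {
        bool on = false;
        if (watchdogEnabled(dev, &on) == cudaSuccess)
            printf("device %d: run time limit on kernels: %s\n",
                   dev, on ? "Yes" : "No");
    }
}
```

If the X configuration change takes effect, the GPU driving the display should report "No" here after X is restarted.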

1 reply | last active 10 years ago

user3116936 · 10 years ago

My previous /etc/X11/xorg.conf file was as follows:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 319.21  (buildmeister@swio-display-x86-rhel47-14)  Sun May 12 00:46:48 PDT 2013


Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

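To solve the problem, the watchdog timeout has to be disabled by adding the "Interactive" option to the "Device" section, as follows: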

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 319.21  (buildmeister@swio-display-x86-rhel47-14)  Sun May 12 00:46:48 PDT 2013


Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
##
##  disable watchdog timeouts for long-running CUDA kernels
##
    Option "Interactive" "false"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection