I'm running into a CUDA problem on a Debian 12 virtual machine hosted on TrueNAS Scale. I have passed a GTX 1660 Super GPU through to the VM.
Here is a summary of what I have done so far:
- Installed the latest NVIDIA drivers: `sudo apt install nvidia-driver firmware-misc-nonfree`
- Set up a Conda environment with PyTorch and CUDA 12.1: `conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia`
- Tested the installation:

```
Python 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> device
device(type='cuda')
>>> torch.rand(10, device=device)
```
However, when I run `torch.rand(10, device=device)`, I get the following error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
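To make this easier to reproduce outside the interactive session, here is a minimal standalone sketch of the same steps, with `CUDA_LAUNCH_BLOCKING=1` set up front as the error message suggests (the script name and the exact prints are just for illustration):

```python
# repro.py -- minimal reproduction of the failing torch.rand call.
# CUDA_LAUNCH_BLOCKING must be set before the first CUDA call, so it is
# exported here before torch is imported.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

print("torch:", torch.__version__, "| built against CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))

try:
    # This is the call that fails in the REPL session above.
    t = torch.rand(10, device="cuda")
    print("tensor created:", t)
except RuntimeError as exc:
    print("torch.rand failed:", exc)
```

With blocking launches enabled, the error should at least be reported at the exact call that fails.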
Has anyone run into a similar issue, or does anyone have suggestions on how to resolve it?
Environment details:
- OS: Debian 12
- GPU: NVIDIA GTX 1660 Super
- NVIDIA driver version: 535.161.08, installed with `sudo apt install nvidia-driver firmware-misc-nonfree`
Additional information:
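In case it is useful, a small script along these lines (just a sketch; it assumes `nvidia-smi` is available on the PATH inside the VM) collects the same details programmatically:

```python
# env_report.py -- print the Python/PyTorch/driver versions listed above.
import subprocess
import sys

import torch

print("python:", sys.version.split()[0])
print("torch:", torch.__version__, "| built against CUDA:", torch.version.cuda)

# Driver version and GPU name as reported by the NVIDIA driver.
smi = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
    capture_output=True,
    text=True,
    check=False,
)
print("nvidia-smi:", smi.stdout.strip() or smi.stderr.strip())
```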
Any help or suggestions would be greatly appreciated!