If you see the following error:

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

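Try changing the performance setting: the default is "Try to use xFormers" (尝试使用xFormers). Switch it to "Automatic" (自动), then restart and try again.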
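If the error persists, the message's own suggestion of `CUDA_LAUNCH_BLOCKING=1` can help locate the failing call. The following is a minimal sketch, assuming the program is started from a Python entry point you can edit. The helper name `enable_blocking_cuda_launches` is hypothetical and not part of any library. The variable needs to be set before CUDA is initialized, so the safest place is before `torch` is imported.

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    Hypothetical helper for illustration. With this variable set, CUDA
    kernel launches run synchronously, so an error is reported at the call
    that caused it rather than at some later API call.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Call this at the very top of the entry script, before importing torch,
# so the setting is in place when the CUDA context is created.
enable_blocking_cuda_launches()
```

Synchronous launches slow generation down noticeably, so this is only worth enabling while debugging.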