
Commit 0a2cebd

TroyGarden authored and meta-codesync[bot] committed
add CUDA 12.9 unit test (#3592)
Summary:
Pull Request resolved: #3592

# context
* Previously CUDA 12.9 wasn't added to the unit tests due to an fbgemm compatibility issue.
* Now that fbgemm has the support, we added it to our test suite.
* For PRs we only run CUDA 12.9 with Python 3.13.

# issue fix
* The previous torchrec GitHub workflow for GPU unit tests was failing due to missing A100 support in fbgemm:
```
>>> a=torch.empty(3, device='cuda').int()
>>> torch.ops.fbgemm.permute_2D_sparse_data(a,b,a)
Traceback (most recent call last):
  File "<python-input-22>", line 1, in <module>
    torch.ops.fbgemm.permute_2D_sparse_data(a,b,a)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/home/hhy/.conda/envs/ci/lib/python3.13/site-packages/torch/_ops.py", line 1237, in __call__
    return self._op(*args, **kwargs)
           ~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/hhy/.conda/envs/ci/lib/python3.13/site-packages/torch/_library/autograd.py", line 112, in autograd_impl
    result = forward_no_grad(*args, Metadata(keyset, keyword_only_args))
  File "/home/hhy/.conda/envs/ci/lib/python3.13/site-packages/torch/_library/autograd.py", line 41, in forward_no_grad
    result = op.redispatch(keyset & _C._after_autograd_keyset, *args, **kwargs)
  File "/home/hhy/.conda/envs/ci/lib/python3.13/site-packages/torch/_ops.py", line 822, in redispatch
    return self._handle.redispatch_boxed(keyset, *args, **kwargs)  # type: ignore[return-value]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [/__w/FBGEMM/FBGEMM/pytorch/FBGEMM/fbgemm_gpu/src/sparse_ops/sparse_permute_2d.cu(113:98)] [(permute_2D_lengths_kernel<index_t>)] CUDA Error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
* env setup
```
$ conda create -yn fbgemm python=3.13
$ conda activate fbgemm
$ pip install torch --index-url https://download.pytorch.org/whl/nightly/cu128
$ pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/nightly/cu128
$ python -i
```
* python cli
```
>>> import torch
>>> torch.ops.import_module("fbgemm_gpu.sparse_ops")
>>> a=torch.empty(3, device='cuda').int()
>>> b=torch.empty((3,3), device='cuda').long()
>>> torch.ops.fbgemm.permute_2D_sparse_data(a,b,a)
```
* resolved
> The issue is you're running on an A100 and fbgemm removed sm80 earlier in nightly; it was recently added back. So if you uninstall fbgemm-gpu and re-install it from today's release, it should fix your issue.

Reviewed By: aporialiao

Differential Revision: D88400551

fbshipit-source-id: 6da61d3841edcb3dcb51bbb8156822337be1ca78
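Following the resolution quoted above, here is a minimal verification sketch (not part of the original commit message; the compute-capability check is an added assumption) for confirming that a reinstalled nightly fbgemm-gpu wheel ships sm80 kernels again:

```
import torch

# An A100 reports compute capability (8, 0); the failing fbgemm nightly had dropped sm80 kernels.
print(torch.cuda.get_device_capability())  # expect (8, 0) on A100

# Register the fbgemm sparse ops, as in the repro above.
torch.ops.import_module("fbgemm_gpu.sparse_ops")

a = torch.empty(3, device="cuda").int()
b = torch.empty((3, 3), device="cuda").long()

# With an sm80-enabled fbgemm-gpu build this should no longer raise
# "no kernel image is available for execution on the device".
torch.ops.fbgemm.permute_2D_sparse_data(a, b, a)
```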
1 parent deead45 commit 0a2cebd

File tree

1 file changed (+6, −4 lines)


.github/workflows/unittest_ci.yml

Lines changed: 6 additions & 4 deletions
```
@@ -36,7 +36,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        cuda-tag: ["cu126", "cu128"]
+        cuda-tag: ["cu126", "cu128", "cu129"]
         os:
           - linux.g5.12xlarge.nvidia.gpu
         python:
@@ -55,18 +55,20 @@ jobs:
           cuda-tag: "cu126"
         - is_pr: true
           cuda-tag: "cu128"
+        - is_pr: true
+          cuda-tag: "cu129"
           python:
             version: "3.9"
         - is_pr: true
-          cuda-tag: "cu128"
+          cuda-tag: "cu129"
           python:
             version: "3.10"
         - is_pr: true
-          cuda-tag: "cu128"
+          cuda-tag: "cu129"
           python:
             version: "3.11"
         - is_pr: true
-          cuda-tag: "cu128"
+          cuda-tag: "cu129"
           python:
             version: "3.12"
     uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
```
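Since the hunks above only show a few context lines, here is a rough sketch of what the touched matrix section of `unittest_ci.yml` could look like after this change. The `exclude:` key, the assumption that `is_pr` is a matrix flag set on pull-request runs, the Python version list, and the exact indentation are inferred from the visible context, not copied from the file:

```
# Sketch only: structure inferred from the diff context, not the full workflow.
strategy:
  fail-fast: false
  matrix:
    cuda-tag: ["cu126", "cu128", "cu129"]
    os:
      - linux.g5.12xlarge.nvidia.gpu
    # python versions 3.9 .. 3.13 assumed from the visible entries
    exclude:
      - is_pr: true
        cuda-tag: "cu126"      # drop all cu126 jobs on PRs
      - is_pr: true
        cuda-tag: "cu128"      # drop all cu128 jobs on PRs
      - is_pr: true
        cuda-tag: "cu129"
        python:
          version: "3.9"       # keep only cu129 + 3.13 on PRs ...
      - is_pr: true
        cuda-tag: "cu129"
        python:
          version: "3.10"
      - is_pr: true
        cuda-tag: "cu129"
        python:
          version: "3.11"
      - is_pr: true
        cuda-tag: "cu129"
        python:
          version: "3.12"      # ... by excluding every other Python version
```

Read this way, a pull request only keeps the cu129 + Python 3.13 job, while the full cuda-tag x Python matrix still runs outside of PRs, matching the "for PRs we only run CUDA 12.9 with Python 3.13" note in the commit message.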
