Commit 502826c
Compile PyTorch, torchvision, onnxruntime from source for numpy 2.x support
- PyTorch 2.8.0 with Jetson Orin optimizations (arch 8.7, ARM+CUDA linker optimization)
- Disabled unnecessary features (NCCL, QNNPACK, XNNPACK, FBGEMM, Kineto, etc.)
- torchvision 0.23.0 with CUDA support
- onnxruntime 1.20.0 with TensorRT EP
- flash-attn 2.8.3 (latest release at the time of this commit)
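The build settings above can be approximated with the environment variables that PyTorch and torchvision read at build time. This is a minimal sketch, assuming the Dockerfile uses the standard `TORCH_CUDA_ARCH_LIST`, `USE_*` and `FORCE_CUDA` switches; the commit does not show its exact flags, the "ARM+CUDA linker optimization" and the "etc." items are not modeled, and the helper names are hypothetical.

```python
import os
import subprocess
import sys

# Features the commit message lists as disabled. The exact variables the
# Dockerfile sets are not shown; this mapping is an assumption.
DISABLED = ("USE_NCCL", "USE_QNNPACK", "USE_XNNPACK", "USE_FBGEMM", "USE_KINETO")


def pytorch_build_env(base=None):
    """Return an environment for building PyTorch for Jetson Orin (sm_87)."""
    env = dict(os.environ if base is None else base)
    env["TORCH_CUDA_ARCH_LIST"] = "8.7"  # compile CUDA kernels for Orin only
    env["USE_CUDA"] = "1"
    env.update({flag: "0" for flag in DISABLED})
    return env


def torchvision_build_env(base=None):
    """Return an environment for building torchvision with its CUDA ops."""
    env = dict(os.environ if base is None else base)
    env["TORCH_CUDA_ARCH_LIST"] = "8.7"
    env["FORCE_CUDA"] = "1"  # build CUDA ops even when no GPU is visible (docker build)
    return env


def build_wheel(src_dir, env):
    """Build a wheel from a source checkout into src_dir/dist (needs the full toolchain)."""
    subprocess.run(
        [sys.executable, "-m", "pip", "wheel", ".", "--no-build-isolation", "-w", "dist"],
        cwd=src_dir, env=env, check=True,
    )
```

For example, `build_wheel("/opt/pytorch", pytorch_build_env())` would build PyTorch from a checkout at a hypothetical `/opt/pytorch`. Restricting `TORCH_CUDA_ARCH_LIST` to a single architecture is what keeps compile time and binary size down on a single-target device.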
Performance: 65.7 FPS (vs 62.2 FPS baseline = 5.6% faster)
Image size: 6.74GB (vs 8.28GB baseline = 18.6% smaller)
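The two percentages follow directly from the reported numbers: speedup is measured relative to the baseline FPS, and the size reduction relative to the baseline image size. A short check:

```python
def pct_faster(new_fps: float, base_fps: float) -> float:
    """Throughput gain relative to the baseline, in percent."""
    return (new_fps / base_fps - 1) * 100


def pct_smaller(new_gb: float, base_gb: float) -> float:
    """Size reduction relative to the baseline, in percent."""
    return (1 - new_gb / base_gb) * 100


print(f"{pct_faster(65.7, 62.2):.1f}% faster")    # 5.6% faster
print(f"{pct_smaller(6.74, 8.28):.1f}% smaller")  # 18.6% smaller
```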
Size optimizations:
- cuDNN/TensorRT symlink preservation: ~2GB saved
- Remove test directories, dev tools, examples: ~500MB saved
- Conservative cleanup preserving public APIs (numpy.testing, torch.testing)
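Both size optimizations can be illustrated in a few lines. CUDA libraries typically ship as one real file plus version symlinks (for example `libcudnn.so -> libcudnn.so.9 -> libcudnn.so.9.x.y`), so a copy that follows links (such as `cp -L`, or `shutil.copytree(..., symlinks=False)`) stores every alias as a full file. The cleanup step is conservative if it matches directory names exactly: pruning `tests`/`test`/`examples` never touches `numpy/testing` or `torch/testing`, whereas a glob like `test*` would. This is a hypothetical sketch, not the Dockerfile's actual commands; the library file name and the package layout are made up for the demo.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Exact directory names to prune; "testing" is deliberately not matched.
PRUNE_NAMES = {"tests", "test", "examples"}


def tree_size(root: Path) -> int:
    """Bytes used by regular files; symlinks are skipped (not followed)."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def symlink_demo(payload: int = 1_000_000):
    """Return (size with symlinks preserved, size with symlinks dereferenced)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        real = src / "libcudnn.so.9.3.0"  # hypothetical version number
        real.write_bytes(b"\0" * payload)
        (src / "libcudnn.so.9").symlink_to(real.name)
        (src / "libcudnn.so").symlink_to("libcudnn.so.9")
        shutil.copytree(src, Path(tmp) / "kept", symlinks=True)
        shutil.copytree(src, Path(tmp) / "flat", symlinks=False)
        return tree_size(Path(tmp) / "kept"), tree_size(Path(tmp) / "flat")


def prunable_dirs(site_packages: Path) -> list:
    """List directories whose exact name is in PRUNE_NAMES (nothing is deleted)."""
    hits = []
    for dirpath, dirs, _files in os.walk(site_packages):
        for d in list(dirs):
            if d in PRUNE_NAMES:
                hits.append(Path(dirpath) / d)
                dirs.remove(d)  # no need to descend into a directory we would delete
    return hits


def prune_demo() -> list:
    """Build a toy site-packages tree and report what would be pruned."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ["numpy/testing", "numpy/tests", "torch/testing/_internal",
                    "scipy/linalg/tests", "somepkg/examples"]:
            (root / rel).mkdir(parents=True)
        return sorted(p.relative_to(root).as_posix() for p in prunable_dirs(root))
```

Here `symlink_demo(1000)` reports 1000 bytes when links are preserved versus 3000 when they are dereferenced, and `prune_demo()` selects `numpy/tests`, `scipy/linalg/tests` and `somepkg/examples` while leaving both `testing` packages intact. Returning a list instead of deleting in place keeps the step reviewable before calling `shutil.rmtree`.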
TensorRT optimization:
- FP16 precision enabled
- Engine caching enabled with 2GB workspace
- Builder optimization level 3
- Aux streams optimized for memory efficiency

1 parent 12882e3
2 files changed (+259, -112 lines):
- .github/workflows
- docker/dockerfiles
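The TensorRT settings in the commit message map naturally onto onnxruntime's TensorRT execution provider options. This is a sketch, assuming the settings are passed as provider options rather than set elsewhere in the image: the option names are standard onnxruntime TensorRT EP options, but the cache path is hypothetical and reading "aux streams optimized for memory efficiency" as `trt_auxiliary_streams = 0` (onnxruntime's setting for minimal memory use) is an assumption.

```python
def trt_providers(cache_dir: str = "/tmp/trt_engine_cache"):
    """Provider list for onnxruntime.InferenceSession mirroring the commit's TensorRT settings."""
    trt_options = {
        "trt_fp16_enable": True,                  # FP16 precision
        "trt_engine_cache_enable": True,          # reuse built engines across runs
        "trt_engine_cache_path": cache_dir,       # hypothetical cache location
        "trt_max_workspace_size": 2 * 1024 ** 3,  # 2 GB builder workspace
        "trt_builder_optimization_level": 3,      # builder optimization level 3
        "trt_auxiliary_streams": 0,               # assumed: 0 favors minimal memory use
    }
    return [
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",  # fallback for nodes TensorRT cannot run
        "CPUExecutionProvider",
    ]
```

Usage would be `onnxruntime.InferenceSession("model.onnx", providers=trt_providers())`, with `model.onnx` as a placeholder. Engine caching matters most on Jetson, where building a TensorRT engine can take minutes and would otherwise repeat on every container start.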