Conversation

@Leguark Leguark commented Nov 21, 2025

[ENH] Improve tensor handling and device management in backend_tensor.py

  • Updated tensor creation to use pinned memory and non-blocking transfer for improved GPU performance.
  • Introduced `_zeros`, `_ones`, and `_eye` wrapper functions for better consistency in tensor initialization on the specified device.
  • Refined the `_wrap_pytorch_functions` method to streamline tensor operations and ensure compatibility with the device settings.
  • Enabled stricter CUDA checks by updating conditions for GPU availability.
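The pinned-memory path described in the first bullet can be sketched as follows. This is a minimal illustration assuming PyTorch; `to_device` is a hypothetical helper for exposition, not the actual `backend_tensor.py` API:

```python
import torch

def to_device(data, device: torch.device) -> torch.Tensor:
    """Create a tensor in pinned (page-locked) host memory and move it to the
    target device with a non-blocking copy. Pinned memory lets the CUDA driver
    use async DMA, so the transfer can overlap with GPU compute.
    Falls back to a plain copy on CPU-only machines."""
    t = torch.as_tensor(data, dtype=torch.float32)
    if device.type == "cuda":
        t = t.pin_memory()                       # page-locked staging buffer
        return t.to(device, non_blocking=True)   # asynchronous host-to-device copy
    return t.to(device)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = to_device([[1.0, 2.0], [3.0, 4.0]], device)
```

Note that `non_blocking=True` only yields an actual asynchronous transfer when the source tensor is pinned; without `pin_memory()` the flag is silently ignored.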

[ENH] Add keops_enabled parameter to improve kernel constructor modularity and enhance batch processing support

  • Introduced the `keops_enabled` parameter across various modules to enable conditional usage of PyKeOps for optimized computations.
  • Added `_interpolate_stack_batched.py` for GPU-accelerated batched interpolation with CUDA streams, minimizing memory overhead and improving throughput.
  • Updated tensor creation logic in `backend_tensor.py` to include `pykeops_eval_enabled` for enhanced flexibility in method selection.
  • Refactored multiple constructor methods to propagate `keops_enabled`, ensuring consistent conditional logic for tensor handling and backend compatibility.
  • Improved fault data initialization and dependency handling in interpolation pipelines for better parallel computation.
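The per-stack CUDA-stream batching mentioned above could look roughly like this. A hypothetical sketch: `interpolate_stack_batched` and `kernel_fn` are illustrative names standing in for the module's real per-stack solve, not its actual signatures:

```python
import torch

def interpolate_stack_batched(stacks, kernel_fn, device: torch.device):
    """Run one interpolation per stack on its own CUDA stream so that
    independent stacks can overlap on the GPU. Falls back to a simple
    sequential loop when no GPU is available."""
    if device.type != "cuda":
        return [kernel_fn(s) for s in stacks]        # sequential CPU fallback

    streams = [torch.cuda.Stream() for _ in stacks]  # one stream per stack
    results = [None] * len(stacks)
    for i, (s, stream) in enumerate(zip(stacks, streams)):
        with torch.cuda.stream(stream):              # enqueue work on this stream
            results[i] = kernel_fn(s.to(device, non_blocking=True))
    torch.cuda.synchronize()                         # wait for all streams to finish
    return results

out = interpolate_stack_batched(
    [torch.ones(3), torch.zeros(3)],
    lambda t: t * 2.0,
    torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)
```

Streams only pay off when the per-stack kernels are small enough that a single stack under-utilizes the GPU; for large stacks the launches serialize on the hardware anyway.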

[WIP] Towards batching

[ENH] JIT-compiled kernel functions for improved GPU performance

  • Optimized kernel functions with `torch.jit.script` for better GPU execution
  • Refactored mathematical expressions for numerical stability and performance
  • Improved memory efficiency with fused operations
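A JIT-compiled kernel in this style might look like the following. Illustrative only: the cubic-covariance-style polynomial and the name `cubic_kernel` are examples, not necessarily the exact functions in this PR. TorchScript can fuse the elementwise ops, and reusing intermediate powers avoids redundant work:

```python
import torch

@torch.jit.script
def cubic_kernel(r: torch.Tensor, a: float) -> torch.Tensor:
    """Cubic-covariance-style kernel, scripted for fused GPU execution.
    `r` holds pairwise distances, `a` is the range; values beyond the
    range are clamped so the covariance decays smoothly to zero."""
    q = torch.clamp(r / a, 0.0, 1.0)
    q2 = q * q
    q3 = q2 * q
    # polynomial grouped to reuse intermediate powers (q^5 = q2*q3, q^7 = q3*q3*q)
    return 1.0 - 7.0 * q2 + 8.75 * q3 - 3.5 * (q2 * q3) + 0.75 * (q3 * q3 * q)

v = cubic_kernel(torch.tensor([0.0, 2.0]), 2.0)  # value 1 at r=0, 0 at r=a
```

Scripting pays off mainly by collapsing the chain of small elementwise kernels into fewer launches; the numerical result is unchanged.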


@Leguark Leguark changed the title [ENH] Improve tensor handling and device management in backend_tensor.py [ENH] Implement batched GPU interpolation with CUDA streams for parallel stack processing Nov 21, 2025
@Leguark Leguark changed the title [ENH] Implement batched GPU interpolation with CUDA streams for parallel stack processing [ENH] Implement batched GPU interpolation with CUDA streams for parallel stack processing | GEN-14003 Nov 21, 2025
