Conversation

@Leguark Leguark commented Nov 21, 2025

[ENH] Improve tensor handling and device management in backend_tensor.py

  • Updated tensor creation to use pinned memory and non-blocking transfer for improved GPU performance.
  • Introduced `_zeros`, `_ones`, and `_eye` wrapper functions for better consistency in tensor initialization on the specified device.
  • Refined the `_wrap_pytorch_functions` method to streamline tensor operations and ensure compatibility with the device settings.
  • Enabled stricter CUDA checks by updating conditions for GPU availability.
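The pinned-memory path described in the first bullet can be sketched as follows. This is a minimal illustration assuming PyTorch; `to_device` is a hypothetical helper for exposition, not the actual `backend_tensor.py` API:

```python
import torch

def to_device(data, device: torch.device) -> torch.Tensor:
    """Create a tensor in pinned (page-locked) host memory and move it to the
    target device with a non-blocking copy. Pinned memory lets the CUDA driver
    use async DMA, so the transfer can overlap with GPU compute.
    Falls back to a plain copy on CPU-only machines."""
    t = torch.as_tensor(data, dtype=torch.float32)
    if device.type == "cuda":
        t = t.pin_memory()                       # page-locked staging buffer
        return t.to(device, non_blocking=True)   # asynchronous host-to-device copy
    return t.to(device)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = to_device([[1.0, 2.0], [3.0, 4.0]], device)
```

Note that `non_blocking=True` only yields an actual asynchronous transfer when the source tensor is pinned; without `pin_memory()` the flag is silently ignored.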

[ENH] Add keops_enabled parameter to improve kernel constructor modularity and enhance batch processing support

  • Introduced the `keops_enabled` parameter across various modules to enable conditional usage of PyKeOps for optimized computations.
  • Added `_interpolate_stack_batched.py` for GPU-accelerated batched interpolation with CUDA streams, minimizing memory overhead and improving throughput.
  • Updated tensor creation logic in `backend_tensor.py` to include `pykeops_eval_enabled` for enhanced flexibility in method selection.
  • Refactored multiple constructor methods to propagate `keops_enabled`, ensuring consistent conditional logic for tensor handling and backend compatibility.
  • Improved fault data initialization and dependency handling in interpolation pipelines for better parallel computation.
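The per-stack CUDA-stream batching mentioned above could look roughly like this. A hypothetical sketch: `interpolate_stack_batched` and `kernel_fn` are illustrative names standing in for the module's real per-stack solve, not its actual signatures:

```python
import torch

def interpolate_stack_batched(stacks, kernel_fn, device: torch.device):
    """Run one interpolation per stack on its own CUDA stream so that
    independent stacks can overlap on the GPU. Falls back to a simple
    sequential loop when no GPU is available."""
    if device.type != "cuda":
        return [kernel_fn(s) for s in stacks]        # sequential CPU fallback

    streams = [torch.cuda.Stream() for _ in stacks]  # one stream per stack
    results = [None] * len(stacks)
    for i, (s, stream) in enumerate(zip(stacks, streams)):
        with torch.cuda.stream(stream):              # enqueue work on this stream
            results[i] = kernel_fn(s.to(device, non_blocking=True))
    torch.cuda.synchronize()                         # wait for all streams to finish
    return results

out = interpolate_stack_batched(
    [torch.ones(3), torch.zeros(3)],
    lambda t: t * 2.0,
    torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)
```

Streams only pay off when the per-stack kernels are small enough that a single stack under-utilizes the GPU; for large stacks the launches serialize on the hardware anyway.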

[WIP] Towards batching

[ENH] JIT-compiled kernel functions for improved GPU performance

  • Optimized kernel functions with `torch.jit.script` for better GPU execution
  • Refactored mathematical expressions for numerical stability and performance
  • Improved memory efficiency with fused operations
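A JIT-compiled kernel in this style might look like the following. Illustrative only: the cubic-covariance-style polynomial and the name `cubic_kernel` are examples, not necessarily the exact functions in this PR. TorchScript can fuse the elementwise ops, and reusing intermediate powers avoids redundant work:

```python
import torch

@torch.jit.script
def cubic_kernel(r: torch.Tensor, a: float) -> torch.Tensor:
    """Cubic-covariance-style kernel, scripted for fused GPU execution.
    `r` holds pairwise distances, `a` is the range; values beyond the
    range are clamped so the covariance decays smoothly to zero."""
    q = torch.clamp(r / a, 0.0, 1.0)
    q2 = q * q
    q3 = q2 * q
    # polynomial grouped to reuse intermediate powers (q^5 = q2*q3, q^7 = q3*q3*q)
    return 1.0 - 7.0 * q2 + 8.75 * q3 - 3.5 * (q2 * q3) + 0.75 * (q3 * q3 * q)

v = cubic_kernel(torch.tensor([0.0, 2.0]), 2.0)  # value 1 at r=0, 0 at r=a
```

Scripting pays off mainly by collapsing the chain of small elementwise kernels into fewer launches; the numerical result is unchanged.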


@Leguark Leguark changed the title [ENH] Improve tensor handling and device management in backend_tensor.py [ENH] Implement batched GPU interpolation with CUDA streams for parallel stack processing Nov 21, 2025
@Leguark Leguark changed the title [ENH] Implement batched GPU interpolation with CUDA streams for parallel stack processing [ENH] Implement batched GPU interpolation with CUDA streams for parallel stack processing | GEN-14003 Nov 21, 2025
