Move RNTuple GPU Decompression from KvikIO to nvCOMP #1497

# Problem
The kvikio library is removing its Python bindings to nvCOMP (https://github.com/rapidsai/kvikio/pull/798). The [GPU decompression in RNTuple reading](https://github.com/scikit-hep/uproot5/blob/main/src/uproot/models/RNTuple.py#L1792) will therefore need to be updated to use the official nvCOMP Python library (https://github.com/NVIDIA/nvcomp) instead.

Initially, I stayed with the kvikio library for its nvCOMP decompression bindings because its API supports user-provided output buffers. The idea was that this would reduce memory allocations, but it turns out the kvikio implementation [creates unnecessary buffers for the decompressed data, which are later copied into the user-provided buffers](https://github.com/rapidsai/kvikio/blob/8e11ed2edc0c392ec409d2309793c11af8e91ee2/python/kvikio/kvikio/nvcomp_codec.py#L216).

The nvCOMP team is considering updating the Python implementation to allow user-provided output buffers: https://forums.developer.nvidia.com/t/please-support-user-provided-output-buffers-in-python-api/332763

# Solution
Ultimately, transitioning from the kvikio bindings to the official nvCOMP Python API should not significantly impact performance, since kvikio was already decompressing into intermediate buffers and copying them out. The code, however, will need some adjustments. First, the official nvCOMP API [does not take `cupy.ndarray` as inputs to its codecs](https://docs.nvidia.com/cuda/nvcomp/samples/nvcomp.html#Zero-copy-import-device-array), so uproot will have to handle the conversion to and from `nvcomp.Array`.
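```python
import cupy as cp
import numpy as np
from nvidia import nvcomp
ascending = np.arange(0, 4096, dtype=np.int32)  # example host data
data_gpu = cp.array(ascending)  # copy to the device as a cupy.ndarray
nvarr_d = nvcomp.as_array(data_gpu)  # zero-copy wrap as an nvcomp.Array
lz4_codec = nvcomp.Codec(algorithm="LZ4")
lz4_comp_arr = lz4_codec.encode(nvarr_d)
```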
Second, until the nvCOMP Python API supports user-provided output buffers, uproot will have to copy each decompressed page into the full output buffer itself.
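The interim per-page copy can be sketched roughly as follows. This is a minimal CPU-only sketch, not uproot's implementation: NumPy stands in for CuPy, `zlib` stands in for the nvCOMP codec, and `decompress_cluster_with_copy` is a hypothetical helper. What it shows is the shape of the loop: every decode returns a freshly allocated buffer, which then has to be copied into its slice of the preallocated cluster buffer.

```python
import zlib

import numpy as np


def decompress_cluster_with_copy(compressed_pages, page_num_elements, dtype):
    """Hypothetical helper: decode each page, then copy it into one cluster buffer.

    NumPy stands in for CuPy and zlib stands in for the nvCOMP codec.
    """
    total_len = int(np.sum(page_num_elements))
    full_output_buffer = np.empty(total_len, dtype=dtype)
    offset = 0
    for page, num_elements in zip(compressed_pages, page_num_elements):
        # The decoder allocates a new buffer for every page...
        decoded = np.frombuffer(zlib.decompress(page), dtype=dtype)
        if decoded.size != num_elements:
            raise ValueError("decoded page size does not match its metadata")
        # ...which must then be copied into its slice of the cluster buffer.
        full_output_buffer[offset : offset + num_elements] = decoded
        offset += num_elements
    return full_output_buffer


pages = [np.arange(3, dtype=np.int32), np.arange(10, 15, dtype=np.int32)]
compressed = [zlib.compress(p.tobytes()) for p in pages]
cluster = decompress_cluster_with_copy(compressed, [3, 5], np.int32)
print(cluster)  # [ 0  1  2 10 11 12 13 14]
```

The extra allocation plus copy per page is exactly the overhead that user-provided output buffers would remove.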


# Motivation for using user-provided output buffers
Through RNTuple metadata we know the total decompressed size of all pages in a given cluster:
https://github.com/scikit-hep/uproot5/blob/9798afd274ba4c4f4d94ce1853a7f062f1ddeb4e/src/uproot/models/RNTuple.py#L764-L765
Instead of decompressing each page into a separate buffer and then copying the result into a buffer holding the whole cluster, we can decompress each page directly into its own slice of that larger buffer: https://github.com/scikit-hep/uproot5/blob/9798afd274ba4c4f4d94ce1853a7f062f1ddeb4e/src/uproot/models/RNTuple.py#L793
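A minimal sketch of the offset bookkeeping this requires, again with NumPy standing in for CuPy and a hypothetical helper name (`page_output_views`). The per-page element counts (the `num_elements` values summed in the linked RNTuple.py lines) give each page's start and stop through a cumulative sum, and slicing the cluster buffer yields views rather than copies. A decoder that accepts user-provided output buffers could write each page straight into its view; here plain assignment plays that role.

```python
import numpy as np


def page_output_views(full_output_buffer, page_num_elements):
    """Hypothetical helper: split one cluster buffer into per-page views.

    Basic slicing returns views, so writing into a view writes into the cluster buffer.
    """
    stops = np.cumsum(page_num_elements)
    starts = stops - np.asarray(page_num_elements)
    return [full_output_buffer[start:stop] for start, stop in zip(starts, stops)]


page_num_elements = [3, 5]
full_output_buffer = np.empty(sum(page_num_elements), dtype=np.int32)
views = page_output_views(full_output_buffer, page_num_elements)

# Stand-in for a decoder writing directly into a user-provided output buffer.
views[0][:] = np.arange(3)
views[1][:] = np.arange(10, 15)

print(full_output_buffer)  # [ 0  1  2 10 11 12 13 14]
print(all(np.shares_memory(v, full_output_buffer) for v in views))  # True
```

Basic slicing of a `cupy.ndarray` also returns a view, so the same offsets would carry over to the GPU buffer.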


The cluster-wide buffer allocation in RNTuple.py referenced above:

```python
total_len = numpy.sum([desc.num_elements for desc in pagelist], dtype=int)
full_output_buffer = cupy.empty(total_len, dtype=field_metadata.dtype)
```
