Description
Problem
The kvikio library is removing its Python bindings to nvCOMP (rapidsai/kvikio#798). The GPU decompression in RNTuple reading will need to be updated to use the official nvCOMP Python library instead (https://github.com/NVIDIA/nvcomp).
Initially, I stayed with the kvikio library for the nvCOMP decompression bindings because its API supports user-provided output buffers. The idea was that this would reduce memory allocations, but it turns out that the kvikio implementation creates unnecessary intermediate buffers for the decompressed data, which are later copied into the user-provided buffers.
The nvCOMP team is considering updating the Python implementation to allow user-provided output buffers: https://forums.developer.nvidia.com/t/please-support-user-provided-output-buffers-in-python-api/332763
Solution
Ultimately, transitioning from the kvikio bindings to the official nvCOMP Python API should not significantly impact performance. The code, however, will need some adjustments. First, the codecs in the official nvCOMP API do not accept cupy.ndarray as input. Uproot will have to handle the conversion to and from nvcomp.Array.
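import cupy as cp
from nvidia import nvcomp
lz4_codec = nvcomp.Codec(algorithm="LZ4")  # codec object used for encode/decode
data_gpu = cp.array(ascending)  # `ascending` is assumed to be an existing host array
nvarr_d = nvcomp.as_array(data_gpu)  # wrap the cupy.ndarray as an nvcomp.Array
lz4_comp_arr = lz4_codec.encode(nvarr_d)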
Second, until user-provided buffers are supported, Uproot will have to copy each decompressed page into the full output buffer itself.
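The copy step described above can be sketched on the CPU with only the standard library. This is an illustration under stated assumptions, not Uproot's implementation: `zlib` stands in for the nvCOMP codec, a `bytearray` stands in for the cupy output buffer, and the function name `decompress_pages_with_copy` is hypothetical.

```python
import zlib


def decompress_pages_with_copy(compressed_pages, page_nbytes):
    """Decompress each page into its own buffer, then copy it into one cluster buffer.

    zlib is only a CPU stand-in for a GPU codec's decode call.
    """
    # Total size is known up front, so the cluster buffer is allocated once.
    full_output = bytearray(sum(page_nbytes))
    tracker = 0
    for comp, nbytes in zip(compressed_pages, page_nbytes):
        page = zlib.decompress(comp)  # the codec allocates a temporary buffer here
        if len(page) != nbytes:
            raise ValueError("decompressed size does not match the expected page size")
        tracker_end = tracker + nbytes
        full_output[tracker:tracker_end] = page  # the extra copy this issue wants to avoid
        tracker = tracker_end
    return full_output


# Example input: three pages of different sizes.
pages = [b"abc" * 4, b"xyz" * 2, b"q" * 5]
compressed = [zlib.compress(p) for p in pages]
cluster = decompress_pages_with_copy(compressed, [len(p) for p in pages])
```

Each page costs one temporary allocation and one copy here. With user-provided output buffers, the codec could write straight into the target range and both would go away.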
Motivation for using user-provided output buffers
Through RNTuple metadata we know the total decompressed size for all pages in a given cluster.
uproot5/src/uproot/models/RNTuple.py
Lines 764 to 765 in 9798afd
total_len = numpy.sum([desc.num_elements for desc in pagelist], dtype=int)
full_output_buffer = cupy.empty(total_len, dtype=field_metadata.dtype)
Instead of decompressing each page into a separate buffer and then copying the result into a buffer representing the cluster data, we can decompress each page directly into a slice of the larger buffer that holds all the contents of that cluster. The slice is a view into the larger buffer, so no extra allocation or copy is needed.
uproot5/src/uproot/models/RNTuple.py
Line 793 in 9798afd
out_buff = full_output_buffer[tracker:tracker_end]
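As a rough sketch of how the `tracker`/`tracker_end` bounds can be derived from per-page element counts, the example below computes one slice per page and writes through each slice into a shared buffer. It uses only the standard library: a `memoryview` over a `bytearray` stands in for the cupy buffer (basic slicing yields a view in both cases), and the helper name `page_slices` and the sample element counts are hypothetical.

```python
from itertools import accumulate


def page_slices(num_elements):
    """Return one slice per page covering consecutive ranges of the cluster buffer."""
    ends = list(accumulate(num_elements))  # running totals give each tracker_end
    starts = [0] + ends[:-1]               # each page starts where the previous one ended
    return [slice(start, end) for start, end in zip(starts, ends)]


# Assumed per-page element counts (in the issue these come from RNTuple metadata).
num_elements = [4, 2, 5]
full_output_buffer = memoryview(bytearray(sum(num_elements)))
slices = page_slices(num_elements)

for i, sl in enumerate(slices):
    out_buff = full_output_buffer[sl]  # a view into the big buffer, not a copy
    # Stand-in for "decompress page i directly into out_buff".
    out_buff[:] = bytes([i + 1]) * len(out_buff)
```

Because every `out_buff` is a view, writes made through it land in `full_output_buffer` at the right offset, which is the property a user-provided output buffer API would rely on.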