Description
(this is tested with uproot v5.6.5; the issue is likely present in other versions as well).
The issue & reproducer
When running the following snippet and benchmarking it with memray:

```python
import uproot

def loop_iterate(rootfile):
    with uproot.open(rootfile, array_cache=None, object_cache=None) as f:
        tree = f["tree"]
        for batch in tree.iterate(step_size="200 MB", library="ak"):
            print(repr(batch))

if __name__ == "__main__":
    loop_iterate("~/Downloads/zlib9-jagged0.root")
```

I'm getting a resident memory (RSS) consumption of up to 1.6 GB (even though the step size is 200 MB):
This is already surprising; another indication that something is off is that an explicit gc.collect() at the end of each iteration improves the RSS situation by ~2x, i.e.:
```diff
 import uproot
+import gc

 def loop_iterate(rootfile):
     with uproot.open(rootfile, array_cache=None, object_cache=None) as f:
         tree = f["tree"]
         for batch in tree.iterate(step_size="200 MB", library="ak"):
             print(repr(batch))
+            del batch
+            gc.collect()

 if __name__ == "__main__":
     loop_iterate("~/Downloads/zlib9-jagged0.root")
```

which gives up to 800 MB RSS consumption:
Why is this bad?
RSS is the physical RAM used by this process, which dask monitors to decide whether a worker should be killed due to OOM or not.
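For a quick cross-check without memray, one can also sample RSS directly once per iteration; a minimal sketch, assuming psutil is available (it is not part of the reproducer above):

```python
import psutil
import uproot

def loop_iterate_rss(rootfile):
    proc = psutil.Process()  # the current process
    with uproot.open(rootfile, array_cache=None, object_cache=None) as f:
        tree = f["tree"]
        for i, batch in enumerate(tree.iterate(step_size="200 MB", library="ak")):
            # RSS is the number dask's memory monitor acts on
            rss_gb = proc.memory_info().rss / 1e9
            print(f"iteration {i}: RSS = {rss_gb:.2f} GB")
```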
What I've found so far...
The memory usage grows in the following function: https://github.com/scikit-hep/uproot5/blob/main/src/uproot/behaviors/TBranch.py#L1440-L1452 and, more specifically, in this part of it: https://github.com/scikit-hep/uproot5/blob/main/src/uproot/behaviors/TBranch.py#L3421-L3428
One thing that does work correctly: the arrays dictionary filled by the above function is ~200 MB, that's good! However, _ranges_or_baskets_to_arrays still uses ~800 MB to fill that ~200 MB arrays dict and does not free the difference again.
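Schematically, the pattern I suspect is the usual one where temporaries outlive their usefulness: decompressed basket buffers stay referenced until the whole call returns, so the peak is temporaries plus finished arrays. A self-contained sketch of that pattern (illustrative only, not uproot's actual code):

```python
import numpy as np

def fill_arrays(num_baskets=8, basket_size=100_000_000):
    # stand-ins for decompressed basket buffers: large temporaries
    decompressed = [np.zeros(basket_size, dtype=np.uint8) for _ in range(num_baskets)]
    arrays = {}
    for i, buf in enumerate(decompressed):
        arrays[i] = buf[::4].copy()  # stand-in for the (smaller) finished array
        # 'decompressed' still references 'buf', so the temporary cannot be
        # freed here; dropping the reference would release it immediately:
        # decompressed[i] = None
    # until the function returns, temporaries (~800 MB) and results (~200 MB)
    # coexist, which matches the observed peak
    return arrays
```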
Also, the "popper-trick" that @jpivarski introduced in #1305 is what enables the manual gc.collect to help here (without it, even that won't help).
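For context, my paraphrase of that popper idea (not the actual implementation from #1305): yield elements by popping them out of the container, so the generator itself does not keep them alive after the consumer has seen them:

```python
def popper(results):
    # yield by popping, so this generator drops its own reference to each
    # array as soon as it is handed to the consumer; without this, 'results'
    # would pin every yielded array for the generator's whole lifetime and
    # even an explicit gc.collect() could not reclaim them
    while results:
        yield results.pop()
```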
So, my understanding right now is that uproot.iterate does yield correctly sized arrays, but it uses way too much memory while doing so and also doesn't free that memory properly.
Other implications
_ranges_or_baskets_to_arrays is also used in other loading functions, and some quick tests showed that the following two variants:
```python
def loop_manual(rootfile):
    with uproot.open(rootfile, array_cache=None, object_cache=None) as f:
        tree = f["tree"]
        starts = list(range(0, tree.num_entries, tree.num_entries // 20))
        stops = starts[1:] + [tree.num_entries]
        ranges = list(zip(starts, stops))
        for start, stop in ranges:
            print(f"entry {start} to {stop}")
            entry = tree.arrays(entry_start=start, entry_stop=stop, library="ak")
            print(repr(entry))

def loop_same_chunks(rootfile):
    with uproot.open(rootfile, array_cache=None, object_cache=None) as f:
        tree = f["tree"]
        chunk_starts = 0
        chunk_stops = 53687091
        for _ in range(10):
            print(f"entry {chunk_starts} to {chunk_stops}")
            entry = tree.arrays(entry_start=chunk_starts, entry_stop=chunk_stops, library="ak")
            print(repr(entry))
```

have a similar memory behavior, see e.g. the profile for loop_manual (the numerical values on the y axis are of course different because I can't exactly mirror "200 MB" steps by hand):
and for loop_same_chunks:
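As an aside, the manual loop could mirror iterate's 200 MB steps more closely via TTree.num_entries_for, which converts a memory size into an entry count; a sketch (I haven't profiled this variant):

```python
import uproot

def loop_manual_200mb(rootfile):
    with uproot.open(rootfile, array_cache=None, object_cache=None) as f:
        tree = f["tree"]
        # number of entries that corresponds to roughly 200 MB
        step = tree.num_entries_for("200 MB")
        for start in range(0, tree.num_entries, step):
            stop = min(start + step, tree.num_entries)
            print(f"entry {start} to {stop}")
            entry = tree.arrays(entry_start=start, entry_stop=stop, library="ak")
            print(repr(entry))
```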
What I want to see / was expecting
The orange and blue lines overlap and roughly follow a sawtooth shape with 200 MB jumps per iteration (and not much additional overhead in RAM).
This was originally found by @oshadura in the scope of the integration challenge; here I just attach a local reproducer with some first findings.