@andras-makany

What it does

First mentioned in issue #2328. When using a merged dataset (version 3.0), torch.utils.data.DataLoader raises an exception reporting that the given frame index is invalid.

The error was caused by not resetting the latest_duration offset in aggregate_videos when creating a new file. As a result, the second episode written to the new file was offset by the previous file's frame count.

Screenshot from 2025-11-13 17-29-33

The solution was to reset latest_duration after creating a new file.
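
The bookkeeping behind this fix can be sketched as follows. This is a minimal, self-contained illustration modelled on the description above, not the actual aggregate_videos code: the function name assign_video_offsets, the parameter max_file_duration_s, and the rollover-by-duration rule are hypothetical stand-ins (the real code rolls over on file size). It only shows why the running offset has to return to zero whenever a new file is started.

```python
def assign_video_offsets(episode_durations, max_file_duration_s, reset_on_new_file=True):
    """Assign (file_index, from_timestamp, to_timestamp) to each episode.

    Hypothetical helper: episodes are appended to the current video file
    until adding one more would exceed max_file_duration_s, at which point
    a new file is started.
    """
    placements = []
    file_index = 0
    latest_duration = 0.0  # running offset used for the next episode
    file_duration = 0.0    # true length of the current file so far

    for duration in episode_durations:
        starts_new_file = file_duration > 0 and file_duration + duration > max_file_duration_s
        if starts_new_file:
            file_index += 1
            file_duration = 0.0
            from_ts = 0.0  # the first episode of a new file starts at 0
            if reset_on_new_file:
                latest_duration = 0.0  # the fix: forget the previous file's length
        else:
            from_ts = latest_duration
        to_ts = from_ts + duration
        placements.append((file_index, from_ts, to_ts))
        latest_duration += duration
        file_duration += duration
    return placements


durations = [10.0, 10.0, 10.0, 10.0]
fixed = assign_video_offsets(durations, max_file_duration_s=25.0)
buggy = assign_video_offsets(durations, max_file_duration_s=25.0, reset_on_new_file=False)

print(fixed)  # second file restarts at 0.0 and continues at 10.0
print(buggy)  # second episode of the second file is shifted by the first file's length
```

Without the reset, the second episode of the second file points past the end of that file, which is the kind of offset that would surface later as an "Invalid frame index" error when decoding.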

Screenshot from 2025-11-13 17-49-31

How it was tested

Using the fixed aggregate, I merged a new copy of my dataset containing 75 episodes with the video file size set to 500 MB, which created 13 video files in the merged dataset.

Viewing the newly merged dataset, the offsets were corrected.

Screenshot from 2025-11-13 17-30-47

Then I ran a model training with a batch size of 10 and 100 training steps. After multiple runs, no exception was raised, so I assume the problem is solved.

The tests included in the repository passed.

How to checkout & try? (for the reviewer)

Try merging smaller datasets so that the new dataset has at least 2 video files. When viewing meta/episodes/.../file-000.parquet, the from_timestamp and to_timestamp values should be sequential with no skips, and should reset to 0.0 when a new file starts.
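
The manual check above can also be scripted. The sketch below is an illustration under assumptions: it works on plain dicts with the hypothetical keys chunk_index, file_index, from_timestamp and to_timestamp, whereas the real parquet columns are likely prefixed per video key and may be named differently. Rows could be obtained, for example, with pandas.read_parquet(...).to_dict("records") after renaming the relevant columns.

```python
def check_episode_timestamps(rows, tol=1e-6):
    """Return a list of problems found in per-episode video offsets.

    rows: episode metadata in episode order, as dicts with the (assumed)
    keys chunk_index, file_index, from_timestamp, to_timestamp.
    """
    problems = []
    last_end = {}  # (chunk_index, file_index) -> to_timestamp of the last episode seen
    for i, row in enumerate(rows):
        key = (row["chunk_index"], row["file_index"])
        # A file seen for the first time must start at 0.0; otherwise the
        # episode must start where the previous one in that file ended.
        expected_start = last_end.get(key, 0.0)
        if abs(row["from_timestamp"] - expected_start) > tol:
            problems.append(
                f"episode {i}: from_timestamp={row['from_timestamp']} "
                f"but expected {expected_start} in file {key}"
            )
        if row["to_timestamp"] <= row["from_timestamp"]:
            problems.append(f"episode {i}: empty or negative time span")
        last_end[key] = row["to_timestamp"]
    return problems


good_rows = [
    {"chunk_index": 0, "file_index": 0, "from_timestamp": 0.0, "to_timestamp": 10.0},
    {"chunk_index": 0, "file_index": 0, "from_timestamp": 10.0, "to_timestamp": 20.0},
    {"chunk_index": 0, "file_index": 1, "from_timestamp": 0.0, "to_timestamp": 10.0},
]
bad_rows = good_rows + [
    # Offset was not reset: this episode should start at 10.0 in file 1.
    {"chunk_index": 0, "file_index": 1, "from_timestamp": 30.0, "to_timestamp": 40.0},
]

print(check_episode_timestamps(good_rows))
print(check_episode_timestamps(bad_rows))
```

An empty list means every file starts at 0.0 and each episode begins where the previous one in the same file ended.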

Training a model on the merged dataset should give no errors either.

@Grigorij-Dudnik

Grigorij-Dudnik commented Nov 15, 2025

Hey @andras-makany! I tested your PR; however, it unfortunately does not work. I am still receiving the same 'RuntimeError: Invalid frame index=52130 for streamIndex=0; must be less than 19675' when trying to train a model on the output dataset.

What I did: I cloned your branch, installed lerobot from it, and ran the merge with this command:
lerobot-edit-dataset \
  --repo_id Grigorij/xle_left_arm_merged_filtered_2 \
  --operation.type merge \
  --operation.repo_ids "['Grigorij/XLeRobot_arms', 'Grigorij/XLeRobot_arms_2', 'Grigorij/XLeRobot_arms_3', 'Grigorij/XLeRobot_arms_4', 'Grigorij/XLeRobot_arms_5', 'Grigorij/XLeRobot_arms_6', 'Grigorij/XLeRobot_arms_7', 'Grigorij/XLeRobot_arms_8', 'Grigorij/XLeRobot_arms_9', 'Grigorij/XLeRobot_arms_10', 'Grigorij/XLeRobot_arms_11']" \
  --push_to_hub true

The output dataset is here: https://huggingface.co/datasets/Grigorij/xle_left_arm_merged_filtered_repaired/tree/main

@andras-makany
Author

andras-makany commented Nov 15, 2025

Hey @Grigorij-Dudnik! Thank you for testing my attempt on this fix.

Your attached dataset revealed a major problem. My dataset had only a single task, but yours has at least two. It seems that when the task description changes, the merge starts a new file. In that case the offset is not reset, which results in your error.

I will look into a possible solution on Monday.

…ing occurs or having multiple episodes in one video file
@andras-makany
Author

andras-makany commented Nov 17, 2025

So I found the problem that resulted in your error.

The chunk and file indexing forced a single value onto every episode in a dataset. This caused your problem: for some of your datasets, episodes were concatenated to the previous file but were given the new file's index in the metadata.

The second error was due to having multiple episodes in one file. Your datasets had one episode per video, which meant that the episode count was equal to the loop count. Mine had multiple episodes per video (as does the resulting dataset). This caused missing indexes in my dataset once the aggregation logic had been corrected for yours.

As a solution, instead of a single value for the chunk and file index, I'm using a list to keep track of the resulting indexes.
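
The idea of tracking one index per episode can be sketched as follows. This is a simplified illustration, not the code in this PR: place_episodes and its parameters are hypothetical names, sizes are in MB, and the chunk index is left out for brevity.

```python
def place_episodes(episode_sizes_mb, start_file_index, start_file_size_mb, max_file_size_mb):
    """Return one destination file index per source episode.

    Hypothetical helper: episodes are appended to the current destination
    file while they fit; otherwise a new file is started.
    """
    file_indices = []
    file_index = start_file_index
    file_size = start_file_size_mb
    for size in episode_sizes_mb:
        if file_size > 0 and file_size + size > max_file_size_mb:
            file_index += 1
            file_size = 0.0
        file_size += size
        file_indices.append(file_index)  # record where THIS episode landed
    return file_indices


# The destination file 3 already holds 350 MB; the first episode still fits,
# the next two spill over into file 4.
per_episode = place_episodes(
    [100, 100, 100], start_file_index=3, start_file_size_mb=350, max_file_size_mb=500
)
print(per_episode)

# A single value for the whole dataset would label all three episodes with
# the new file's index, which is wrong for the first one.
single_value = [4] * 3
print(single_value)
```

Keeping a list means an episode that was concatenated to the previous file keeps that file's index in the metadata, even if later episodes from the same source dataset end up in a new file.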

The resulting dataset is correct both for your case and for mine.

Tests were fine, except that test_aggregate_datasets failed on assert_dataset_content_integrity and assert_video_frames_integrity. It did pass assert_video_timestamps_within_bounds, which is the assertion that covers the "Invalid frame index" errors. I believe the integrity assertions were failing before this fix too.
