Feat: Implement CUDA programming with Unified Memory for dataset loader (train large data on limited VRAM) #1608
Status: Open

darrenchang wants to merge 7 commits into NVlabs:master (base) from empirica-lab:cuda_malloc_manager
+44 −18
Commits (7, all by darrenchang):

- 7254a21 Feat: implement progress bar for image to GPU convert
- 6de3792 Feat: implement --low_vram mode
- 61d2e49 Change low_vram option to nerf_dataset_in_cpu_ram
- 5529115 Fix: Use tinycuda GPUMemory
- d89a671 Refactor
- b12bb3a Fix: Move dataset_in_cpu_ram to Testbed::Nerf::Training
- 8bbdc3e Remove debug loggers
```diff
@@ -270,13 +270,11 @@ bool read_focal_length(const nlohmann::json &json, vec2 &focal_length, const ive
 	return true;
 }

-NerfDataset load_nerf(const std::vector<fs::path>& jsonpaths, float sharpen_amount) {
+NerfDataset load_nerf(const std::vector<fs::path>& jsonpaths, float sharpen_amount, bool in_cpu_ram) {
 	if (jsonpaths.empty()) {
 		throw std::runtime_error{"Cannot load NeRF data from an empty set of paths."};
 	}

 	tlog::info() << "Loading NeRF dataset from";

 	NerfDataset result{};

 	std::ifstream f{native_string(jsonpaths.front())};

@@ -727,26 +725,28 @@ NerfDataset load_nerf(const std::vector<fs::path>& jsonpaths, float sharpen_amou
 	result.sharpness_data.enlarge(result.sharpness_resolution.x * result.sharpness_resolution.y * result.n_images);

 	// copy / convert images to the GPU
+	auto progress_to_gpu = tlog::progress(result.n_images);
 	tlog::info() << "Copying / converting images to GPU...";
 	for (uint32_t i = 0; i < result.n_images; ++i) {
 		const LoadedImageInfo& m = images[i];
-		result.set_training_image(i, m.res, m.pixels, m.depth_pixels, m.depth_scale * result.scale, m.image_data_on_gpu, m.image_type, EDepthDataType::UShort, sharpen_amount, m.white_transparent, m.black_transparent, m.mask_color, m.rays);
+		result.set_training_image(i, m.res, m.pixels, m.depth_pixels, m.depth_scale * result.scale, m.image_data_on_gpu, m.image_type, EDepthDataType::UShort, sharpen_amount, m.white_transparent, m.black_transparent, m.mask_color, m.rays, in_cpu_ram);
 		CUDA_CHECK_THROW(cudaDeviceSynchronize());
 	}
-	CUDA_CHECK_THROW(cudaDeviceSynchronize());
-	// free memory
 	for (uint32_t i = 0; i < result.n_images; ++i) {
+		// free memory
 		if (images[i].image_data_on_gpu) {
 			CUDA_CHECK_THROW(cudaFree(images[i].pixels));
 		} else {
 			free(images[i].pixels);
 		}
 		free(images[i].rays);
 		free(images[i].depth_pixels);
+		progress_to_gpu.update(i);
 	}
 	CUDA_CHECK_THROW(cudaDeviceSynchronize());
+	tlog::success() << "Copied / converted " << images.size() << " images to GPU after " << tlog::durationToString(progress_to_gpu.duration());
 	return result;
 }

-void NerfDataset::set_training_image(int frame_idx, const ivec2& image_resolution, const void* pixels, const void* depth_pixels, float depth_scale, bool image_data_on_gpu, EImageDataType image_type, EDepthDataType depth_type, float sharpen_amount, bool white_transparent, bool black_transparent, uint32_t mask_color, const Ray *rays) {
+void NerfDataset::set_training_image(int frame_idx, const ivec2& image_resolution, const void* pixels, const void* depth_pixels, float depth_scale, bool image_data_on_gpu, EImageDataType image_type, EDepthDataType depth_type, float sharpen_amount, bool white_transparent, bool black_transparent, uint32_t mask_color, const Ray *rays, bool in_cpu_ram) {
 	if (frame_idx < 0 || frame_idx >= n_images) {
 		throw std::runtime_error{"NerfDataset::set_training_image: invalid frame index"};
 	}

@@ -772,8 +772,13 @@ void NerfDataset::set_training_image(int frame_idx, const ivec2& image_resolutio
 	}

 	// copy or convert the pixels
-	pixelmemory[frame_idx].resize(img_size * image_type_size(image_type));
-	void* dst = pixelmemory[frame_idx].data();
+	size_t total_image_mem_size = img_size * image_type_size(image_type);
+	void* dst;
+	pixelmemory[frame_idx] = GPUMemory<uint8_t>(total_image_mem_size, in_cpu_ram);
+	dst = pixelmemory[frame_idx].data();
+	if (in_cpu_ram) {
+		CUDA_CHECK_THROW(cudaMemAdvise(dst, pixelmemory[frame_idx].get_bytes(), cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId));
+	}

 	switch (image_type) {
 		default: throw std::runtime_error{"unknown image type in set_training_image"};

@@ -846,10 +851,10 @@ void NerfDataset::set_training_image(int frame_idx, const ivec2& image_resolutio
 		raymemory[frame_idx].free_memory();
 	}
 	metadata[frame_idx].rays = raymemory[frame_idx].data();
-	update_metadata(frame_idx, frame_idx + 1);
+	update_metadata(frame_idx, frame_idx + 1, in_cpu_ram);
 }

-void NerfDataset::update_metadata(int first, int last) {
+void NerfDataset::update_metadata(int first, int last, bool in_cpu_ram) {
 	if (last < 0) {
 		last = n_images;
 	}

@@ -864,7 +869,10 @@ void NerfDataset::update_metadata(int first, int last) {
 	}

 	metadata_gpu.enlarge(last);
-	CUDA_CHECK_THROW(cudaMemcpy(metadata_gpu.data() + first, metadata.data() + first, n * sizeof(TrainingImageMetadata), cudaMemcpyHostToDevice));
+	if (!in_cpu_ram) {
+		size_t total_size = n * sizeof(TrainingImageMetadata);
+		CUDA_CHECK_THROW(cudaMemcpy(metadata_gpu.data() + first, metadata.data() + first, total_size, cudaMemcpyHostToDevice));
+	}
 }

}
```

Collaborator comment (on the `if (!in_cpu_ram)` guard in `update_metadata`): This seems wrong to me. The metadata is still stored on the GPU -- and it's so small that it wouldn't make sense to offload to CPU RAM anyway.
Review discussion:

- I don't recall exactly why, but I believe there was a reason for the two loops to be separate. I think the underlying memory might be aliased in some cases -- please revert. Putting the progress bar in the first loop likely matches current behavior closely enough.
- Thank you for the heads up.
- You didn't actually put the progress update into the first loop. Putting it in the second is somewhat meaningless -- calling `free` is pretty much free.
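If the reviewer's suggestion were followed, the progress update would move into the copy/convert loop, where the actual work happens. A sketch using the names from the diff (untested, and dependent on instant-ngp internals, so treat it as illustrative only):

```cpp
// copy / convert images to the GPU, reporting progress where the real work is
auto progress_to_gpu = tlog::progress(result.n_images);
tlog::info() << "Copying / converting images to GPU...";
for (uint32_t i = 0; i < result.n_images; ++i) {
	const LoadedImageInfo& m = images[i];
	result.set_training_image(i, m.res, m.pixels, m.depth_pixels,
		m.depth_scale * result.scale, m.image_data_on_gpu, m.image_type,
		EDepthDataType::UShort, sharpen_amount, m.white_transparent,
		m.black_transparent, m.mask_color, m.rays, in_cpu_ram);
	CUDA_CHECK_THROW(cudaDeviceSynchronize());
	// set_training_image dominates runtime; the free() loop below is cheap,
	// so updating the bar here gives an honest picture of progress.
	progress_to_gpu.update(i);
}
```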