Add slab alignment validation in device memory allocator #586
This patch series fixes a slab address alignment issue in the OCKL device
memory allocator that causes GPU page faults and crashes during memory
deallocation.
Problem
The deallocation code uses address masking to locate slab metadata:
saddr = addr & ~0x1fffffUL
This assumes all slabs are 2MB (2^21 bytes) aligned. However, the allocator
never validated this assumption when obtaining slabs, either in
__ockl_dm_init_v1 or in obtain_new_slab.
When a misaligned slab is used (e.g., 0x6fd301ade000 instead of 0x6fd301a00000),
deallocation computes the wrong slab address, reads garbage metadata, and
performs an out-of-bounds array access, leading to GPU page faults.
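To make the failure concrete, here is a minimal host-side sketch of the masking arithmetic. The names SLAB_MASK, slab_base and slab_base_error are hypothetical and chosen for illustration; only the mask value and the two addresses come from the description above.

```c
#include <stdint.h>

/* Mask from the deallocation path: the low 21 bits of a 2MB slab. */
#define SLAB_MASK ((uint64_t)0x1fffff)

/* Recover the presumed slab base by clearing the low 21 bits of addr.
 * Hypothetical helper mirroring `saddr = addr & ~0x1fffffUL`. */
static uint64_t slab_base(uint64_t addr)
{
    return addr & ~SLAB_MASK;
}

/* How far the computed base lies below the slab's real start for a pointer
 * `offset` bytes into the slab. Zero means the mask found the true base.
 * Assumes the pointer stays below the next 2MB boundary. */
static uint64_t slab_base_error(uint64_t real_start, uint64_t offset)
{
    return real_start - slab_base(real_start + offset);
}
```

For a slab that really starts at 0x6fd301ade000, slab_base(0x6fd301ade000 + 0x40) yields 0x6fd301a00000, which is 0xde000 bytes below the real slab header, so whatever is read there as metadata is unrelated memory.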
Solution
Patch 1/2: Add alignment validation in __ockl_dm_init_v1
Patch 2/2: Add alignment validation in obtain_new_slab
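The kind of check both patches add can be sketched as follows. This is an illustration under stated assumptions, not the code from the patches: SLAB_ALIGN, slab_is_aligned and accept_slab are hypothetical names, and the real validation lives inside __ockl_dm_init_v1 and obtain_new_slab.

```c
#include <stddef.h>
#include <stdint.h>

/* Required slab alignment: 2^21 bytes (2MB), matching the deallocation mask. */
#define SLAB_ALIGN ((uint64_t)1 << 21)

/* Returns nonzero when saddr sits on a 2MB boundary. */
static int slab_is_aligned(uint64_t saddr)
{
    return (saddr & (SLAB_ALIGN - 1)) == 0;
}

/* Hypothetical gate applied to a freshly obtained slab: pass it through
 * only if it is non-NULL and 2MB aligned, otherwise report failure with
 * NULL so the caller can handle the error instead of using a bad slab. */
static void *accept_slab(void *candidate)
{
    if (candidate == NULL)
        return NULL;
    if (!slab_is_aligned((uint64_t)(uintptr_t)candidate))
        return NULL;
    return candidate;
}
```

With this check, the example address 0x6fd301ade000 is rejected while 0x6fd301a00000 is accepted, so the mask in the deallocation path only ever sees slabs for which it is valid.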
Impact
These fixes prevent memory corruption and GPU crashes. Allocations will either
succeed with properly aligned slabs or fail safely with NULL returns, allowing
proper error handling instead of catastrophic failures.