Remove duplicate fake registration implementations for gptq_marlin_repack and awq_marlin_repack operations. #29524
+0
−2
This PR removes the duplicate fake registrations for gptq_marlin_repack and awq_marlin_repack. Both operations had duplicate fake implementations registered in vllm/_custom_ops.py that were causing registration conflicts.

Closes: #29517
Purpose
Remove the duplicate fake registration implementation for the gptq_marlin_repack operation. The gptq_marlin_repack operation had a duplicate fake implementation registered in vllm/_custom_ops.py that was causing registration conflicts. This fake registration was redundant and incorrectly placed, as the operation is already properly defined and implemented in the C++ backend (csrc/quantization/gptq_marlin/gptq_marlin_repack.cu).

This PR removes the duplicate Python-side fake registration code that was causing the issue.
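
To illustrate why a second fake registration for the same operation is a problem, here is a minimal sketch of a registry that rejects duplicates. It is a toy model only: FakeImplRegistry, register_twice, and the op name demo::repack are hypothetical names invented for this example. It does not reproduce vLLM's code or PyTorch's actual registration mechanism.

```python
from typing import Callable, Dict, Tuple


class FakeImplRegistry:
    """Toy table mapping an op name to its single fake (shape-only) implementation."""

    def __init__(self) -> None:
        self._impls: Dict[str, Callable] = {}

    def register_fake(self, op_name: str) -> Callable[[Callable], Callable]:
        """Return a decorator that registers `fn` as the fake impl for `op_name`."""

        def decorator(fn: Callable) -> Callable:
            # Only one fake implementation per op is allowed in this toy model.
            if op_name in self._impls:
                raise RuntimeError(
                    f"{op_name!r} already has a fake implementation registered"
                )
            self._impls[op_name] = fn
            return fn

        return decorator

    def registered(self, op_name: str) -> bool:
        return op_name in self._impls


def register_twice(registry: FakeImplRegistry, op_name: str) -> bool:
    """Register two fake impls for the same op; return True if the second is rejected."""

    @registry.register_fake(op_name)
    def _first(shape: Tuple[int, ...]) -> Tuple[int, ...]:
        return shape

    try:

        @registry.register_fake(op_name)
        def _second(shape: Tuple[int, ...]) -> Tuple[int, ...]:
            return shape

    except RuntimeError:
        return True
    return False


registry = FakeImplRegistry()  # hypothetical registry, for illustration only
conflict = register_twice(registry, "demo::repack")  # hypothetical op name
print("duplicate rejected:", conflict)
```

In this sketch the remedy matches what the PR does: keep exactly one registration per operation and delete the redundant one.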
Test Plan
python -c "from vllm import _custom_ops as ops; print('Import successful!')"
Test Result
N/A