@atalhens atalhens commented Nov 26, 2025

This PR removes the duplicate fake implementations registered for the gptq_marlin_repack and awq_marlin_repack operations in vllm/_custom_ops.py, which were causing registration conflicts.

Closes: #29517

Purpose

Remove the duplicate Python-side fake implementation registered for the gptq_marlin_repack operation in vllm/_custom_ops.py, which was causing registration conflicts. The fake registration was redundant and incorrectly placed, since the operation is already defined and implemented in the C++ backend (csrc/quantization/gptq_marlin/gptq_marlin_repack.cu).
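To illustrate the kind of conflict described above, here is a minimal sketch in plain Python. It does not use the real vLLM or PyTorch registration API: `FakeImplRegistry`, `register_fake`, and the error message are hypothetical stand-ins, and the sketch assumes the registry rejects a second fake implementation for the same operator name, which is the behavior the linked issue reports.

```python
class FakeImplRegistry:
    """Hypothetical toy registry: at most one fake implementation per operator."""

    def __init__(self):
        self._fake_impls = {}

    def register_fake(self, op_name):
        """Return a decorator that registers `fn` as the fake impl for `op_name`."""
        def decorator(fn):
            if op_name in self._fake_impls:
                # Assumed behavior: a second registration for the same op is an error.
                raise RuntimeError(
                    f"fake implementation already registered for {op_name!r}"
                )
            self._fake_impls[op_name] = fn
            return fn
        return decorator


registry = FakeImplRegistry()


@registry.register_fake("gptq_marlin_repack")
def first_fake(*args):
    # Stands in for the registration that already exists elsewhere.
    return None


def try_duplicate():
    """Attempt a second registration and return the error message, if any."""
    try:
        @registry.register_fake("gptq_marlin_repack")
        def second_fake(*args):
            return None
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_duplicate())
```

In this toy model the second registration fails as soon as the module defining it is loaded, which mirrors why deleting the redundant registration is enough to resolve the conflict.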

Test Plan

python -c "from vllm import _custom_ops as ops; print('Import successful!')"

Test Result

N/A


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly removes duplicate fake registration implementations for the gptq_marlin_repack and awq_marlin_repack operations from vllm/_custom_ops.py. As described, these registrations were causing import-time conflicts. The change is a clean and simple deletion of the redundant code, which resolves the issue. I approve this change.

mergify bot commented Dec 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @atalhens.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Development

Successfully merging this pull request may close these issues.

[Bug]: Duplicate registration of a fake implementation for the gptq_marlin_repack operator causing vllm serve to fail.
