
Conversation

@khluu
Collaborator

khluu commented Nov 24, 2025

Split up the tests in test-pipeline.yaml into 25 test areas, one YAML file per area. I tried to keep a structure similar to the /tests/ directory:

  • Attention
  • Basic Correctness
  • Benchmarks
  • Compile
  • CUDA
  • Distributed
  • E2E Integration
  • Engine
  • Entrypoints
  • Expert Parallelism
  • Kernels
  • LM Eval
  • LoRA
  • Miscellaneous
  • Model Executor
  • Models - Basic
  • Models - Language
  • Models - Distributed
  • Models - Multimodal
  • Plugins
  • PyTorch
  • Quantization
  • Samplers
  • Tool use
  • Weight Loading
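For illustration, each per-area file could look roughly like the fragment below. The keys mirror the ones visible in the snippets quoted later in this thread (group, num_gpus, working_dir, commands); the file path, label, and dependency list here are purely hypothetical, not the actual contents of any of the 25 files:

```yaml
# hypothetical example: .buildkite/test_areas/lora.yaml
group: LoRA
steps:
- label: LoRA Test
  # assumed: steps declare the source paths that should trigger them
  source_file_dependencies:
  - vllm/lora
  - tests/lora
  num_gpus: 1
  working_dir: "/vllm-workspace"
  commands:
  - pytest -v -s tests/lora
```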

Signed-off-by: Kevin H. Luu <[email protected]>
Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request refactors the main CI pipeline configuration file into smaller, more manageable files based on test areas. This is a positive change for maintainability. My review focuses on the correctness of these new CI configuration files. I've identified a few critical issues related to incorrect GPU allocation and process counts in the distributed tests, as well as a potentially missed test case in the LoRA test suite. Addressing these will ensure the CI pipeline remains robust and correct after this refactoring.

- "pytest -v -s tests/compile/distributed/test_fusions_e2e.py -k 'not Llama-4'"
- pytest -v -s tests/distributed/test_sequence_parallel.py
- pytest -v -s tests/distributed/test_context_parallel.py
- CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
Contributor
critical

The Distributed Tests (2 GPUs)(H200) step is configured with num_gpus: 2. The CI runner will likely allocate GPUs with indices 0 and 1 for this job. However, the command hardcodes CUDA_VISIBLE_DEVICES=1,2. This can cause problems: if GPU 2 is not available, the test will fail. If it is available but not allocated to this job, it might interfere with other jobs. It's safer to rely on the CI runner's GPU allocation, which typically uses indices starting from 0. For a 2-GPU job, this would be 0,1.

    - CUDA_VISIBLE_DEVICES=0,1 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1  --dp-size=2 --max-model-len 2048
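For context on why the hardcoded indices are risky: CUDA_VISIBLE_DEVICES selects which physical GPUs a process may use, and CUDA then renumbers them from 0 inside that process. So a 2-GPU job should address logical devices 0 and 1 regardless of which physical cards the runner assigned. A minimal sketch of that mapping (plain Python, no CUDA required; the helper name is made up for illustration):

```python
def visible_to_physical(cuda_visible_devices: str) -> dict:
    """Map logical CUDA device indices (what the process sees, always
    starting at 0) to physical GPU indices (the CUDA_VISIBLE_DEVICES list)."""
    physical = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}

# With the hardcoded value from the original command:
print(visible_to_physical("1,2"))  # {0: 1, 1: 2}
# With the runner-allocated default for a 2-GPU job:
print(visible_to_physical("0,1"))  # {0: 0, 1: 1}
```

Either way the process sees two logical devices, 0 and 1; the failure mode is only in whether the listed physical GPUs actually exist and belong to this job.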

@khluu khluu changed the title [ci] Break down test-pipeline.yaml into test areas [DNM][ci] Break down test-pipeline.yaml into test areas Nov 24, 2025
@chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.


num_gpus: 4
working_dir: "/vllm-workspace"
commands:
- bash .buildkite/scripts/scheduled_integration_test/qwen30b_a3b_fp8_block_ep.sh 0.8 200 8020
Contributor

Hi @khluu - I added B200 tests and renamed this test file to qwen30b_a3b_fp8_block_ep_eplb.sh. Can you reflect these changes please? #29195 Thanks 🙌

Collaborator Author
done!

@ProExpertProg
Collaborator

Can we also make it so that tests always run when their specific CI file is changed?
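One way the pipeline generator could support this (a hypothetical sketch, not the actual generator code): treat each area's YAML file as an implicit entry in its own steps' source_file_dependencies, so any edit to the file itself triggers its tests:

```python
from pathlib import PurePosixPath

def step_should_run(changed_files, source_deps, area_yaml):
    """A step runs if any changed file falls under one of its declared
    source_file_dependencies, or if the area's own CI YAML changed."""
    # assumed behavior: the CI file is always an implicit dependency
    deps = list(source_deps) + [area_yaml]
    return any(
        any(PurePosixPath(f).is_relative_to(d) for d in deps)
        for f in changed_files
    )

print(step_should_run(
    changed_files=[".buildkite/test_areas/lora.yaml"],
    source_deps=["vllm/lora", "tests/lora"],
    area_yaml=".buildkite/test_areas/lora.yaml",
))  # True
```

Path-prefix matching via `is_relative_to` avoids false positives like `vllm/lora_utils.py` matching a `vllm/lora` dependency, which naive string-prefix checks would hit.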

num_gpus: 4
working_dir: "/vllm-workspace"
commands:
- bash .buildkite/scripts/scheduled_integration_test/deepseek_v2_lite_ep_eplb.sh 0.25 200 8010
Collaborator

@yeqcharlotte commented Nov 25, 2025
these files appear to be EP-related and e2e integration tests

Collaborator Author
Would you suggest leaving it here or moving it to the EP file?

Collaborator

@yeqcharlotte left a comment
what's your plan to eventually merge this?

i think you might wanna merge the split earlier. maybe use some import rules so they are more consistent. otherwise others will keep adding tests and you wouldn't be able to rebase on top.

you can also double check using @rzabarazesh's test coverage collection script to see whether there's difference on coverage before and after.

@@ -0,0 +1,150 @@
group: Miscellaneous
Collaborator
i'm not a big fan of misc, i think this might end up capturing a bunch of random stuff.

what's your thought on

  • moving V1 stuff to different test areas.
  • cpu stuff can be on its own test area.
  • one yaml for packaging and installation.
  • examples, prime rl compatibility tests can be moved to e2e integrations.
  • gptoss eval and lm eval can be merged into e2e engine accuracy.

Collaborator Author

@khluu commented Nov 25, 2025
I agree we shouldn't have a misc group; it's here because I don't know where to place these jobs yet. Going to figure that out after the restructuring...

  • Some jobs here (V1 others, async engine) need to be broken down and integrated with tests in other areas
  • Moved Prime RL to e2e_integration

can be merged into e2e engine accuracy.

We don't have this yet. Do you mean to create one?

My plan is to get the pipeline generator working with this refactor, then merge all together.

@khluu khluu requested a review from yeqcharlotte November 25, 2025 17:52
@mergify

mergify bot commented Dec 1, 2025

Documentation preview: https://vllm--29343.org.readthedocs.build/en/29343/

@mergify mergify bot added the documentation Improvements or additions to documentation label Dec 1, 2025
khluu added 4 commits December 1, 2025 15:24
Signed-off-by: Kevin H. Luu <[email protected]>
Signed-off-by: Kevin H. Luu <[email protected]>
Signed-off-by: Kevin H. Luu <[email protected]>
khluu added 3 commits December 1, 2025 18:48
Signed-off-by: Kevin H. Luu <[email protected]>
Signed-off-by: Kevin H. Luu <[email protected]>
Signed-off-by: Kevin H. Luu <[email protected]>

Labels

ci/build · documentation · nvidia

5 participants