Demux refactoring without backend conversion #2339
base: master

No description provided.
Conversation
Pull Request Overview
This PR introduces a new GetPreferredBatchStep() method to the Network interface to better control batch size granularity across different backend implementations, addressing issues with batch size management in the demux backend.
- Adds a GetPreferredBatchStep() virtual method to the base Network class with a default return value of 1 (sketched below)
- Implements backend-specific preferred batch step logic for XLA, ONNX, CUDA, and demux backends
- Refactors the demux backend to use proper batch step-based work distribution instead of the previous simple splitting approach
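For readers skimming the diff, here is a minimal sketch of what the interface addition could look like. Only GetPreferredBatchStep() and Network come from the PR summary above; the exact signature details (such as const-ness), the member names, and the SM-count override are assumptions for illustration, not the PR's actual code.

```cpp
// Minimal sketch, not the actual lc0 code: the base class gains a virtual
// hook reporting the granularity at which a backend prefers batch sizes.
class Network {
 public:
  virtual ~Network() = default;

  // Preferred multiple for batch sizes handed to this backend.
  // Default of 1 means "no preference", so existing backends are unaffected.
  virtual int GetPreferredBatchStep() const { return 1; }

  // ... rest of the existing Network interface ...
};

// Hypothetical override: a CUDA-like backend might derive its step from the
// streaming-multiprocessor count (see the file summary below).
class CudaLikeNetwork : public Network {
 public:
  explicit CudaLikeNetwork(int sm_count) : sm_count_(sm_count) {}
  int GetPreferredBatchStep() const override { return sm_count_; }

 private:
  int sm_count_;
};
```

The demux backend can then query each child backend's step when deciding where to cut the batch, which is what the refactored work distribution in the diff further down builds on.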
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| src/neural/network.h | Adds new virtual method GetPreferredBatchStep() to the Network interface |
| src/neural/backends/xla/xla_runner.h | Declares GetPreferredBatchStep() method for XLA runner |
| src/neural/backends/xla/xla_runner.cc | Implements preferred batch step as the smallest executable batch size |
| src/neural/backends/xla/network_xla.cc | Removes workaround for batch size and delegates to new preferred batch step method |
| src/neural/backends/network_onnx.cc | Implements preferred batch step based on batch configuration |
| src/neural/backends/network_demux.cc | Major refactor with new work distribution algorithm using batch steps and improved synchronization |
| src/neural/backends/cuda/network_cuda.cc | Implements CUDA-specific preferred batch step based on streaming multiprocessor count |
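As a complement to the table, below is a small self-contained sketch of one way a demux-style splitter can divide a batch into per-backend ranges that are multiples of a step. This is assumed logic, not the PR's code: the function name SplitBatch is invented, though split_size, extra_split_backends, and the batch step mirror identifiers visible in the diff further down.

```cpp
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

// Sketch: split `batch_size` positions across `num_backends` workers so that
// every chunk (except possibly the last) is a multiple of `batch_step`.
std::vector<std::pair<int, int>> SplitBatch(int batch_size, int num_backends,
                                            int batch_step) {
  // Total number of step-sized units, rounded up.
  const int steps = (batch_size + batch_step - 1) / batch_step;
  // Every backend gets at least `split_size` steps; the first
  // `extra_split_backends` backends get one extra step.
  const int split_size = steps / num_backends;
  const int extra_split_backends = steps % num_backends;

  std::vector<std::pair<int, int>> ranges;
  int work_start = 0;
  for (int i = 0; i < num_backends && work_start < batch_size; ++i) {
    const int steps_here = split_size + (i < extra_split_backends ? 1 : 0);
    const int work_end =
        std::min(work_start + steps_here * batch_step, batch_size);
    if (work_end > work_start) ranges.emplace_back(work_start, work_end);
    work_start = work_end;
  }
  return ranges;
}

int main() {
  // Example: 100 positions, 3 backends, step of 8 -> chunks of 40, 32, 28.
  for (auto [begin, end] : SplitBatch(100, 3, 8)) {
    std::cout << "[" << begin << ", " << end << ")\n";
  }
}
```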
Diff context:
```cpp
#include <mutex>
#include <queue>
#include <thread>
```
Copilot AI commented on Nov 6, 2025:
Missing include for SpinloopPause(). The function SpinloopPause() is used in line 76 but the required header "utils/mutex.h" is not included. This will cause a compilation error.
Suggested change:
```cpp
#include "utils/mutex.h"
```
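As background for this suggestion: SpinloopPause() is the kind of helper called inside a busy-wait loop to hint the CPU while polling. The snippet below is only an illustrative sketch, assuming utils/mutex.h declares SpinloopPause() as the comment states; it is not the actual wait loop from the demux refactor.

```cpp
#include <atomic>

#include "utils/mutex.h"  // Assumed to provide SpinloopPause(), per the comment above.

// Illustrative spin-wait: poll an atomic counter of outstanding child
// computations, pausing between checks to be friendlier to the CPU.
void WaitForChildren(const std::atomic<int>& pending) {
  while (pending.load(std::memory_order_acquire) > 0) {
    SpinloopPause();
  }
}
```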
Diff context:
```cpp
for (; i != end_index; i = (i + 1) % network_->backends_.size()) {
  assert(work_start != GetBatchSize());
  int work_end = work_start + split_size * network_->batch_step_;
  work_end = std::min(work_end, GetBatchSize());
  parents_.emplace_back(this, work_start, work_end);
  network_->backends_[i].Enqueue(&parents_.back());
  work_start = work_end;
```
Copilot AI commented on Nov 6, 2025:
Potential bug when all backends need extra work. If extra_split_backends == network_->backends_.size(), then end_index will equal start_index (line 296), causing the loop at line 306 to not execute even though all backends should receive extra work. Consider handling this case explicitly, or using a counter-based loop instead of checking i != end_index.
Suggested change:
```diff
-for (; i != end_index; i = (i + 1) % network_->backends_.size()) {
-  assert(work_start != GetBatchSize());
-  int work_end = work_start + split_size * network_->batch_step_;
-  work_end = std::min(work_end, GetBatchSize());
-  parents_.emplace_back(this, work_start, work_end);
-  network_->backends_[i].Enqueue(&parents_.back());
-  work_start = work_end;
+for (int count = 0; count < extra_split_backends; ++count) {
+  assert(work_start != GetBatchSize());
+  int work_end = work_start + split_size * network_->batch_step_;
+  work_end = std::min(work_end, GetBatchSize());
+  parents_.emplace_back(this, work_start, work_end);
+  network_->backends_[i].Enqueue(&parents_.back());
+  work_start = work_end;
+  i = (i + 1) % network_->backends_.size();
```
Co-authored-by: Copilot <[email protected]>
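To make the reasoning in the comment concrete, here is a standalone toy program (all values invented for illustration, not taken from the PR) contrasting the condition-based loop, which runs zero times when end_index wraps back to start_index, with the counter-based loop from the suggestion:

```cpp
#include <iostream>

int main() {
  const int num_backends = 4;
  // Worst case flagged in the review: every backend needs one extra chunk.
  const int extra_split_backends = num_backends;

  const int start_index = 1;  // Arbitrary starting backend.
  const int end_index = (start_index + extra_split_backends) % num_backends;

  // Condition-based loop: end_index wraps back to start_index, so the body
  // never runs even though four extra chunks should be handed out.
  int handed_out = 0;
  for (int i = start_index; i != end_index; i = (i + 1) % num_backends) {
    ++handed_out;
  }
  std::cout << "i != end_index loop: " << handed_out << " chunks\n";  // 0

  // Counter-based loop, as in the suggestion: always runs the intended
  // number of iterations, regardless of wrap-around.
  handed_out = 0;
  int i = start_index;
  for (int count = 0; count < extra_split_backends; ++count) {
    ++handed_out;
    i = (i + 1) % num_backends;
  }
  std::cout << "counter-based loop: " << handed_out << " chunks, "
            << "next backend index " << i << "\n";  // 4 chunks, index 1
}
```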