Demux refactoring without backend conversion #2339
base: master

No description provided.
Conversation
Pull Request Overview
This PR introduces a new GetPreferredBatchStep() method to the Network interface to better control batch size granularity across different backend implementations, addressing issues with batch size management in the demux backend.
- Adds a GetPreferredBatchStep() virtual method to the base Network class with a default return value of 1 (sketched below)
- Implements backend-specific preferred batch step logic for XLA, ONNX, CUDA, and demux backends
- Refactors the demux backend to use proper batch step-based work distribution instead of the previous simple splitting approach
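For readers skimming the diff, here is a minimal sketch of what the interface addition could look like. Only GetPreferredBatchStep() and Network come from the PR summary above; the exact signature details (such as const-ness), the member names, and the SM-count override are assumptions for illustration, not the PR's actual code.

```cpp
// Minimal sketch, not the actual lc0 code: the base class gains a virtual
// hook reporting the granularity at which a backend prefers batch sizes.
class Network {
 public:
  virtual ~Network() = default;

  // Preferred multiple for batch sizes handed to this backend.
  // Default of 1 means "no preference", so existing backends are unaffected.
  virtual int GetPreferredBatchStep() const { return 1; }

  // ... rest of the existing Network interface ...
};

// Hypothetical override: a CUDA-like backend might derive its step from the
// streaming-multiprocessor count (see the file summary below).
class CudaLikeNetwork : public Network {
 public:
  explicit CudaLikeNetwork(int sm_count) : sm_count_(sm_count) {}
  int GetPreferredBatchStep() const override { return sm_count_; }

 private:
  int sm_count_;
};
```

The demux backend can then query each child backend's step when deciding where to cut the batch, which is what the refactored work distribution in the diff further down builds on.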
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| src/neural/network.h | Adds new virtual method GetPreferredBatchStep() to the Network interface |
| src/neural/backends/xla/xla_runner.h | Declares GetPreferredBatchStep() method for XLA runner |
| src/neural/backends/xla/xla_runner.cc | Implements preferred batch step as the smallest executable batch size |
| src/neural/backends/xla/network_xla.cc | Removes workaround for batch size and delegates to new preferred batch step method |
| src/neural/backends/network_onnx.cc | Implements preferred batch step based on batch configuration |
| src/neural/backends/network_demux.cc | Major refactor with new work distribution algorithm using batch steps and improved synchronization |
| src/neural/backends/cuda/network_cuda.cc | Implements CUDA-specific preferred batch step based on streaming multiprocessor count |
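As a complement to the table, below is a small self-contained sketch of one way a demux-style splitter can divide a batch into per-backend ranges that are multiples of a step. This is assumed logic, not the PR's code: the function name SplitBatch is invented, though split_size, extra_split_backends, and the batch step mirror identifiers visible in the diff further down.

```cpp
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

// Sketch: split `batch_size` positions across `num_backends` workers so that
// every chunk (except possibly the last) is a multiple of `batch_step`.
std::vector<std::pair<int, int>> SplitBatch(int batch_size, int num_backends,
                                            int batch_step) {
  // Total number of step-sized units, rounded up.
  const int steps = (batch_size + batch_step - 1) / batch_step;
  // Every backend gets at least `split_size` steps; the first
  // `extra_split_backends` backends get one extra step.
  const int split_size = steps / num_backends;
  const int extra_split_backends = steps % num_backends;

  std::vector<std::pair<int, int>> ranges;
  int work_start = 0;
  for (int i = 0; i < num_backends && work_start < batch_size; ++i) {
    const int steps_here = split_size + (i < extra_split_backends ? 1 : 0);
    const int work_end =
        std::min(work_start + steps_here * batch_step, batch_size);
    if (work_end > work_start) ranges.emplace_back(work_start, work_end);
    work_start = work_end;
  }
  return ranges;
}

int main() {
  // Example: 100 positions, 3 backends, step of 8 -> chunks of 40, 32, 28.
  for (auto [begin, end] : SplitBatch(100, 3, 8)) {
    std::cout << "[" << begin << ", " << end << ")\n";
  }
}
```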
Diff context:
```cpp
#include <mutex>
#include <queue>
#include <thread>
```
Copilot AI commented on Nov 6, 2025:
Missing include for SpinloopPause(). The function SpinloopPause() is used in line 76 but the required header "utils/mutex.h" is not included. This will cause a compilation error.
Suggested change:
```cpp
#include "utils/mutex.h"
```
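As background for this suggestion: SpinloopPause() is the kind of helper called inside a busy-wait loop to hint the CPU while polling. The snippet below is only an illustrative sketch, assuming utils/mutex.h declares SpinloopPause() as the comment states; it is not the actual wait loop from the demux refactor.

```cpp
#include <atomic>

#include "utils/mutex.h"  // Assumed to provide SpinloopPause(), per the comment above.

// Illustrative spin-wait: poll an atomic counter of outstanding child
// computations, pausing between checks to be friendlier to the CPU.
void WaitForChildren(const std::atomic<int>& pending) {
  while (pending.load(std::memory_order_acquire) > 0) {
    SpinloopPause();
  }
}
```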
Diff context:
```cpp
for (; i != end_index; i = (i + 1) % network_->backends_.size()) {
  assert(work_start != GetBatchSize());
  int work_end = work_start + split_size * network_->batch_step_;
  work_end = std::min(work_end, GetBatchSize());
  parents_.emplace_back(this, work_start, work_end);
  network_->backends_[i].Enqueue(&parents_.back());
  work_start = work_end;
```
Copilot AI commented on Nov 6, 2025:
Potential bug when all backends need extra work. If extra_split_backends == network_->backends_.size(), then end_index will equal start_index (line 296), causing the loop at line 306 to not execute even though all backends should receive extra work. Consider handling this case explicitly, or using a counter-based loop instead of checking i != end_index.
Suggested change:
```diff
-for (; i != end_index; i = (i + 1) % network_->backends_.size()) {
-  assert(work_start != GetBatchSize());
-  int work_end = work_start + split_size * network_->batch_step_;
-  work_end = std::min(work_end, GetBatchSize());
-  parents_.emplace_back(this, work_start, work_end);
-  network_->backends_[i].Enqueue(&parents_.back());
-  work_start = work_end;
+for (int count = 0; count < extra_split_backends; ++count) {
+  assert(work_start != GetBatchSize());
+  int work_end = work_start + split_size * network_->batch_step_;
+  work_end = std::min(work_end, GetBatchSize());
+  parents_.emplace_back(this, work_start, work_end);
+  network_->backends_[i].Enqueue(&parents_.back());
+  work_start = work_end;
+  i = (i + 1) % network_->backends_.size();
```
Co-authored-by: Copilot <[email protected]>
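To make the reasoning in the comment concrete, here is a standalone toy program (all values invented for illustration, not taken from the PR) contrasting the condition-based loop, which runs zero times when end_index wraps back to start_index, with the counter-based loop from the suggestion:

```cpp
#include <iostream>

int main() {
  const int num_backends = 4;
  // Worst case flagged in the review: every backend needs one extra chunk.
  const int extra_split_backends = num_backends;

  const int start_index = 1;  // Arbitrary starting backend.
  const int end_index = (start_index + extra_split_backends) % num_backends;

  // Condition-based loop: end_index wraps back to start_index, so the body
  // never runs even though four extra chunks should be handed out.
  int handed_out = 0;
  for (int i = start_index; i != end_index; i = (i + 1) % num_backends) {
    ++handed_out;
  }
  std::cout << "i != end_index loop: " << handed_out << " chunks\n";  // 0

  // Counter-based loop, as in the suggestion: always runs the intended
  // number of iterations, regardless of wrap-around.
  handed_out = 0;
  int i = start_index;
  for (int count = 0; count < extra_split_backends; ++count) {
    ++handed_out;
    i = (i + 1) % num_backends;
  }
  std::cout << "counter-based loop: " << handed_out << " chunks, "
            << "next backend index " << i << "\n";  // 4 chunks, index 1
}
```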