Conversation

@chyezh chyezh commented Nov 26, 2025

issue: #45865

@chyezh chyezh added this to the 2.6.7 milestone Nov 26, 2025
@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chyezh
To complete the pull request process, please assign jiaoew1991 after the PR has been reviewed.
You can assign the PR to them by writing /assign @jiaoew1991 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added the size/S Denotes a PR that changes 10-29 lines. label Nov 26, 2025
@mergify mergify bot added dco-passed DCO check passed. kind/bug Issues or changes related to a bug labels Nov 26, 2025
@sre-ci-robot

[ci-v2-notice]
Notice: We are gradually rolling out the new ci-v2 system.

  • Legacy CI jobs remain unaffected; you can simply ignore ci-v2 if you don't want to run it.
  • Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
  • For tests that exist in both v1 and v2, passing in either system is considered PASS.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration
  • /ci-rerun-ut-go // for ci-v2/ut-go
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm [master branch only]
  • /ci-rerun-e2e-default // for ci-v2/e2e-default [master branch only]

If you have any questions or requests, please contact @zhikunyao.


codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 47.36842% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.18%. Comparing base (6c0a80d) to head (6687802).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
internal/querycoordv2/task/executor.go 0.00% 7 Missing ⚠️
internal/querycoordv2/task/scheduler.go 75.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #45877      +/-   ##
==========================================
+ Coverage   76.13%   76.18%   +0.05%     
==========================================
  Files        1869     1881      +12     
  Lines      292340   294241    +1901     
==========================================
+ Hits       222576   224171    +1595     
- Misses      62378    62631     +253     
- Partials     7386     7439      +53     
Components Coverage Δ
Client 78.17% <ø> (ø)
Core 82.75% <ø> (ø)
Go 74.30% <47.36%> (+0.03%) ⬆️
Files with missing lines Coverage Δ
internal/querycoordv2/task/scheduler.go 72.81% <75.00%> (-0.65%) ⬇️
internal/querycoordv2/task/executor.go 60.61% <0.00%> (-0.55%) ⬇️

... and 38 files with indirect coverage changes


chyezh commented Nov 26, 2025

/ci-rerun-ut-go

@mergify mergify bot added the ci-passed label Nov 26, 2025
log.Warn("node doesn't belong to any replica", zap.Error(err))
return
}
view := ex.dist.ChannelDistManager.GetShardLeader(task.Shard(), replica)

do we need to verify that the replicaID is the same as the one we execute with?

- leader := scheduler.distMgr.ChannelDistManager.GetShardLeader(task.Shard(), task.replica)
+ leader := scheduler.getReplicaShardLeader(task.Shard(), task.ReplicaID())
if leader == nil {
return merr.WrapErrServiceInternal("segment's delegator leader not found, stop balancing")

do we need to check that the target node is still under the replica?

probably it's safer to lock and drain all the balance tasks before a replica change? currently there seem to be many corner cases to handle.
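The "lock and drain" idea could be sketched with a plain sync.RWMutex (all names below are hypothetical stand-ins, not the actual querycoord types): balance tasks execute under the read lock, and a replica change takes the write lock, so it only proceeds once every in-flight balance task has drained.

```go
package main

import (
	"fmt"
	"sync"
)

// balanceGuard is a hypothetical sketch: balance tasks run under the
// read lock; a replica change takes the write lock, which blocks until
// every in-flight balance task has finished (drained).
type balanceGuard struct {
	mu sync.RWMutex
}

func (g *balanceGuard) runBalanceTask(task func()) {
	g.mu.RLock()
	defer g.mu.RUnlock()
	task()
}

func (g *balanceGuard) changeReplica(change func()) {
	g.mu.Lock() // waits for all balance tasks to drain
	defer g.mu.Unlock()
	change()
}

func main() {
	var g balanceGuard
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			g.runBalanceTask(func() { fmt.Println("balance task", i) })
		}(i)
	}
	wg.Wait()
	g.changeReplica(func() { fmt.Println("replica changed") })
}
```

This trades concurrency for simplicity: it avoids the per-task membership corner cases, at the cost of replica changes waiting on slow balance tasks.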


if we don't want replica changes to be mutually exclusive with balance, shouldn't we at least check that the replica exists and the node belongs to the replica in preProcess?
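A minimal version of that preProcess guard might look like this (the types and method names here are illustrative stand-ins, not the actual querycoordv2 APIs): fail fast if the task's replica no longer exists, or if the target node has been moved out of it.

```go
package main

import (
	"errors"
	"fmt"
)

// replica is a simplified stand-in for the querycoord replica object.
type replica struct {
	id    int64
	nodes map[int64]struct{}
}

func (r *replica) Contains(nodeID int64) bool {
	_, ok := r.nodes[nodeID]
	return ok
}

// replicaManager is a simplified lookup table of live replicas.
type replicaManager struct {
	replicas map[int64]*replica
}

// checkTaskReplica is the kind of guard the review suggests running in
// preProcess: cancel the task if the replica is gone or no longer
// contains the target node.
func (m *replicaManager) checkTaskReplica(replicaID, nodeID int64) error {
	r, ok := m.replicas[replicaID]
	if !ok {
		return errors.New("replica no longer exists, cancel task")
	}
	if !r.Contains(nodeID) {
		return errors.New("node no longer belongs to replica, cancel task")
	}
	return nil
}

func main() {
	m := &replicaManager{replicas: map[int64]*replica{
		1: {id: 1, nodes: map[int64]struct{}{100: {}}},
	}}
	fmt.Println(m.checkTaskReplica(1, 100)) // <nil>
	fmt.Println(m.checkTaskReplica(1, 200)) // node no longer belongs to replica, cancel task
	fmt.Println(m.checkTaskReplica(2, 100)) // replica no longer exists, cancel task
}
```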

// wait for new delegator becomes leader, then try to remove old leader
task := task.(*ChannelTask)
- delegator := scheduler.distMgr.ChannelDistManager.GetShardLeader(task.Shard(), task.replica)
+ delegator := scheduler.getReplicaShardLeader(task.Shard(), task.ReplicaID())

is there a possibility that the node does not belong to the replica anymore?



Development

Successfully merging this pull request may close issue #45865.

3 participants