Description
Overview of the Issue
The semi-sync monitor was added in #17763. When no other writes are happening, it generates its own heartbeat writes to try to unblock semi-sync. Semi-sync is blocked when the last write is waiting indefinitely for a semi-sync ACK because, at the time of that write on the primary, no replicas were available to ACK it. This can be unblocked by performing another write once replicas are available again: an ACK of a later GTID means the replica has also received every earlier one, so it unblocks any earlier writes still waiting for an ACK.
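To make the unblocking argument concrete, here is a minimal Go sketch that models ACKs as cumulative: an ACK for sequence number N releases every waiting commit whose sequence number is at most N. The `ackTracker` type and its methods are hypothetical and heavily simplified. They illustrate the ordering argument only, not how MySQL or Vitess implement semi-sync.

```go
package main

import "fmt"

// ackTracker is a hypothetical, simplified model of a semi-sync primary:
// each commit waits until some replica has acknowledged a sequence number
// at least as large as its own. Not MySQL's actual implementation.
type ackTracker struct {
	lastAcked int64   // highest sequence number acknowledged by any replica
	waiting   []int64 // sequence numbers of commits still waiting for an ACK
}

// commit records a new write that must wait for an ACK.
func (t *ackTracker) commit(seq int64) {
	t.waiting = append(t.waiting, seq)
}

// ack records that a replica has received everything up to and including seq
// and returns the commits it released. Because replication is ordered, an
// ACK for a later write also covers every earlier write.
func (t *ackTracker) ack(seq int64) []int64 {
	if seq > t.lastAcked {
		t.lastAcked = seq
	}
	var released, still []int64
	for _, w := range t.waiting {
		if w <= t.lastAcked {
			released = append(released, w)
		} else {
			still = append(still, w)
		}
	}
	t.waiting = still
	return released
}

func main() {
	t := &ackTracker{}
	t.commit(1)               // written while no replica could ACK it
	t.commit(2)               // a later write (e.g. a heartbeat) once replicas are back
	fmt.Println(t.ack(2))     // prints [1 2]: the ACK for 2 also releases 1
	fmt.Println(len(t.waiting)) // prints 0
}
```

The single later write is enough: nothing needs to ACK the stuck write individually.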
The initial implementation has two major issues, however. Both cause problems in write-heavy, heavily loaded replica sets, where the system as a whole can become overloaded (TCP retries, packet loss, etc.) and the resulting delays make the timing of events atypical:
- When determining whether semi-sync is blocked, it only checks whether one or more connections/sessions are waiting for a semi-sync ACK. This is wrong. It's not unusual to wait briefly for an ACK when the replicas are heavily loaded and lagging. Waiting != hung/blocked. It only means we're waiting 🙂. We need to check whether successful ACKs are still happening: if they are, we're making progress, and even though one or more sessions may be waiting for an ACK, we are NOT blocked.
- When we believe that semi-sync is blocked, we start sending writes (up to 15, each in its own connection/session, each doing an INSERT into the `_vt.semisync_heartbeat` table) until one of them goes through. During this time we never re-check whether semi-sync has become unblocked: the monitor assumes that no other writes from the user/app made it through in the meantime. And because the system is likely already overloaded, these new writes can themselves be delayed, block, and cause further issues.
  - The monitor can even render itself unable to function and trigger an errant ERS, as follows:
    - The primary has an unreliable network connection to the replicas for a period, which triggers the issue described in the first point.
    - The first INSERT the monitor runs blocks, waiting for an ACK it won't receive.
    - The "cleanup" ticker fires (every 24 hours; it keeps the `_vt.semisync_heartbeat` table small) and the TRUNCATE is executed. The TRUNCATE tries to get a table-level lock and blocks behind the INSERT (so we never even attempt to commit and request an ACK).
    - All subsequent INSERT statements that the monitor runs to try to unblock things block behind the TRUNCATE. Remember, the monitor is no longer checking whether semi-sync was unblocked by other writes, and it will not check again until `--semi-sync-monitor-interval` (default 10s) elapses.
    - We then hit the limit of 15 writers, and the `FullStatus` `vttablet` RPC response to `vtorc` indicates that semi-sync is stuck and that `vtorc` should perform an ERS to unblock the shard.
    - In the end, we tell `vtorc` that semi-sync is fully blocked and that it needs to do an ERS, when in reality everything is fine (the system is just experiencing a reasonable amount of delay). The behavior in this scenario is worse than if we had no monitor at all.
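The first issue suggests judging "blocked" by progress rather than by the mere presence of waiters. Below is a minimal Go sketch of such a check, assuming two samples taken some interval apart of counters modeled on MySQL's `Rpl_semi_sync_master_wait_sessions` (sessions currently waiting for an ACK) and `Rpl_semi_sync_master_yes_tx` (commits acknowledged so far) status variables. The `semiSyncSample` type and `isBlocked` function are hypothetical, and the monitor may well read different signals.

```go
package main

import "fmt"

// semiSyncSample is a hypothetical snapshot of two semi-sync counters,
// modeled on Rpl_semi_sync_master_wait_sessions and
// Rpl_semi_sync_master_yes_tx. How they are fetched is out of scope here.
type semiSyncSample struct {
	waitSessions int64 // sessions currently waiting for an ACK
	ackedTx      int64 // cumulative count of acknowledged commits
}

// isBlocked reports whether semi-sync looks stuck between two samples.
// Waiting alone is not enough: if the acknowledged-commit counter advanced,
// ACKs are still arriving and the primary is making progress.
func isBlocked(prev, cur semiSyncSample) bool {
	if cur.waitSessions == 0 {
		return false // nobody is waiting, so nothing can be blocked
	}
	// Sessions are waiting; only treat that as blocked if no new ACKs arrived.
	return cur.ackedTx <= prev.ackedTx
}

func main() {
	prev := semiSyncSample{waitSessions: 3, ackedTx: 1000}
	slow := semiSyncSample{waitSessions: 4, ackedTx: 1042}  // lagging, but ACKs still arrive
	stuck := semiSyncSample{waitSessions: 4, ackedTx: 1000} // waiting with no progress
	fmt.Println(isBlocked(prev, slow))  // prints false
	fmt.Println(isBlocked(prev, stuck)) // prints true
}
```

A counter reset (for example, after a restart) would also read as "no progress" in this sketch, so a real check would need to account for that.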
So today the monitor is ideal for replica sets with very low write rates, but potentially harmful for replica sets with high write rates.
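For the second issue, the missing piece is a re-check between heartbeat writes. The Go sketch below assumes a `stillBlocked` callback that re-samples semi-sync state (for example with a progress check like the one described in the first point) and a `startHeartbeatWrite` callback that launches one heartbeat INSERT in its own session without waiting for it. Both callbacks, the `unblock` function, and `errStillBlocked` are hypothetical; only the limit of 15 writers comes from the description above. It does not address the TRUNCATE lock contention.

```go
package main

import (
	"errors"
	"fmt"
)

// errStillBlocked is a hypothetical error returned when the write budget is
// exhausted while semi-sync still appears blocked.
var errStillBlocked = errors.New("semi-sync still blocked after max heartbeat writes")

// unblock launches heartbeat writes one at a time but re-checks the blocked
// state before every attempt, so if any write (including one from the app)
// unblocked semi-sync in the meantime, the loop stops early.
// It returns how many heartbeat writes were started.
func unblock(maxWrites int, stillBlocked func() bool, startHeartbeatWrite func()) (int, error) {
	writes := 0
	for stillBlocked() {
		if writes == maxWrites {
			return writes, errStillBlocked // budget spent and still no progress
		}
		startHeartbeatWrite()
		writes++
	}
	return writes, nil
}

func main() {
	checks := 0
	// Simulate other traffic unblocking semi-sync after the second check.
	stillBlocked := func() bool { checks++; return checks <= 2 }
	n, err := unblock(15, stillBlocked, func() {})
	fmt.Println(n, err) // prints 2 <nil>
}
```

Because the state is re-sampled before each write, the monitor stops adding load as soon as semi-sync makes progress, rather than continuing until it reaches the writer limit.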
Reproduction Steps
See code walkthrough.
Binary Version
vtgate version Version: 24.0.0-SNAPSHOT (Git revision 3d2cdc546a6ac223866e0287782e8b3912efe2ca branch 'improve_semi-sync_monitor') built on Fri Nov 7 05:42:51 UTC 2025 by [email protected] using go1.25.3 darwin/arm64
Operating System and Environment details
N/A