[DocDB] Fix asan ash failures corresponding to YBTransaction class #29512

@basavaraj29

Jira Link: DB-19319

Description

In a run with object locking enabled, some PgSingleTServerTest* tests fail with the following fatal:

F20251124 23:10:34 ../../src/yb/ash/wait_state.cc:524]  In void yb::client::(anonymous namespace)::SingleLocalityPool::TransactionReady(const Status &, const YBTransactionPtr &, uint64_t) wait-state 0x5110001d30d8 was updated to kOnCpu_Active from kOnCpu_Active but it is currently kIdle. Not expecting concurrent updates.

${YB_SRC_ROOT}/src/yb/ash/wait_state.cc:524:     @     0x7f4c076798c9  yb::ash::ScopedWaitStatus::~ScopedWaitStatus()
${YB_SRC_ROOT}/src/yb/client/transaction_pool.cc:175:     @     0x7f4c0a3128b2  yb::client::(anonymous namespace)::SingleLocalityPool::TransactionReady(yb::Status const&, shared_ptr<yb::client::YBTransaction> const&, unsigned long)
${YB_THIRDPARTY_DIR}/installed/asan/include/boost/function/function_template.hpp:763:     @     0x7f4c0a27584e  boost::function1<void, yb::Status const&>::operator()(yb::Status const&) const
${YB_SRC_ROOT}/src/yb/client/transaction.cc:1952:     @     0x7f4c0a27584e  yb::client::YBTransaction::Impl::NotifyWaitersAndRelease(yb::UniqueLock<std::shared_mutex>*, yb::Status, char const*, yb::StronglyTypedBool<yb::client::(anonymous namespace)::SetReady_Tag>)
${YB_SRC_ROOT}/src/yb/client/transaction.cc:1922:     @     0x7f4c0a275312  yb::client::YBTransaction::Impl::NotifyWaiters(yb::Status const&, char const*, yb::StronglyTypedBool<yb::client::(anonymous namespace)::SetReady_Tag>)
${YB_SRC_ROOT}/src/yb/client/transaction.cc:2150:     @     0x7f4c0a27bc07  yb::client::YBTransaction::Impl::HeartbeatDone(yb::Status, yb::tserver::UpdateTransactionRequestPB const&, yb::tserver::UpdateTransactionResponsePB const&, yb::TransactionStatus, shared_ptr<yb::client::YBTransaction> const&, yb::StronglyTypedBool<yb::client::(anonymous namespace)::SendHeartbeatToNewTablet_Tag>)
...

The debug failure is due to a buggy assert that doesn't expect concurrent wait-state transitions for a YBTransaction, although such transitions are in fact possible.

TransactionManager::Impl::PickStatusTablet could invoke the callback inline, which executes YBTransaction::Impl::StatusTabletPicked, which does roughly the following:

  void StatusTabletPicked(...) {
    ADOPT_WAIT_STATE(wait_state_);
    SCOPED_WAIT_STATUS(OnCpu_Active);
    // schedule lookup status tablet, which executes txn heartbeats down the line
    // and calls `NotifyWaiters` on first successful CREATED heartbeat
  }

NotifyWaiters in turn calls SingleLocalityPool::TransactionReady, which looks as follows:

  void TransactionReady(...) {
    ADOPT_WAIT_STATE(txn->wait_state());
    SCOPED_WAIT_STATUS(OnCpu_Active);
    ...
  }

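Assume thread T1, executing StatusTabletPicked, has not exited yet when TransactionReady starts executing on thread T2. Both scoped statuses are then live on the same wait state at once. Judging by the log above, the scope that exits first presumably puts the wait state back to kIdle, so the fatal is triggered on whichever thread goes out of scope last, since it no longer finds the kOnCpu_Active it set.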
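The interleaving can be reproduced in a minimal, single-threaded sketch. Everything here is hypothetical and for illustration only: `WaitCode`, `WaitState`, `ScopedStatus` and the two `Simulate*` helpers are stand-ins, not the actual `yb::ash` classes or the `SCOPED_WAIT_STATUS` macro. The sketch assumes the scoped helper remembers the previous code on entry and, on exit, expects to still find the code it set. A mismatch is recorded in a flag rather than raised as a fatal.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the wait-state codes seen in the log.
enum class WaitCode { kIdle, kOnCpuActive };

// Hypothetical shared wait state, adopted by more than one scope.
struct WaitState {
  std::atomic<WaitCode> code{WaitCode::kIdle};
};

// Assumed behavior: set `code` on entry, restore the previous code on exit,
// and flag a mismatch if the current code is no longer the one we set.
class ScopedStatus {
 public:
  ScopedStatus(WaitState* state, WaitCode code, bool* mismatch)
      : state_(state), code_(code), prev_(state->code.exchange(code)),
        mismatch_(mismatch) {
    assert(state != nullptr && mismatch != nullptr);
  }

  ~ScopedStatus() {
    WaitCode expected = code_;
    // Restores prev_ only if the state still holds what this scope set.
    if (!state_->code.compare_exchange_strong(expected, prev_)) {
      *mismatch_ = true;  // the real code raises a fatal at this point
    }
  }

 private:
  WaitState* state_;
  WaitCode code_;
  WaitCode prev_;
  bool* mismatch_;
};

// T1's scope ends while T2's scope is still live: the last one out trips.
bool SimulateOverlappingExit() {
  WaitState state;
  bool mismatch = false;
  auto t1 = std::make_unique<ScopedStatus>(&state, WaitCode::kOnCpuActive, &mismatch);
  auto t2 = std::make_unique<ScopedStatus>(&state, WaitCode::kOnCpuActive, &mismatch);
  t1.reset();  // restores kIdle
  t2.reset();  // expected kOnCpuActive, finds kIdle
  return mismatch;
}

// Strictly nested scopes (inner exits first) never trip the check.
bool SimulateNestedExit() {
  WaitState state;
  bool mismatch = false;
  auto outer = std::make_unique<ScopedStatus>(&state, WaitCode::kOnCpuActive, &mismatch);
  auto inner = std::make_unique<ScopedStatus>(&state, WaitCode::kOnCpuActive, &mismatch);
  inner.reset();  // restores kOnCpuActive
  outer.reset();  // restores kIdle
  return mismatch;
}
```

Calling `SimulateOverlappingExit()` reports a mismatch, while `SimulateNestedExit()` does not. This matches the log line above, where the state was updated to kOnCpu_Active from kOnCpu_Active but was found as kIdle.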

Concurrent updates are expected here, so the wait state should be set up to allow them.

Issue Type

kind/bug
