QWQ-32B error #350

@csy0225

I20251111 12:28:58.490254 215011 worker_server.cpp:72] Create worker server with device: 0
F20251111 12:28:58.504206 214989 collective_service.cpp:37] Check failed: status == HCCL_SUCCESS (1 vs. 0) HCCL get root info failed.
*** Check failure stack trace: ***
@ 0x7ccb7f google::LogMessage::SendToLog()
@ 0x7c94e2 google::LogMessage::Flush()
@ 0x7cd289 google::LogMessageFatal::~LogMessageFatal()
@ 0x29206c4 xllm::CollectiveService::CollectiveService()
@ 0xd2502c xllm::DistManager::setup_multi_node_workers()
@ 0xd25d62 xllm::DistManager::DistManager()
@ 0xc58c6f xllm::LLMEngine::setup_workers()
@ 0xc5a892 xllm::LLMEngine::LLMEngine()
@ 0x793791 xllm::Master::Master()
@ 0x782d1c xllm::LLMMaster::LLMMaster()
@ 0x792e48 xllm::create_master()
@ 0x7699bd run()
@ 0x645030 main
@ 0x7f64d9e4da10 (unknown)
@ 0x7f64d9e4dac9 __libc_start_main
@ 0x753e95 _start
@ (nil) (unknown)

Labels: bug
