Broken AMD OpenCL locality on systems with multiple PCI domains #696

@bgoglin

The AMD OpenCL extension for querying GPU locality (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD) reports the PCI bus, device and function but not the PCI domain, so two GPUs that differ only in their PCI domain cannot be told apart. Systems with multiple PCI domains used to be rare, but at least systems with multiple MI300A CPU+GPUs (like Adastra and maybe El Capitan) now use one domain per CPU.
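To make the ambiguity concrete, here is a minimal sketch. The PCI addresses are invented for illustration (they are not taken from Adastra or any real machine), and `opencl_view` is a hypothetical helper that simply drops the domain, mimicking the bus/device/function triple the extension reports.

```python
from collections import defaultdict

# Hypothetical full PCI addresses (domain, bus, device, function) of four GPUs,
# one per PCI domain. These values are made up for illustration only.
gpus = [
    (0x0000, 0x01, 0x00, 0),
    (0x0001, 0x01, 0x00, 0),
    (0x0002, 0x01, 0x00, 0),
    (0x0003, 0x01, 0x00, 0),
]


def opencl_view(addr):
    """Drop the domain, keeping only bus/device/function (hypothetical helper)."""
    _domain, bus, dev, func = addr
    return (bus, dev, func)


# Group the real devices by the domain-less triple.
by_bdf = defaultdict(list)
for addr in gpus:
    by_bdf[opencl_view(addr)].append(addr)

# Any triple that maps to more than one real device is ambiguous.
ambiguous = {bdf: addrs for bdf, addrs in by_bdf.items() if len(addrs) > 1}

for bdf, addrs in ambiguous.items():
    print(f"bus/dev/func {bdf} matches {len(addrs)} devices: {addrs}")
```

With these made-up addresses, all four GPUs collapse onto a single bus/device/function triple, so the OpenCL device cannot be matched to a unique PCI device.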

The issue is confirmed in ROCm/clr#106, but it won't be fixed upstream, likely because the OpenCL runtime doesn't matter much anymore.

Possible solutions:

  • remove the AMD OpenCL locality queries entirely, since they apparently don't matter much anymore
  • if the machine has multiple PCI domains, ignore AMD OpenCL locality (just attach to root)
  • if multiple AMD GPUs have the same PCI BDF and differ only in PCI domain, attach all of them to root
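The second and third options could be sketched roughly as follows. This is not hwloc code: `locate`, `count_domains`, and the policy names are hypothetical, PCI devices are modelled as plain `(domain, bus, device, function)` tuples, and returning `None` stands for "attach to root". The third option is read here as sending only the colliding GPUs to root, which is one possible interpretation.

```python
def count_domains(pci_devices):
    """Number of distinct PCI domains among the known devices."""
    return len({domain for domain, _bus, _dev, _func in pci_devices})


def locate(opencl_bdf, pci_devices, policy="per-device"):
    """Return the full PCI address matching an OpenCL (bus, dev, func) triple.

    Returns None when the device should be attached to root instead.
    policy="any-multidomain": give up as soon as the machine has >1 PCI domain.
    policy="per-device": give up only when this particular triple is ambiguous.
    """
    if policy == "any-multidomain" and count_domains(pci_devices) > 1:
        return None
    matches = [addr for addr in pci_devices if addr[1:] == opencl_bdf]
    if len(matches) == 1:
        return matches[0]
    # Zero matches (unknown device) or several (same BDF in different domains).
    return None


# Invented example: two GPUs collide on 01:00.0, one device is unique at 03:00.0.
pci = [(0, 0x01, 0x00, 0), (1, 0x01, 0x00, 0), (0, 0x03, 0x00, 0)]
print(locate((0x01, 0x00, 0), pci))                            # ambiguous
print(locate((0x03, 0x00, 0), pci))                            # unique match
print(locate((0x03, 0x00, 0), pci, policy="any-multidomain"))  # multi-domain
```

The per-device policy keeps locality for GPUs whose BDF is unique, at the cost of an extra scan over the PCI devices; the any-multidomain policy is simpler but discards locality for every OpenCL device on such machines.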
