Member

@borg323 borg323 commented Oct 16, 2025

Currently the onnxruntime CoreML provider doesn't support everything required. The following three patches are needed for both fp32 and fp16 with fixed batch size (the default for now):
microsoft/onnxruntime#26443 (merged)
microsoft/onnxruntime#26442
microsoft/onnxruntime#26462 (merged)

For variable batch size, hopefully the fix for issue microsoft/onnxruntime#26328 is simple.

If someone wants to try it out, the default onnxruntime branch should work. The last outstanding patch adds Gather fp16 support. Gather is the last kernel before the policy output, so running it on the CPU shouldn't cause a huge performance drop.

@borg323 borg323 requested a review from Copilot October 16, 2025 09:50
Copilot AI left a comment

Pull Request Overview

This PR adds support for the CoreML execution provider to the ONNX backend, enabling hardware acceleration on Apple Silicon devices. The changes register a new "onnx-coreml" backend option and configure it to use the MLProgram model format with compute plan profiling.

  • Adds COREML as a new OnnxProvider enum value
  • Implements CoreML provider configuration with MLProgram format and profiling enabled
  • Registers the "onnx-coreml" backend with priority 59
  • Updates CI pipeline to build with ONNX runtime and test the CoreML backend on macOS ARM64
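
The provider configuration summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `MakeCoreMLProviderOptions` is hypothetical, and the option keys `ModelFormat` and `ProfileComputePlan` are assumed from the ONNX Runtime CoreML execution provider documentation. The block only builds the options map so it compiles without the ONNX Runtime headers.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper (not taken from the PR): builds the string key/value
// options that would be handed to the CoreML execution provider.
std::unordered_map<std::string, std::string> MakeCoreMLProviderOptions() {
  return {
      {"ModelFormat", "MLProgram"},   // assumed key: use the MLProgram format
      {"ProfileComputePlan", "1"},    // assumed key: enable compute plan profiling
  };
}

// With ONNX Runtime headers available, the map would be passed roughly as:
//   Ort::SessionOptions options;
//   options.AppendExecutionProvider("CoreML", MakeCoreMLProviderOptions());
```

Keeping the options in a plain map makes it easy to log or override individual settings before the session is created.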

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/neural/backends/network_onnx.cc | Adds CoreML provider enum, configuration logic, and backend registration |
| .circleci/config.yml | Adds ONNX runtime installation, build configuration, and CoreML backend testing on macOS ARM64 |

@borg323 borg323 force-pushed the onnx-coreml branch 8 times, most recently from 091e474 to 0f71b12 on October 17, 2025 17:20
Member Author

borg323 commented Nov 2, 2025

Some preliminary tests using lc0 bench with 791556 on an Apple M3 Pro.

fp32:

Total time (ms) : 5217
Nodes searched  : 13762
Nodes/second    : 2637

fp16:

Total time (ms) : 5203
Nodes searched  : 20807
Nodes/second    : 3998

fp16 with PR26442 applied:

Total time (ms) : 5179
Nodes searched  : 26833
Nodes/second    : 5180
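
To put the three runs side by side, the reported nodes/second figures can be turned into relative speedups. The sketch below is only arithmetic on the numbers quoted above; the function and constant names are hypothetical and not part of lc0.

```cpp
// Reported nodes/second from the three benchmark runs above.
constexpr double kFp32Nps = 2637.0;
constexpr double kFp16Nps = 3998.0;
constexpr double kFp16PatchedNps = 5180.0;  // fp16 with PR26442 applied

// Hypothetical helper: ratio of a candidate throughput to a baseline.
constexpr double Speedup(double baseline_nps, double candidate_nps) {
  return candidate_nps / baseline_nps;
}

// Speedup(kFp32Nps, kFp16Nps)        is roughly 1.5x
// Speedup(kFp32Nps, kFp16PatchedNps) is roughly 2.0x
// Speedup(kFp16Nps, kFp16PatchedNps) is roughly 1.3x
```

These are single preliminary runs, so the ratios are indicative only.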
