
Unicorn Execution Engine

Custom Runtime & Kernels for Hardware-Accelerated AI


πŸš€ Overview

The Unicorn Execution Engine is a high-performance runtime for deploying quantized AI models on specialized hardware accelerators, particularly AMD NPUs. Developed by Magic Unicorn Inc. as part of the Unicorn Commander Suite, the engine pairs custom MLIR-AIE2 kernels with INT8 quantization to reach up to 220x speedups over CPU inference (see the benchmarks below).

✨ Key Features

  • 220x Speedup on AMD Phoenix NPU vs CPU
  • Custom MLIR-AIE2 Kernels for optimal hardware utilization
  • INT8/INT4 Quantization with minimal accuracy loss
  • Zero-Copy Memory architecture
  • Async Pipeline Execution for maximum throughput
  • Python & C++ APIs for easy integration

πŸ“Š Performance Benchmarks

WhisperX Speech Recognition

Model    | CPU Time | NPU Time | Speedup | Accuracy
---------|----------|----------|---------|---------
Large-v3 | 59.4 min | 16.2 sec | 220x    | 99%
Large-v2 | 54.0 min | 18.0 sec | 180x    | 98%
Medium   | 27.0 min | 14.4 sec | 112x    | 95%

Benchmark: 1 hour of audio transcription
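
To reproduce a row of this table, you can time a transcription with the API shown in Quick Start below. A minimal sketch (the input file name is a placeholder, and wall-clock timing with time.perf_counter is our own harness, not one bundled with the engine):

import time
from unicorn_engine import NPUWhisperX

# Load the pre-quantized large-v3 model and time one transcription pass.
model = NPUWhisperX.from_pretrained("magicunicorn/whisperx-large-v3-npu")

start = time.perf_counter()
result = model.transcribe("one_hour_meeting.wav")  # placeholder audio file
print(f"Transcribed in {time.perf_counter() - start:.1f} s")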

πŸ› οΈ Quick Start

Installation

# Clone the repository
git clone https://github.com/Unicorn-Commander/Unicorn-Execution-Engine
cd Unicorn-Execution-Engine

# Install dependencies
./install.sh

# Verify NPU availability
python -c "from unicorn_engine import NPUAccelerator; print(NPUAccelerator.is_available())"

Basic Usage

from unicorn_engine import NPUWhisperX

# Load quantized model from Hugging Face
model = NPUWhisperX.from_pretrained("magicunicorn/whisperx-large-v3-npu")

# Transcribe audio with hardware acceleration
result = model.transcribe("meeting.wav")
print(result["text"])
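
WhisperX-style results usually carry per-segment timestamps alongside the full text; a sketch of iterating over them (the segments/start/end keys are assumed from the upstream WhisperX output format, not documented fields of this engine):

# Per-segment timestamps; keys assumed from the upstream WhisperX format.
for seg in result.get("segments", []):
    print(f'[{seg["start"]:.1f}s - {seg["end"]:.1f}s] {seg["text"]}')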

πŸ“¦ Pre-Quantized Models

We provide pre-quantized models optimized for NPU acceleration on Hugging Face, such as magicunicorn/whisperx-large-v3-npu used in the Quick Start above.

πŸ—οΈ Architecture

Software Stack

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚        Python API Layer             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚     Unicorn Execution Engine        β”‚
β”‚         (C++ Runtime)               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚      MLIR-AIE2 Compiler             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚       NPU Driver (Kernel)           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   AMD Phoenix NPU (16 TOPS INT8)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Custom Kernels

Our MLIR-AIE2 kernels are specifically optimized for the AMD AIE architecture:

// Example: vectorized INT8 attention (illustrative MLIR-AIE2 sketch)
aie.core(%tile) {
  %q = aie.load_vector %query[%i] : vector<32xi8>
  %k = aie.load_vector %key[%i] : vector<32xi8>

  // 32 INT8 MAC operations in parallel, widened to INT32 accumulators
  %scores = aie.mac %q, %k : vector<32xi8> -> vector<32xi32>

  // Quantized softmax via an exponent lookup table
  %output = aie.lookup %scores, %exp_lut : vector<32xi32> -> vector<32xi8>

  aie.store_vector %output, %result[%i] : vector<32xi8>
  aie.end
}
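
The exponent lookup table (%exp_lut) is what lets the kernel skip floating-point exp() on device. For intuition, here is a host-side NumPy model of LUT-based quantized softmax; it is an illustrative sketch only, and the score scale is an assumed calibration constant, not a value from the engine:

import numpy as np

SCORE_SCALE = 0.0625  # assumed dequantization scale for INT8 scores
# Precompute exp() once for every representable INT8 score.
EXP_LUT = np.exp(np.arange(-128, 128) * SCORE_SCALE)

def lut_softmax_int8(scores: np.ndarray) -> np.ndarray:
    """Softmax over INT8 scores using only a table lookup and a rescale."""
    e = EXP_LUT[scores.astype(np.int16) + 128]  # shift scores into [0, 255]
    p = e / e.sum()
    # Requantize probabilities to UINT8 in [0, 255].
    return np.clip(np.round(p * 255), 0, 255).astype(np.uint8)

print(lut_softmax_int8(np.array([10, 20, 30], dtype=np.int8)))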

πŸ”§ Advanced Features

Pipeline Parallelism

import asyncio
from unicorn_engine import UnicornEngine

# Process multiple audio streams concurrently
engine = UnicornEngine(pipeline_mode=True)

async def main():
    streams = ["audio1.wav", "audio2.wav", "audio3.wav"]
    return await engine.process_batch(streams)

results = asyncio.run(main())

Custom Quantization

# Quantize your own models
from unicorn_engine import Quantizer

quantizer = Quantizer(target="npu", precision="int8")
quantized_model = quantizer.quantize(
    model="openai/whisper-large-v3",
    calibration_data="librispeech_100h"
)
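
The result can then be used like the pre-built models; a save-and-reload round trip (save_pretrained is an assumed counterpart to the from_pretrained call shown earlier, not a confirmed API):

from unicorn_engine import NPUWhisperX

# Persist the quantized weights, then reload them for NPU inference
# ("save_pretrained" is an assumed name, mirroring "from_pretrained").
quantized_model.save_pretrained("./whisperx-large-v3-npu-int8")
model = NPUWhisperX.from_pretrained("./whisperx-large-v3-npu-int8")
result = model.transcribe("meeting.wav")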

Memory Optimization

# Zero-copy memory for large batches ("engine" as constructed above)
audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
with engine.zero_copy_mode():
    results = engine.process_large_batch(audio_files)

πŸ“š Documentation

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Setup development environment
./dev_setup.sh

# Run tests
pytest tests/

# Build documentation
make docs

πŸ“ˆ Roadmap

  • Support for additional NPU architectures (Qualcomm, MediaTek)
  • INT4 quantization for even smaller models
  • ONNX export for broader compatibility
  • Dynamic quantization based on hardware capabilities
  • Multi-NPU distribution for larger models

🏒 About Magic Unicorn Inc.

Magic Unicorn Inc. develops enterprise AI solutions optimized for edge deployment. The Unicorn Commander Suite provides complete AI infrastructure for on-premises deployments.

Other Projects

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • AMD for NPU hardware and MLIR-AIE2 support
  • OpenAI for the original Whisper models
  • The open-source community for testing and feedback

πŸ“ž Contact


Β© 2025 Magic Unicorn Inc. | Part of the Unicorn Commander Suite

⭐ Star us on GitHub if you find this useful!
