Custom Runtime & Kernels for Hardware-Accelerated AI
The Unicorn Execution Engine is a high-performance runtime for deploying quantized AI models on specialized hardware accelerators, particularly AMD NPUs. Developed by Magic Unicorn Inc. as part of the Unicorn Commander Suite, the engine combines custom MLIR-AIE2 kernels with INT8 quantization to reach up to a 220x speedup over CPU inference.
- 220x Speedup on AMD Phoenix NPU vs CPU
- Custom MLIR-AIE2 Kernels for optimal hardware utilization
- INT8/INT4 Quantization with minimal accuracy loss
- Zero-Copy Memory architecture
- Async Pipeline Execution for maximum throughput
- Python & C++ APIs for easy integration
| Model | CPU Time | NPU Time | Speedup | Accuracy |
|---|---|---|---|---|
| Large-v3 | 59.4 min | 16.2 sec | 220x | 99% |
| Large-v2 | 54.0 min | 18.0 sec | 180x | 98% |
| Medium | 27.0 min | 14.4 sec | 112x | 95% |
Benchmark: 1 hour of audio transcription
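To reproduce the NPU-side numbers, you can time a transcription directly. A minimal sketch using the `NPUWhisperX` API from the Quick Start below (the audio filename is a placeholder for any roughly 1-hour recording):

```python
import time

from unicorn_engine import NPUWhisperX

model = NPUWhisperX.from_pretrained("magicunicorn/whisperx-large-v3-npu")

start = time.perf_counter()
result = model.transcribe("one_hour_meeting.wav")  # placeholder input file
print(f"Transcribed in {time.perf_counter() - start:.1f}s")
```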
```bash
# Clone the repository
git clone https://github.com/Unicorn-Commander/Unicorn-Execution-Engine
cd Unicorn-Execution-Engine

# Install dependencies
./install.sh

# Verify NPU availability
python -c "from unicorn_engine import NPUAccelerator; print(NPUAccelerator.is_available())"
```

```python
from unicorn_engine import NPUWhisperX

# Load quantized model from Hugging Face
model = NPUWhisperX.from_pretrained("magicunicorn/whisperx-large-v3-npu")
# Transcribe audio with hardware acceleration
result = model.transcribe("meeting.wav")
print(result["text"])
```

We provide pre-quantized models optimized for NPU acceleration on Hugging Face:
- whisperx-large-v3-npu - 99% accuracy, 220x speedup
- whisperx-large-v2-npu - 98% accuracy, 180x speedup
- whisperx-medium-npu - 95% accuracy, 112x speedup
- whisperx-small-npu - 92% accuracy, 75x speedup
- whisperx-base-npu - 88% accuracy, 56x speedup
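Each variant is loaded the same way; for example, to trade accuracy for a smaller footprint (same `from_pretrained` call as above):

```python
from unicorn_engine import NPUWhisperX

# Smaller checkpoint: lower accuracy, faster load and inference
model = NPUWhisperX.from_pretrained("magicunicorn/whisperx-small-npu")
```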
```
┌─────────────────────────────────────┐
│          Python API Layer           │
├─────────────────────────────────────┤
│      Unicorn Execution Engine       │
│            (C++ Runtime)            │
├─────────────────────────────────────┤
│          MLIR-AIE2 Compiler         │
├─────────────────────────────────────┤
│         NPU Driver (Kernel)         │
├─────────────────────────────────────┤
│   AMD Phoenix NPU (16 TOPS INT8)    │
└─────────────────────────────────────┘
```
Our MLIR-AIE2 kernels are specifically optimized for the AMD AIE architecture:
```mlir
// Example: Vectorized INT8 Attention
aie.core(%tile) {
  %q = aie.load_vector %query[%i] : vector<32xi8>
  %k = aie.load_vector %key[%i] : vector<32xi8>

  // 32 INT8 MAC operations in parallel
  %scores = aie.mac %q, %k : vector<32xi8> -> vector<32xi32>

  // Quantized softmax with lookup table
  %output = aie.lookup %scores, %exp_lut : vector<32xi8>

  aie.store_vector %output, %result[%i]
}
```
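The `aie.lookup` line above replaces the floating-point `exp()` of softmax with a precomputed table. A rough NumPy sketch of that idea (illustrative only; the table scale and the requantization scheme here are assumptions, not the kernel's actual parameters):

```python
import numpy as np

# Precompute exp() for every possible INT8 score (-128..127).
# SCALE is an assumed dequantization scale, not the kernel's real value.
SCALE = 0.05
exp_lut = np.exp(np.arange(-128, 128) * SCALE)

def quantized_softmax(scores_i8: np.ndarray) -> np.ndarray:
    """Softmax over INT8 scores via table lookup instead of exp()."""
    e = exp_lut[scores_i8.astype(np.int32) + 128]  # shift scores to LUT indices 0..255
    p = e / e.sum()
    return np.round(p * 127).astype(np.int8)       # requantize probabilities to INT8

print(quantized_softmax(np.array([10, 20, -5, 0], dtype=np.int8)))
```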
```python
# Process multiple audio streams concurrently
from unicorn_engine import UnicornEngine

engine = UnicornEngine(pipeline_mode=True)
streams = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = await engine.process_batch(streams)  # await requires an async context
```

```python
# Quantize your own models
from unicorn_engine import Quantizer
quantizer = Quantizer(target="npu", precision="int8")
quantized_model = quantizer.quantize(
    model="openai/whisper-large-v3",
    calibration_data="librispeech_100h",
)
```

```python
# Zero-copy memory for large batches
with engine.zero_copy_mode():
    results = engine.process_large_batch(audio_files)
```

- Technical Documentation - Detailed architecture and implementation
- API Reference - Complete API documentation
- Kernel Development - Guide to writing custom MLIR kernels
- Benchmarks - Comprehensive performance analysis
We welcome contributions! Please see our Contributing Guide for details.
```bash
# Set up the development environment
./dev_setup.sh

# Run tests
pytest tests/

# Build documentation
make docs
```

On the roadmap:
- Support for additional NPU architectures (Qualcomm, MediaTek)
- INT4 quantization for even smaller models
- ONNX export for broader compatibility
- Dynamic quantization based on hardware capabilities
- Multi-NPU distribution for larger models
Magic Unicorn Inc. develops enterprise AI solutions optimized for edge deployment. The Unicorn Commander Suite provides complete AI infrastructure for on-premise deployments.
- Meeting-Ops - AI-powered meeting recording appliance
- Unicorn Models - Pre-quantized models for edge deployment
This project is licensed under the MIT License - see the LICENSE file for details.
- AMD for NPU hardware and MLIR-AIE2 support
- OpenAI for the original Whisper models
- The open-source community for testing and feedback
- GitHub Issues: Report bugs or request features
- Hugging Face: Discussion forum
- Email: [email protected]
© 2025 Magic Unicorn Inc. | Part of the Unicorn Commander Suite
⭐ Star us on GitHub if you find this useful!