Parallax

Trusted by Partners

Qwen | SGLang | Kimi | MiniMax


Gradient | Blog | X (Twitter) | Discord | arXiv

News

  • [2025/10] 🔥 Parallax won #1 Product of the Day on Product Hunt!
  • [2025/10] 🔥 Parallax version 0.0.1 has been released!

About

Parallax is a fully decentralized inference engine developed by Gradient. It lets you build your own AI cluster that runs model inference across a set of distributed nodes, regardless of their hardware configuration or physical location. Its core features include:

  • Host local LLMs on personal devices
  • Cross-platform support
  • Pipeline-parallel model sharding (see the sketch after this list)
  • Dynamic KV cache management & continuous batching for Mac (a toy illustration follows the architecture list below)
  • Dynamic request scheduling and routing for high performance
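
To make the sharding idea concrete, here is a minimal Python sketch of pipeline-parallel sharding. It is a conceptual illustration only, not Parallax's actual API: the `Node` and `shard_layers` names are hypothetical, the "layers" are toy functions, and in Parallax the hop between shards happens over Lattica P2P rather than an in-process loop.

```python
# Minimal sketch of pipeline-parallel sharding. NOT Parallax's API:
# Node and shard_layers are invented for this illustration.
# A model's layers are split into contiguous shards, one per node; each
# node runs its shard and forwards the hidden state to the next node.

from dataclasses import dataclass
from typing import Callable, List

Layer = Callable[[list], list]  # stand-in for a transformer block

@dataclass
class Node:
    """A device holding one contiguous shard of the model's layers."""
    name: str
    layers: List[Layer]

    def forward(self, hidden: list) -> list:
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden

def shard_layers(layers: List[Layer], num_nodes: int) -> List[List[Layer]]:
    """Split layers into num_nodes contiguous, roughly equal shards."""
    per_node = -(-len(layers) // num_nodes)  # ceiling division
    return [layers[i:i + per_node] for i in range(0, len(layers), per_node)]

# Toy "model": 8 layers, each adds 1 to every element of the hidden state.
model = [lambda h: [x + 1 for x in h] for _ in range(8)]

# Shard across 3 heterogeneous nodes and run one request through the pipeline.
nodes = [Node(f"node-{i}", shard) for i, shard in enumerate(shard_layers(model, 3))]
hidden = [0, 0]
for node in nodes:  # in Parallax this hop crosses the network, not a loop
    hidden = node.forward(hidden)
print(hidden)  # [8, 8]: all 8 layers ran, spread across the 3 nodes
```

Contiguous shards keep each hop down to a single activation transfer, which is what makes pipeline parallelism a good fit for nodes connected over ordinary networks.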

The backend architecture:

  • P2P communication powered by Lattica
  • GPU backend powered by SGLang
  • Mac backend powered by MLX LM
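
The continuous batching noted in the feature list can be sketched the same way. The toy scheduler below is an assumption-laden illustration, not Parallax's (or SGLang's) scheduler: new requests join the running batch between decode steps, and finished requests leave immediately and free their KV cache, so short requests never wait behind long ones.

```python
# Toy sketch of continuous batching. NOT Parallax's scheduler:
# Request and continuous_batching are invented for this illustration.

from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: str
    tokens_left: int                              # decode steps still needed
    kv_cache: list = field(default_factory=list)  # stand-in KV entries

def continuous_batching(waiting: deque, max_batch: int = 4) -> None:
    running = []
    step = 0
    while waiting or running:
        # Admit new requests whenever the running batch has room.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step for every running request.
        for req in running:
            req.kv_cache.append(step)  # KV cache grows one entry per step
            req.tokens_left -= 1
        # Evict finished requests and free their KV cache right away.
        for req in [r for r in running if r.tokens_left == 0]:
            running.remove(req)
            req.kv_cache.clear()
            print(f"step {step}: {req.rid} done")
        step += 1

continuous_batching(deque([Request("a", 2), Request("b", 5), Request("c", 1),
                           Request("d", 3), Request("e", 2)]))
```

Running this prints completions in order of length, not arrival: the one-token request "c" finishes at step 0 while "b" keeps decoding, and "e" is admitted as soon as a slot frees up, keeping the batch full.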

User Guide

Contributing

We warmly welcome contributions of all kinds! For guidelines on how to get involved, please refer to our Contributing Guide.

Supported Models

| Model | Provider | HuggingFace Collection | Blog | Description |
|---|---|---|---|---|
| DeepSeek | DeepSeek | DeepSeek-V3.1, DeepSeek-R1, DeepSeek-V3, DeepSeek-V2 | DeepSeek V3.1: The New Frontier in Artificial Intelligence | "DeepSeek" is an advanced large language model series from DeepSeek AI, offering multiple generations such as DeepSeek-V3.1, DeepSeek-R1, DeepSeek-V2, and DeepSeek-V3. These models are designed for powerful natural language understanding and generation, with various sizes and capabilities for research and production use. |
| MiniMax-M2 | MiniMax AI | MiniMax-M2 | MiniMax M2 & Agent: Ingenious in Simplicity | MiniMax-M2 is a compact, fast, and cost-effective MoE model (230B parameters, 10B active) built for advanced coding and agentic workflows. It offers state-of-the-art intelligence and coding abilities, delivering efficient, reliable tool use and strong multi-step reasoning for developers and agents, with high throughput and low latency for easy deployment. |
| GLM-4.6 | Z AI | GLM-4.6 | GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities | GLM-4.6 improves upon GLM-4.5 with a longer 200K-token context window, stronger coding and reasoning performance, enhanced tool use and agent integration, and refined writing quality. It outperforms previous versions and is highly competitive with leading open-source models across coding, reasoning, and agent benchmarks. |
| Kimi-K2 | Moonshot AI | Kimi-K2 | Kimi K2: Open Agentic Intelligence | "Kimi-K2" is Moonshot AI's Kimi-K2 model family, including Kimi-K2-Base, Kimi-K2-Instruct, and Kimi-K2-Thinking. Kimi K2 Thinking is a state-of-the-art open-source agentic model designed for deep, step-by-step reasoning and dynamic tool use. It features native INT4 quantization and a 256K context window for fast, memory-efficient inference. Uniquely stable in long-horizon tasks, Kimi K2 enables reliable autonomous workflows with consistent performance across hundreds of tool calls. |
| Qwen | Qwen | Qwen3-Next, Qwen3, Qwen2.5 | Qwen3-Next: Towards Ultimate Training & Inference Efficiency | The Qwen series is a family of large language models developed by Alibaba's Qwen team. It includes multiple generations such as Qwen2.5, Qwen3, and Qwen3-Next, which improve upon model architecture, efficiency, and capabilities. The models are available in various sizes and instruction-tuned versions, with support for cutting-edge features like long context and quantization, and are suitable for a wide range of language tasks and open-source use cases. |
| gpt-oss | OpenAI | gpt-oss, gpt-oss-safeguard | Introducing gpt-oss-safeguard | gpt-oss are OpenAI’s open-weight GPT models (20B & 120B). The gpt-oss-safeguard variants are reasoning-based safety classification models: developers provide their own policy at inference, and the model uses chain-of-thought to classify content and explain its reasoning. This allows flexible, policy-driven moderation in complex or evolving domains, with open weights under Apache 2.0. |
| Meta Llama 3 | Meta | Meta Llama 3, Llama 3.1, Llama 3.2, Llama 3.3 | Introducing Meta Llama 3: The most capable openly available LLM to date | "Meta Llama 3" is Meta's third-generation Llama model, available in sizes such as 8B and 70B parameters. It includes instruction-tuned and quantized (e.g., FP8) variants. |