Lemony builds open, developer-focused AI infrastructure tools that make machine learning more efficient, transparent, and cost-effective.
Our mission is to help developers harness powerful AI while keeping costs predictable and accessible, and to prepare for a future where hundreds of domain-specific small language models work safely together.
Smart AI model cascading for cost optimization.
Cascadeflow is an intelligent AI model cascading library that dynamically selects the optimal model for each query or tool call through speculative execution. It builds on research showing that 40-70% of queries don't require slow, expensive flagship models, and that domain-specific smaller models often outperform large general-purpose models on specialized tasks. Queries that do need advanced reasoning are automatically escalated to a flagship model.
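The cascade pattern described above can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not Cascadeflow's actual implementation; the `Model` class, the confidence scores, and the stub model functions are all hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost_per_call: float
    run: Callable[[str], tuple[str, float]]  # returns (answer, confidence)

def cascade(query: str, models: list[Model], threshold: float = 0.8) -> tuple[str, str]:
    """Try models cheapest-first; escalate when confidence is too low."""
    for model in models[:-1]:
        answer, confidence = model.run(query)
        if confidence >= threshold:
            return answer, model.name  # cheap model was good enough
    # Fall back to the flagship (last) model unconditionally.
    answer, _ = models[-1].run(query)
    return answer, models[-1].name

# Hypothetical stand-ins for real model calls
small = Model("small", 0.00015, lambda q: ("TypeScript is typed JavaScript.", 0.95))
large = Model("large", 0.00625, lambda q: ("A detailed explanation...", 0.99))

answer, used = cascade("What is TypeScript?", [small, large])
```

When the small model answers confidently, the flagship model is never called at all, which is where the cost savings come from.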
Cut Costs 30-65% in 3 Lines of Code. One cascade. Hundreds of specialists.
```shell
pip install cascadeflow
```

```python
from cascadeflow import CascadeAgent, ModelConfig

agent = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.00015),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])

result = await agent.run("What is TypeScript?")
# Automatically routes to the optimal model, saving 40-85% on costs
```

Key Features:
- ⚡ Sub-2ms overhead - Negligible performance impact
- 💰 40-85% cost savings - Research-backed, production-proven
- 🔄 Mix any providers - OpenAI, Anthropic, Groq, Ollama, vLLM, Together
- 🧠 Domain understanding - Auto-detects code, medical, legal, math queries
- 🏭 Production ready - Streaming, tool calling, batch processing
All core infrastructure is open source. We believe the future of AI tooling is transparent, auditable, and community-driven.
Zero vendor lock-in. Works with your existing models, providers, and architecture. Deploy anywhere.
Built for real workloads. Sub-2ms overhead, fault-tolerant, with comprehensive error handling.
Every query tracked. Built-in analytics. Programmable budget limits. No surprise bills.
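A programmable budget limit can be as simple as a tracker that refuses calls once a cap is hit. This is an illustrative sketch of the concept only; Cascadeflow's built-in analytics and budget API may look different:

```python
class BudgetTracker:
    """Track per-query spend and refuse calls once a budget cap is hit."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.queries = 0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.budget_usd:
            raise RuntimeError(f"Budget of ${self.budget_usd} exceeded")
        self.spent_usd += cost_usd
        self.queries += 1

tracker = BudgetTracker(budget_usd=1.00)
tracker.charge(0.00015)   # a cheap-model query
tracker.charge(0.00625)   # a flagship-model query
```

Charging before dispatching the request is what turns "tracking" into a hard limit: a query that would blow the budget is rejected instead of billed.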
- 40-85% average cost reduction in production
- <2ms framework overhead
- 7+ supported AI providers (OpenAI, Anthropic, Groq, Ollama, vLLM, Together, HuggingFace)
- 100+ additional providers via LiteLLM integration
- 60-70% of queries handled by fast, efficient models
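The numbers above combine in straightforward arithmetic. Using the per-query costs from the quickstart example and assuming (for illustration) that 65% of uniform-cost queries are handled by the cheap model, the blended cost sits far below flagship-only pricing:

```python
cheap, flagship = 0.00015, 0.00625   # per-query cost (USD), from the quickstart
cheap_share = 0.65                   # fraction handled by the small model (assumed)

blended = cheap_share * cheap + (1 - cheap_share) * flagship
savings = 1 - blended / flagship
print(f"blended cost per query: ${blended:.6f}, savings vs flagship-only: {savings:.0%}")
```

With these assumptions the blended cost is about $0.0023 per query, roughly 63% cheaper than sending everything to the flagship model, which is consistent with the savings ranges quoted above.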
We're building Lemony in public. Join our community:
- GitHub Discussions - Ask questions, share ideas
- X/Twitter - Latest updates and announcements
- Issues - Bug reports and feature requests
- Contributing - Help build the future of AI infrastructure
We welcome contributions! Whether it's:
- 🐛 Bug reports and fixes
- ✨ Feature requests and implementations
- 📝 Documentation improvements
- 💡 New ideas and feedback
See our Contributing Guide to get started.
- Website: lemony.ai
- Email: [email protected]
- X/Twitter: @SaschaBuehrle
- GitHub: @lemony-ai
Our core projects are MIT licensed. See the Cascadeflow LICENSE for details.