Lemony builds open, developer-focused AI infrastructure tools that make machine learning more efficient, transparent, and cost-effective.
Our mission is to help developers harness powerful AI while keeping costs predictable and accessible, and to prepare for a future where hundreds of domain-specific small language models work safely together.
Smart AI model cascading for cost optimization.
Cascadeflow is an intelligent AI model cascading library that dynamically selects the optimal model for each query or tool call through speculative execution. It builds on research showing that 40-70% of queries don't require slow, expensive flagship models, and that domain-specific smaller models often outperform large general-purpose models on specialized tasks. Queries that do need advanced reasoning are automatically escalated to a flagship model.
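The cascade pattern described above can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not Cascadeflow's actual implementation; the `Model` class, the confidence scores, and the stub model functions are all hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost_per_call: float
    run: Callable[[str], tuple[str, float]]  # returns (answer, confidence)

def cascade(query: str, models: list[Model], threshold: float = 0.8) -> tuple[str, str]:
    """Try models cheapest-first; escalate when confidence is too low."""
    for model in models[:-1]:
        answer, confidence = model.run(query)
        if confidence >= threshold:
            return answer, model.name  # cheap model was good enough
    # Fall back to the flagship (last) model unconditionally.
    answer, _ = models[-1].run(query)
    return answer, models[-1].name

# Hypothetical stand-ins for real model calls
small = Model("small", 0.00015, lambda q: ("TypeScript is typed JavaScript.", 0.95))
large = Model("large", 0.00625, lambda q: ("A detailed explanation...", 0.99))

answer, used = cascade("What is TypeScript?", [small, large])
```

When the small model answers confidently, the flagship model is never called at all, which is where the cost savings come from.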
Cut Costs 30-65% in 3 Lines of Code. One cascade. Hundreds of specialists.
```shell
pip install cascadeflow
```

```python
from cascadeflow import CascadeAgent, ModelConfig

agent = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.00015),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])

result = await agent.run("What is TypeScript?")
# Automatically routes to the optimal model, saving 40-85% on costs
```

Key Features:
- ⚡ Sub-2ms overhead - Negligible performance impact
- 💰 40-85% cost savings - Research-backed, production-proven
- 🔄 Mix any providers - OpenAI, Anthropic, Groq, Ollama, vLLM, Together
- 🧠 Domain understanding - Auto-detects code, medical, legal, math queries
- 🏭 Production ready - Streaming, tool calling, batch processing
All core infrastructure is open source. We believe the future of AI tooling is transparent, auditable, and community-driven.
Zero vendor lock-in. Works with your existing models, providers, and architecture. Deploy anywhere.
Built for real workloads. Sub-2ms overhead, fault-tolerant, with comprehensive error handling.
Every query tracked. Built-in analytics. Programmable budget limits. No surprise bills.
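A programmable budget limit can be as simple as a tracker that refuses calls once a cap is hit. This is an illustrative sketch of the concept only; Cascadeflow's built-in analytics and budget API may look different:

```python
class BudgetTracker:
    """Track per-query spend and refuse calls once a budget cap is hit."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.queries = 0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.budget_usd:
            raise RuntimeError(f"Budget of ${self.budget_usd} exceeded")
        self.spent_usd += cost_usd
        self.queries += 1

tracker = BudgetTracker(budget_usd=1.00)
tracker.charge(0.00015)   # a cheap-model query
tracker.charge(0.00625)   # a flagship-model query
```

Charging before dispatching the request is what turns "tracking" into a hard limit: a query that would blow the budget is rejected instead of billed.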
- 40-85% average cost reduction in production
- <2ms framework overhead
- 7+ supported AI providers (OpenAI, Anthropic, Groq, Ollama, vLLM, Together, HuggingFace)
- 100+ additional providers via LiteLLM integration
- 60-70% of queries handled by fast, efficient models
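The numbers above combine in straightforward arithmetic. Using the per-query costs from the quickstart example and assuming (for illustration) that 65% of uniform-cost queries are handled by the cheap model, the blended cost sits far below flagship-only pricing:

```python
cheap, flagship = 0.00015, 0.00625   # per-query cost (USD), from the quickstart
cheap_share = 0.65                   # fraction handled by the small model (assumed)

blended = cheap_share * cheap + (1 - cheap_share) * flagship
savings = 1 - blended / flagship
print(f"blended cost per query: ${blended:.6f}, savings vs flagship-only: {savings:.0%}")
```

With these assumptions the blended cost is about $0.0023 per query, roughly 63% cheaper than sending everything to the flagship model, which is consistent with the savings ranges quoted above.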
We're building Lemony in public. Join our community:
- GitHub Discussions - Ask questions, share ideas
- X/Twitter - Latest updates and announcements
- Issues - Bug reports and feature requests
- Contributing - Help build the future of AI infrastructure
We welcome contributions! Whether it's:
- 🐛 Bug reports and fixes
- ✨ Feature requests and implementations
- 📝 Documentation improvements
- 💡 New ideas and feedback
See our Contributing Guide to get started.
- Website: lemony.ai
- Email: [email protected]
- X/Twitter: @SaschaBuehrle
- GitHub: @lemony-ai
Our core projects are MIT licensed. See the Cascadeflow LICENSE for details.