Lemony.ai

Building the dev stack for AI cost optimization and transparency.
One cascade. Hundreds of specialists.

Documentation | Follow @SaschaBuehrle on X | Community


About Lemony

Lemony builds open, developer-focused AI infrastructure tools that make machine learning more efficient, transparent, and cost-effective.

Our mission is to help developers harness powerful AI while keeping costs predictable and accessible, and to prepare for a future in which hundreds of domain-specific small language models need to work safely together.


🚀 Featured Project

Smart AI model cascading for cost optimization.

Cascadeflow is an AI model cascading library that dynamically selects the most suitable model for each query or tool call through speculative execution. It builds on research showing that 40-70% of queries don't require slow, expensive flagship models, and that domain-specific smaller models often outperform large general-purpose models on specialized tasks. Queries that do need advanced reasoning are escalated automatically to a flagship model.
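To make the idea concrete, here is a minimal sketch of a cascade with escalation. It is not Cascadeflow's implementation: the `Tier` type, the `cascade` function, the confidence threshold, and the stub models are all hypothetical, and a real quality check would be more involved than a single confidence score.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical type for illustration; not part of the Cascadeflow API.
@dataclass
class Tier:
    name: str
    cost: float  # assumed cost per call, in arbitrary units
    generate: Callable[[str], Tuple[str, float]]  # returns (answer, confidence in [0, 1])


def cascade(query: str, tiers: List[Tier], threshold: float = 0.8):
    """Try tiers from cheapest to most expensive; stop at the first confident answer."""
    spent = 0.0
    answer, used = "", ""
    for tier in tiers:
        answer, confidence = tier.generate(query)
        spent += tier.cost  # a rejected draft is still paid for
        used = tier.name
        if confidence >= threshold:
            break  # draft accepted, no escalation needed
    return answer, used, spent


# Stub models standing in for real provider calls (assumption: short queries are "easy").
def small_model(query: str) -> Tuple[str, float]:
    return ("short answer", 0.9 if len(query) < 40 else 0.3)


def large_model(query: str) -> Tuple[str, float]:
    return ("detailed answer", 0.99)


tiers = [Tier("small", 0.00015, small_model), Tier("large", 0.00625, large_model)]
easy = cascade("What is TypeScript?", tiers)
hard = cascade("Prove that the halting problem is undecidable, step by step.", tiers)
print(easy[1], hard[1])
```

In this sketch the easy query is answered by the small tier alone, while the hard query pays for both tiers, which is why the share of queries the small model can handle drives the overall savings.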

Cut Costs 30-65% in 3 Lines of Code. One cascade. Hundreds of specialists.

pip install cascadeflow

import asyncio
from cascadeflow import CascadeAgent, ModelConfig

# Models are listed from cheapest to most expensive.
agent = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.00015),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])

async def main():
    # Automatically routes to optimal model, saves 40-85% on costs
    result = await agent.run("What is TypeScript?")
    print(result)

asyncio.run(main())  # agent.run is awaited, so it needs an event loop

Key Features:

  • Sub-2ms overhead - Negligible performance impact
  • 💰 40-85% cost savings - Research-backed, production-proven
  • 🔄 Mix any providers - OpenAI, Anthropic, Groq, Ollama, vLLM, Together
  • 🧠 Domain understanding - Auto-detects code, medical, legal, math queries
  • 🏭 Production ready - Streaming, tool calling, batch processing

Get Started → | Read Docs →


💡 Philosophy

Open Source First

All core infrastructure is open source. We believe the future of AI tooling is transparent, auditable, and community-driven.

Developer Experience

Zero vendor lock-in. Works with your existing models, providers, and architecture. Deploy anywhere.

Production Ready

Built for real workloads. Sub-2ms overhead, fault-tolerant, with comprehensive error handling.

Cost Transparency

Every query tracked. Built-in analytics. Programmable budget limits. No surprise bills.
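As an illustration of what per-query tracking with a programmable budget limit can look like, here is a small sketch. `BudgetTracker` and `BudgetExceeded` are hypothetical names invented for this example and are not Cascadeflow's API; the costs reuse the figures from the quick-start example and their units are an assumption.

```python
from collections import defaultdict


class BudgetExceeded(RuntimeError):
    """Raised when recording a call would push spending past the limit."""


class BudgetTracker:
    """Hypothetical helper for illustration; not part of Cascadeflow."""

    def __init__(self, limit: float):
        self.limit = limit
        self.spent = 0.0
        self.by_model = defaultdict(float)  # per-model spend for simple analytics

    def record(self, model: str, cost: float) -> None:
        # Refuse the call before it happens rather than reporting it afterwards.
        if self.spent + cost > self.limit:
            raise BudgetExceeded(f"{model} call would exceed budget of {self.limit}")
        self.spent += cost
        self.by_model[model] += cost

    def remaining(self) -> float:
        return self.limit - self.spent


tracker = BudgetTracker(limit=0.01)
tracker.record("gpt-4o-mini", 0.00015)
tracker.record("gpt-4o", 0.00625)

blocked = False
try:
    tracker.record("gpt-4o", 0.00625)  # would bring the total above 0.01
except BudgetExceeded:
    blocked = True
print(dict(tracker.by_model), blocked)
```

Checking the limit before the call is recorded is the design choice that prevents surprise bills: the third call is rejected and the recorded spend stays unchanged.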


📊 By The Numbers

  • 40-85% average cost reduction in production
  • <2ms framework overhead
  • 7+ supported AI providers (OpenAI, Anthropic, Groq, Ollama, vLLM, Together, HuggingFace)
  • 100+ additional providers via LiteLLM integration
  • 60-70% of queries handled by fast, efficient models
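The relationship between the share of queries handled by the small model and the resulting savings can be checked with a little arithmetic. The sketch below is a simplified model, not a benchmark: it assumes every query is first sent to the small model, that a rejected draft is still paid for, and that the `cost` values from the quick-start example (0.00015 and 0.00625) are comparable per-query costs. `cascade_savings` is a hypothetical helper.

```python
def cascade_savings(cost_small: float, cost_large: float, accept_rate: float) -> float:
    """Fractional savings versus sending every query to the large model.

    Expected cost per query = cost_small + (1 - accept_rate) * cost_large,
    because escalated queries pay for both the draft and the flagship call.
    """
    expected = cost_small + (1.0 - accept_rate) * cost_large
    return 1.0 - expected / cost_large


for rate in (0.60, 0.70):
    savings = cascade_savings(0.00015, 0.00625, rate)
    print(f"accept rate {rate:.0%}: savings {savings:.1%}")
```

Under these assumptions, acceptance rates of 60% and 70% give savings of roughly 58% and 68%. Real savings depend on token counts, pricing, and how often drafts are accepted.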

🌍 Community

We're building Lemony in public. Join our community:


🤝 Contributing

We welcome contributions of all kinds, including:

  • 🐛 Bug reports and fixes
  • ✨ Feature requests and implementations
  • 📝 Documentation improvements
  • 💡 New ideas and feedback

See our Contributing Guide to get started.


📬 Contact


📄 License

Our core projects are MIT licensed. See the Cascadeflow LICENSE for details.


Built with ❤️ by developers, for developers
