Evaluate and potentially upgrade intent extraction LLM #10

@anfredette

Description

The Intent & Specification Engine currently uses llama3.1:8b via Ollama for intent extraction. We should evaluate whether alternative models offer better accuracy, cost, or flexibility.

Acceptance Criteria

  • Benchmark current llama3.1:8b accuracy on intent extraction tasks (see the harness sketch after this list)
  • Evaluate alternative models (e.g., larger Llama models, Claude, GPT-4)
  • Test with diverse user input examples
  • Measure accuracy, latency, and cost tradeoffs
  • Document recommendation and rationale
  • (Optional) Implement configurable LLM backend to support external APIs (OpenAI, Anthropic)

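To keep the benchmark and latency/cost measurements comparable across models, a harness along these lines could score any candidate backend on the same labeled examples. This is a minimal sketch assuming Python; the `extract_intent` callable, exact-match scoring, and metric names are illustrative assumptions, not existing code in the repo.

```python
import statistics
import time
from typing import Callable, Iterable, Tuple


def benchmark_intent_extraction(
    extract_intent: Callable[[str], str],
    labeled_examples: Iterable[Tuple[str, str]],
) -> dict:
    """Run (user_text, expected_intent) pairs through an extractor and
    report accuracy plus latency statistics."""
    latencies, correct, total = [], 0, 0
    for user_text, expected_intent in labeled_examples:
        start = time.perf_counter()
        predicted = extract_intent(user_text)
        latencies.append(time.perf_counter() - start)
        # Exact-match scoring on intent labels; swap in structured or fuzzy
        # matching if intents are richer objects than single labels.
        correct += int(predicted.strip().lower() == expected_intent.strip().lower())
        total += 1
    return {
        "accuracy": correct / total if total else 0.0,
        "median_latency_s": statistics.median(latencies) if latencies else 0.0,
        "p95_latency_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))] if latencies else 0.0,
        "examples": total,
    }
```

Pointing the same harness at each candidate (local Llama variants, Claude, GPT-4) would give directly comparable accuracy/latency numbers for the recommendation write-up.
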
Notes

  • Intent extraction quality directly impacts recommendation accuracy
  • Consider both local (Ollama) and API-based options
  • May want to make the LLM configurable for different deployment environments (see the backend sketch below)

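If we pursue the optional configurable backend, one possible shape is a single env-driven factory. This is a sketch under stated assumptions: Python, plain HTTP calls to Ollama's local `/api/generate` endpoint and OpenAI's chat completions endpoint, and hypothetical `INTENT_LLM_BACKEND` / `INTENT_LLM_MODEL` environment variables; an Anthropic branch would follow the same pattern.

```python
import os

import requests


def make_llm_client(backend: str | None = None, model: str | None = None):
    """Return a callable prompt -> completion for the configured backend.

    Backend and model default to environment variables (names are
    hypothetical) so the same code can target local Ollama or a hosted API.
    """
    backend = backend or os.getenv("INTENT_LLM_BACKEND", "ollama")
    model = model or os.getenv("INTENT_LLM_MODEL", "llama3.1:8b")

    if backend == "ollama":
        def call(prompt: str) -> str:
            # Ollama's local REST API; non-streaming for simplicity.
            resp = requests.post(
                "http://localhost:11434/api/generate",
                json={"model": model, "prompt": prompt, "stream": False},
                timeout=120,
            )
            resp.raise_for_status()
            return resp.json()["response"]
        return call

    if backend == "openai":
        def call(prompt: str) -> str:
            # OpenAI chat completions endpoint; expects OPENAI_API_KEY in the env.
            resp = requests.post(
                "https://api.openai.com/v1/chat/completions",
                headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
                json={"model": model, "messages": [{"role": "user", "content": prompt}]},
                timeout=120,
            )
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
        return call

    raise ValueError(f"Unsupported LLM backend: {backend}")
```

This keeps model choice a deployment-time setting rather than a code change, and makes it easy to point the benchmark harness above at each backend.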