Research methods to reduce benchmark matrix size through extrapolation #6

@anfredette

Description

Benchmarking every combination of (model, GPU, traffic profile) produces a matrix whose size is the product of the three axes, which quickly becomes too large to cover. We should investigate whether we can extrapolate missing combinations or build predictive models to reduce the number of explicit benchmarks needed.
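
To make the growth concrete, here is a minimal sketch that enumerates a full matrix. The model names, GPU names, and traffic profiles are hypothetical placeholders chosen only for illustration; they are not the project's actual benchmark axes.

```python
from itertools import product

# Hypothetical axis values, for illustration only.
models = ["model-a", "model-b", "model-c"]
gpus = ["gpu-x", "gpu-y"]
# Each traffic profile is (prompt tokens, output tokens).
traffic_profiles = [(512, 256), (1024, 512), (4096, 512), (8192, 1024)]

# Every (model, GPU, traffic profile) combination is one benchmark run.
full_matrix = list(product(models, gpus, traffic_profiles))

# The run count is the product of the axis sizes: 3 * 2 * 4.
print(len(full_matrix))  # 24
```

Adding one more value to any axis multiplies the cost by that axis's share, so each new model in this sketch would add 8 runs.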

Acceptance Criteria

  • Research feasibility of interpolating between traffic profiles (e.g., estimate TTFT, time to first token, for intermediate prompt lengths)
  • Investigate component-level benchmarking to build GPU+LLM performance models
  • Prototype an extrapolation or modeling approach
  • Validate accuracy against ground-truth benchmarks
  • Document approach and limitations
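
One possible starting point for the interpolation and validation criteria above is sketched below. It assumes TTFT varies roughly linearly with prompt length between two measured points, which is an unverified assumption and exactly what the research would need to test. The helper names `interpolate_ttft` and `relative_error` are hypothetical, and the numbers are synthetic placeholders, not real benchmark results.

```python
from bisect import bisect_left


def interpolate_ttft(prompt_len, measured):
    """Linearly interpolate TTFT (ms) from a {prompt_len: ttft_ms} mapping."""
    xs = sorted(measured)
    if not xs[0] <= prompt_len <= xs[-1]:
        # Refuse to extrapolate outside the measured range.
        raise ValueError("prompt_len is outside the measured range")
    i = bisect_left(xs, prompt_len)
    if xs[i] == prompt_len:
        return measured[xs[i]]  # exact measurement available
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = measured[x0], measured[x1]
    return y0 + (y1 - y0) * (prompt_len - x0) / (x1 - x0)


def relative_error(predicted, actual):
    """Absolute error as a fraction of the ground-truth value."""
    return abs(predicted - actual) / actual


# Synthetic placeholder values (prompt tokens -> TTFT in ms), not real data.
measured = {512: 40.0, 2048: 160.0}

# A held-out point stands in for a ground-truth benchmark.
held_out_len, held_out_ttft = 1024, 85.0

predicted = interpolate_ttft(held_out_len, measured)
error = relative_error(predicted, held_out_ttft)
print(f"predicted={predicted:.1f} ms, relative error={error:.1%}")
```

Holding out some benchmarked points and comparing them to the prediction gives a direct accuracy measure. If the error is too high for some region of the matrix, that region likely needs explicit benchmarks or a different model.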

Notes

  • This is an exploratory/research task
  • Goal: Reduce benchmarking cost while maintaining recommendation accuracy
  • May inform Phase 2 architecture
