Description
The model catalog should include GPU memory requirements for each model to support capacity planning and feasibility checks.
Acceptance Criteria
- Calculate or document GPU memory requirements for each model in the catalog
- Account for tensor parallelism configurations (e.g., memory per GPU for TP=2)
- Add `gpu_memory_gb` field to `data/model_catalog.json`
- Update the Recommendation Engine to filter out infeasible GPU types based on memory constraints
- Validate memory estimates against real deployments
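A per-GPU memory estimate along the lines the criteria describe could be sketched as follows. This is a minimal illustration, not the project's actual code: the function name, the flat `overhead_gb` headroom term, and the constants are all assumptions; real estimates would also account for KV cache growth with context length.

```python
# Illustrative per-GPU memory estimate for capacity planning.
# All names and constants here are hypothetical.

def estimated_gpu_memory_gb(
    num_params_b: float,       # model size in billions of parameters
    bytes_per_param: float,    # 2 for fp16/bf16, 1 for int8, 0.5 for 4-bit
    tensor_parallel: int = 1,  # TP degree; weights shard evenly across GPUs
    overhead_gb: float = 4.0,  # assumed headroom for KV cache / activations / runtime
) -> float:
    """Rough per-GPU memory requirement in GB."""
    weights_gb = num_params_b * bytes_per_param  # 1B params at 1 byte/param ~= 1 GB
    return weights_gb / tensor_parallel + overhead_gb

# A 70B model in fp16 at TP=2: 70 * 2 / 2 + 4 = 74 GB per GPU
print(estimated_gpu_memory_gb(70, 2, tensor_parallel=2))  # 74.0
```

The resulting number is what a `gpu_memory_gb` field per (model, quantization, TP) configuration would record in the catalog.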
Notes
- Memory requirements vary by model size, quantization, and tensor parallelism
- This prevents recommending invalid configurations (e.g., 70B model on single L4)
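The filtering step in the Recommendation Engine could look roughly like this. The GPU spec table and function are hypothetical, and memory sizes are assumed nominal values for illustration:

```python
# Hypothetical feasibility filter: keep only GPU types whose memory
# meets the model's per-GPU requirement from the catalog.

GPU_MEMORY_GB = {"L4": 24, "A100-40": 40, "A100-80": 80, "H100": 80}  # assumed specs

def feasible_gpus(required_gb: float) -> list[str]:
    """Return GPU types that can hold the model's per-GPU footprint."""
    return [gpu for gpu, mem in GPU_MEMORY_GB.items() if mem >= required_gb]

# A 70B fp16 model at TP=2 needs roughly 74 GB per GPU, so only the
# 80 GB cards survive -- the single-L4 case is correctly rejected.
print(feasible_gpus(74.0))           # ['A100-80', 'H100']
print("L4" in feasible_gpus(74.0))   # False
```

Running this check before ranking recommendations is what prevents the invalid-configuration case noted above.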