Dogfood docs for benchmark run

keugenek · keugenek · commit 497eb3667a0e · 2025-11-12T12:51:39.000Z
diff --git a/klaudbiusz/.env.example b/klaudbiusz/.env.example
@@ -2,6 +2,7 @@
 # Required for app generation and MLflow tracking
 DATABRICKS_HOST=https://your-workspace.databricks.com
 DATABRICKS_TOKEN=dapi...
+DATABRICKS_WAREHOUSE_ID=
 
 # Anthropic API
 # Required for Claude Agent SDK
@@ -10,5 +11,7 @@ ANTHROPIC_API_KEY=sk-ant-...
 # Optional: Database for logging
 # DATABASE_URL=postgresql://user:password@localhost:5432/dbname
 
+GEMINI_API_KEY=
+
 # MLFlow
 MLFLOW_EXPERIMENT_NAME=/Shared/klaudbiusz-evaluations
diff --git a/klaudbiusz/README.md b/klaudbiusz/README.md
@@ -20,18 +20,15 @@ cp .env.example .env
 # Edit .env with your credentials
 ```
 
-`.env` file contents:
-```bash
-DATABRICKS_HOST=https://your-workspace.databricks.com
-DATABRICKS_TOKEN=dapi...
-ANTHROPIC_API_KEY=sk-ant-...
-```
-
 ### Generate Applications
-
 ```bash
 cd klaudbiusz
 
+
+# Make sure the app folder is empty before generating
+cli/archive_evaluation.sh
+cli/cleanup_evaluation.sh
+
 # Generate a single app (Claude backend, default)
 uv run cli/single_run.py "Create a customer churn analysis dashboard"
 
@@ -67,7 +64,7 @@ uv run cli/evaluate_all.py --skip 10 --limit 5          # Skip first 10, evaluat
 uv run cli/evaluate_app.py ../app/customer-churn-analysis
 ```
 
-**Results are automatically logged to MLflow:** Navigate to `ML → Experiments → /Shared/klaudbiusz-evaluations` in Databricks UI.
+**Results are automatically logged to MLflow:** Navigate to `ML → Experiments → /Shared/klaudbiusz-evaluations` in the Databricks UI / Dogfooding.
 
 ## Evaluation Framework