feat: evaluation code added #811
base: main
Conversation
bracesproul left a comment
lgtm with a couple nits
can you upload this dataset to LangSmith and pull it in instead of hard-coding it?
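A minimal sketch of what pulling the dataset could look like with the langsmith JS client (the dataset name here is hypothetical, not from the PR):

```ts
import { Client } from "langsmith";

// Hypothetical dataset name; the real name would be whatever the
// dataset is uploaded under in LangSmith.
const DATASET_NAME = "open-swe-evals";

async function loadDataset() {
  // The client reads LANGSMITH_API_KEY from the environment by default.
  const client = new Client();
  const examples = [];
  for await (const example of client.listExamples({ datasetName: DATASET_NAME })) {
    examples.push(example);
  }
  return examples;
}
```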
addressed
```ts
let programmerRunUrl =
  "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/projects/p/eba94921-7f40-4be0-b153-e88ab6fdcfdd/r/";
```
this should be an env var since it'll be different for everyone not in the LC org
i might also update it to be a createProgrammerRunURL func which uses two env vars, LANGSMITH_WORKSPACE_ID and LANGSMITH_PROJECT_ID, for the first & second IDs, then it constructs a URL using them and the run id passed as an input arg
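Something along these lines (a sketch only; the helper name comes from the comment above, and the URL layout is inferred from the hard-coded string in the diff):

```ts
// Sketch of the suggested helper. LANGSMITH_WORKSPACE_ID and
// LANGSMITH_PROJECT_ID come from the environment; the run ID is
// passed in as an argument.
function createProgrammerRunURL(runId: string): string {
  const workspaceId = process.env.LANGSMITH_WORKSPACE_ID;
  const projectId = process.env.LANGSMITH_PROJECT_ID;
  if (!workspaceId || !projectId) {
    throw new Error("LANGSMITH_WORKSPACE_ID and LANGSMITH_PROJECT_ID must be set");
  }
  return `https://smith.langchain.com/o/${workspaceId}/projects/p/${projectId}/r/${runId}`;
}
```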
addressed
- …pen-swe into aliyan/evals-code
- …aliyan/evals-code
- ruff, mypy, and llm-as-a-judge eval for evaluating code written by open-swe (see the sketch below)
- envs to Daytona sandboxes for running code
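A rough local sketch of the ruff/mypy half of that scoring, assuming both tools are on the PATH. The PR runs these checks inside Daytona sandboxes instead; this is only illustrative, and the `lintScore` helper is hypothetical:

```ts
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Score 1 only if both ruff and mypy pass on the generated file.
// The promisified execFile rejects on a non-zero exit code, so any
// lint or type error lands in the catch branch.
async function lintScore(filePath: string): Promise<number> {
  try {
    await run("ruff", ["check", filePath]);
    await run("mypy", [filePath]);
    return 1;
  } catch {
    return 0;
  }
}
```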