feat: evaluation code added #811
base: main
Conversation
bracesproul left a comment
lgtm with a couple nits
can you upload this dataset to LangSmith and pull it in instead of hard-coding it?
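A minimal sketch of what pulling the dataset could look like with the langsmith JS client (the dataset name here is hypothetical, not from the PR):

```ts
import { Client } from "langsmith";

// Hypothetical dataset name; the real name would be whatever the
// dataset is uploaded under in LangSmith.
const DATASET_NAME = "open-swe-evals";

async function loadDataset() {
  // The client reads LANGSMITH_API_KEY from the environment by default.
  const client = new Client();
  const examples = [];
  for await (const example of client.listExamples({ datasetName: DATASET_NAME })) {
    examples.push(example);
  }
  return examples;
}
```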
addressed
```ts
let programmerRunUrl =
  "https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/projects/p/eba94921-7f40-4be0-b153-e88ab6fdcfdd/r/";
```
this should be an env var since it'll be different for everyone not in the LC org
i might also update it to be a createProgrammerRunURL func which uses two env vars, LANGSMITH_WORKSPACE_ID and LANGSMITH_PROJECT_ID, for the first & second IDs, then it constructs a URL using them and the run id passed as an input arg
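Something along these lines (a sketch only; the helper name comes from the comment above, and the URL layout is inferred from the hard-coded string in the diff):

```ts
// Sketch of the suggested helper. LANGSMITH_WORKSPACE_ID and
// LANGSMITH_PROJECT_ID come from the environment; the run ID is
// passed in as an argument.
function createProgrammerRunURL(runId: string): string {
  const workspaceId = process.env.LANGSMITH_WORKSPACE_ID;
  const projectId = process.env.LANGSMITH_PROJECT_ID;
  if (!workspaceId || !projectId) {
    throw new Error("LANGSMITH_WORKSPACE_ID and LANGSMITH_PROJECT_ID must be set");
  }
  return `https://smith.langchain.com/o/${workspaceId}/projects/p/${projectId}/r/${runId}`;
}
```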
addressed
- …pen-swe into aliyan/evals-code
- …aliyan/evals-code
- ruff, mypy, and llm-as-a-judge eval for evaluating code written by open-swe (see the sketch below)
- envs to Daytona sandboxes for running code
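A rough local sketch of the ruff/mypy half of that scoring, assuming both tools are on the PATH. The PR runs these checks inside Daytona sandboxes instead; this is only illustrative, and the `lintScore` helper is hypothetical:

```ts
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Score 1 only if both ruff and mypy pass on the generated file.
// The promisified execFile rejects on a non-zero exit code, so any
// lint or type error lands in the catch branch.
async function lintScore(filePath: string): Promise<number> {
  try {
    await run("ruff", ["check", filePath]);
    await run("mypy", [filePath]);
    return 1;
  } catch {
    return 0;
  }
}
```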