HECM: Holistic Evaluation of Coding Models

HECM is a library for holistically evaluating the agentic capabilities of coding LLMs. It has two primary features:

Mining data from GitHub issues to create agentic benchmarks that measure how well coding models solve real problems.
An evaluation harness with a flexible API that evaluates agents and models by executing the corresponding test cases in either a sandboxed or an un-sandboxed environment.

Installation

git clone https://github.com/xynehq/hecm
cd hecm
uv pip install -r pyproject.toml --group dev

Usage

Generating Coding Agent evaluation data for a given repository

import os

from dotenv import load_dotenv

from hecm.dataset_generation import CodingAgentDataGenerator
from hecm.dataset_generation.utils import load_issues

# Read GITHUB_TOKEN from a local .env file, if one exists.
load_dotenv()

analyzer = CodingAgentDataGenerator(
    repo_owner="juspay",
    repo_name="hyperswitch",
    github_token=os.getenv("GITHUB_TOKEN"),
    gold_patch_ignore_dirs=[
        ".github",
        ".devcontainer",
        "api-reference",
        "cypress-tests",
        "cypress-test-files",
        "docs",
    ],
    test_dirs=["cypress-tests", "cypress-test-files"],
)
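# Collect the repository's issues and save them as JSON.
issues = analyzer.generate_issues(
    save_to="data/issues/juspay___hyperswitch.json"
)
# Attach the pull requests linked to each issue, saving to the same path.
issues_with_linked_prs = analyzer.generate_linked_prs(
    issues, save_to="data/issues/juspay___hyperswitch.json"
)
# Build the data points and export them to the Hugging Face Hub.
data_points = analyzer.generate_data_points(issues_with_linked_prs)
data_points.export_to_huggingface(
    "juspay/hyperswitch", append_to_dataset=False
)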
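
The gold_patch_ignore_dirs and test_dirs arguments above suggest that the files changed by a linked PR are split into a gold patch and a test patch. The sketch below shows one plausible reading of that split; it is not HECM's implementation. The helper names top_level_dir and partition_changed_files are hypothetical, and matching on the top-level directory only is an assumption.

```python
from pathlib import PurePosixPath


def top_level_dir(path: str) -> str:
    """Return the first path component, e.g. 'docs' for 'docs/a/b.md'."""
    return PurePosixPath(path).parts[0]


def partition_changed_files(paths, gold_patch_ignore_dirs, test_dirs):
    """Split changed file paths into (gold_patch_files, test_files).

    Assumed semantics (not confirmed by HECM):
    - files under a test directory form the test patch;
    - files under any other ignored directory are dropped;
    - everything else forms the gold patch.
    """
    gold, tests = [], []
    for path in paths:
        top = top_level_dir(path)
        if top in test_dirs:
            tests.append(path)
        elif top not in gold_patch_ignore_dirs:
            gold.append(path)
    return gold, tests


changed = [
    "crates/router/src/lib.rs",
    "docs/setup.md",
    "cypress-tests/cypress/e2e/payment.cy.js",
    ".github/workflows/ci.yml",
]
gold, tests = partition_changed_files(
    changed,
    gold_patch_ignore_dirs=[".github", "docs", "cypress-tests"],
    test_dirs=["cypress-tests"],
)
print(gold)   # files an agent would be expected to change
print(tests)  # files used to verify the change
```

Under this reading, listing the test directories in both arguments, as the example above does, keeps test files out of the gold patch while still collecting them for verification. The file paths in the sketch are made up for illustration.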

Running the evaluation harness

from hecm.eval_harness.agent import ClaudeCodeProxyAgent
from hecm.eval_harness.evaluation import Evaluator
from hecm.eval_harness.test_execution import JuspayHyperswitchLocalTestExecutor

evaluator = Evaluator(
    agent=ClaudeCodeProxyAgent(),
    executor=JuspayHyperswitchLocalTestExecutor(),
)
evaluator.evaluate(
    dataset="juspay/hyperswitch", # 🤗 address of the dataset
    split="train",
    max_data_points=8,
    result_save_path="results.json",
)
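
The Evaluator above is composed from an agent and a test executor, which is what lets either side be swapped independently. The following is a minimal sketch of that composition pattern under stated assumptions: DataPoint, Agent, PatchExecutor, evaluate, and the stub classes are all hypothetical names, and none of them are HECM's actual interfaces or result format.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class DataPoint:
    """Hypothetical benchmark item: an issue and its problem statement."""
    issue_id: str
    problem_statement: str


class Agent(Protocol):
    """Anything that can turn a data point into a candidate patch."""
    def propose_patch(self, data_point: DataPoint) -> str: ...


class PatchExecutor(Protocol):
    """Anything that can apply a patch and report whether the tests pass."""
    def passes(self, data_point: DataPoint, patch: str) -> bool: ...


def evaluate(agent, executor, data_points, max_data_points=None):
    """Run the agent on each data point and record whether its patch passes."""
    results = []
    # Slicing with None keeps every data point.
    for dp in data_points[:max_data_points]:
        patch = agent.propose_patch(dp)
        results.append(
            {"issue_id": dp.issue_id, "resolved": executor.passes(dp, patch)}
        )
    return results


class StubAgent:
    """Stand-in agent that returns a fixed string instead of calling a model."""
    def propose_patch(self, data_point):
        return f"patch for {data_point.issue_id}"


class StubExecutor:
    """Stand-in executor that accepts any non-empty patch."""
    def passes(self, data_point, patch):
        return bool(patch)


points = [DataPoint("issue-1", "Fix A"), DataPoint("issue-2", "Fix B")]
print(evaluate(StubAgent(), StubExecutor(), points, max_data_points=1))
```

Because the loop depends only on the two small interfaces, a sandboxed executor and a local one could be exchanged without touching the agent or the loop itself.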
