Agentic Context Engineering (ACE) Reproduction Framework

This repository contains an implementation scaffold for reproducing the Agentic Context Engineering (ACE) method from Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models (arXiv:2510.04618).

The code follows the paper’s design (a rough sketch of the implied data model appears after this list):

  • Contexts are structured playbooks made of bullet entries with helpful/harmful counters.
  • Three agentic roles (Generator, Reflector, Curator) interact through incremental delta updates.
  • Offline and online adaptation loops support multi-epoch training and test-time continual learning.
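
For orientation, here is a minimal sketch of the data model these bullets imply. The names and fields below are illustrative assumptions, not the repository's actual classes; see ace/ for the real playbook and delta types.

from dataclasses import dataclass

# Illustrative sketch only: the real types live in ace/ and may differ.
@dataclass
class Bullet:
    bullet_id: str
    section: str       # playbook section this entry belongs to
    content: str       # the strategy or lesson captured as text
    helpful: int = 0   # bumped when feedback credits this bullet
    harmful: int = 0   # bumped when reflection blames this bullet

# The Curator emits small delta updates rather than rewriting the playbook,
# e.g. {"type": "ADD", "section": "defaults", "content": "...",
#       "metadata": {"helpful": 1}}, applied as incremental operations.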

Refer to docs/method_outline.md for a distilled summary of the methodology extracted from the paper.

Repository Layout

  • ace/: core library modules (playbook store, delta operations, roles, adaptation drivers, prompts, LLM abstractions).
  • tests/: lightweight regression tests using a dummy LLM and a toy environment.
  • docs/: engineering notes on the paper’s method.

Quick Start

  1. Ensure Python 3.9+ (development used 3.12). No third-party dependencies are required for the core scaffold.

  2. (Optional) Create a virtual environment and activate it.

  3. Run the unit tests:

    python -m unittest discover -s tests

Example Usage

Here is a minimal offline adaptation loop with the dummy LLM:

import json
from ace import (
    Playbook, DummyLLMClient, Generator, Reflector, Curator,
    OfflineAdapter, Sample, TaskEnvironment, EnvironmentResult
)

# Toy environment: grades the generator's answer against the ground truth
# and returns textual feedback for the Reflector to analyze.
class ToyEnv(TaskEnvironment):
    def evaluate(self, sample, generator_output):
        gt = sample.ground_truth or ""
        pred = generator_output.final_answer
        feedback = "correct" if pred == gt else f"expected {gt} but got {pred}"
        return EnvironmentResult(feedback=feedback, ground_truth=gt)

# The dummy client replays canned JSON responses in order: one each for the
# Generator, Reflector, and Curator roles.
client = DummyLLMClient()
client.queue(json.dumps({"reasoning": "...", "bullet_ids": [], "final_answer": "42"}))
client.queue(json.dumps({"reasoning": "...", "error_identification": "", "root_cause_analysis": "",
                         "correct_approach": "", "key_insight": "Remember 42.", "bullet_tags": []}))
client.queue(json.dumps({"reasoning": "...", "operations": [{"type": "ADD", "section": "defaults",
                         "content": "Answer 42 when in doubt.", "metadata": {"helpful": 1}}]}))

adapter = OfflineAdapter(
    playbook=Playbook(),
    generator=Generator(client),
    reflector=Reflector(client),
    curator=Curator(client),
)

samples = [Sample(question="Life?", ground_truth="42")]
adapter.run(samples, ToyEnv(), epochs=1)

Replace DummyLLMClient with a production LLM client (e.g., OpenAI, DeepSeek) and implement a task-specific TaskEnvironment to integrate real execution feedback from AppWorld or domain benchmarks.
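
As an illustration, an OpenAI-backed client might look like the following. This is a sketch under assumptions: it presumes ace exports an LLMClient base class with a single complete(prompt) -> str method, which may not match the scaffold's actual abstract interface; check ace/ before using it.

import os
from openai import OpenAI
from ace import LLMClient  # assumed export; check ace/ for the actual base class

class OpenAIClient(LLMClient):
    """Sketch only: assumes LLMClient expects complete(prompt) -> str."""

    def __init__(self, model="gpt-4o-mini"):
        self.model = model
        self._client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def complete(self, prompt: str) -> str:
        # Send the role prompt as a single user message and return the text.
        response = self._client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content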

Extending to Full Experiments

  • Implement an LLMClient subclass that wraps your chosen model API.

  • Provide task-specific prompts (see ace/prompts.py) or customize them per domain.

  • Build TaskEnvironment adapters that run the benchmark workflow (e.g., AppWorld ReAct agent, FiNER/Formula evaluation).

  • Configure offline (OfflineAdapter.run) and online (OnlineAdapter.run) loops with up to 5 epochs and Reflector refinement rounds as reported in the paper (a configuration sketch follows this list).

  • Swap in a real LLM by using ace.TransformersLLMClient. For example, to use the local gpt-oss-20b weights on GPUs 2 and 3:

    CUDA_VISIBLE_DEVICES=2,3 python scripts/run_local_adapter.py

    (See the script in scripts/ for a minimal setup that wires the model into ACE.)
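
Putting these pieces together, an end-to-end configuration might look like the sketch below, reusing the OpenAIClient and ToyEnv examples above. The OnlineAdapter constructor and run signature are assumed to mirror the offline ones; verify against the adaptation drivers in ace/ before relying on this.

from ace import (
    Playbook, Generator, Reflector, Curator, Sample,
    OfflineAdapter, OnlineAdapter,
)

client = OpenAIClient()   # production client sketched above
playbook = Playbook()

# All three roles can share one client or use different models each.
generator = Generator(client)
reflector = Reflector(client)
curator = Curator(client)

env = ToyEnv()            # substitute your benchmark TaskEnvironment
train_samples = [Sample(question="Life?", ground_truth="42")]

# Offline adaptation: multi-epoch training over a labeled set
# (the paper reports up to 5 epochs).
offline = OfflineAdapter(playbook=playbook, generator=generator,
                         reflector=reflector, curator=curator)
offline.run(train_samples, env, epochs=5)

# Online adaptation: test-time continual learning over a sample stream.
# Assumes OnlineAdapter.run mirrors the offline signature without epochs.
online = OnlineAdapter(playbook=playbook, generator=generator,
                       reflector=reflector, curator=curator)
online.run(train_samples, env)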

Note that this is not the official implementation; the authors have not released one, so this is an independent reproduction. Once the official version is released, I will link to it here.
