add terminal agent with openai-agents #493

CormickKneey · 2025-10-29T07:44:55Z

Description

This PR adds terminal agent training support and fixes compatibility issues with the OpenAI Agents SDK.

Main Changes:

New Terminal Agent Example: Implements a complete terminal task execution workflow using the Terminal-Bench dataset
- Agent workflow for executing terminal commands via OpenAI Agents SDK
- MCP server for managing isolated Docker containers with tmux sessions
- Dataset loader and converter for Terminal-Bench tasks
- Complete setup guide and training example
OpenAI Agents SDK Compatibility Fixes:
- Handle the Omit sentinel type that the SDK may pass in request parameters
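
For reviewers unfamiliar with the Docker plus tmux setup listed above, here is a minimal sketch of how a command could be typed into a tmux session inside a container and how the pane output could be read back. It is not the code in this PR: the function names, the container name, and the session name are hypothetical, and the sketch only builds the `docker exec ... tmux ...` argument lists without running them.

```python
import shlex
from typing import List


def tmux_send_command(container: str, session: str, command: str) -> List[str]:
    """Build the argv that types `command` into a tmux session in a container.

    `container` and `session` are hypothetical names chosen by the caller.
    """
    return [
        "docker", "exec", container,
        "tmux", "send-keys", "-t", session, command, "Enter",
    ]


def tmux_capture_output(container: str, session: str) -> List[str]:
    """Build the argv that prints the current pane contents of the session."""
    return [
        "docker", "exec", container,
        "tmux", "capture-pane", "-p", "-t", session,
    ]


if __name__ == "__main__":
    # Print the shell-quoted commands instead of executing them.
    print(shlex.join(tmux_send_command("task-container", "main", "ls -la")))
    print(shlex.join(tmux_capture_output("task-container", "main")))
```

Passing the lists to `subprocess.run` would execute them. Keeping the command as a single argv element avoids shell-quoting problems when the agent sends commands containing spaces or quotes.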

Related Issue

Fixes #(issue)

Type of Change

New feature (non-breaking change that adds functionality)
Bug fix (non-breaking change that fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Checklist

I have read the Contributing Guide
I have run formatting tools (pre-commit or manual)
I have updated documentation if needed
My branch is up to date with main

Additional Context

Example Training Command:

export MCP_SERVER_URL=http://$MCP_SERVER_ADDRESS
python3 -m areal.launcher.local examples/openai-agents/train_agents.py \
    --config examples/openai-agents/config.yaml \
    actor.path=/storage/models/Qwen3-4B-Thinking-2507 \
    train_dataset.path=/tmp/terminal_bench/easy-data/train.parquet \
    agent_type=multi_agent_terminal

Experiment results with Qwen3-4B-Thinking-2507 included in assets/qwen3_4b_thinking_terminal.png.
(Not so good, but it can serve as a reference.)

gemini-code-assist · 2025-10-29T07:45:15Z

Summary of Changes

Hello @CormickKneey, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the system's capabilities by integrating a new terminal agent training pipeline. The core purpose is to allow AI agents to learn and perform complex tasks directly within a terminal environment, leveraging the OpenAI Agents SDK. This is achieved through a new Model Context Protocol (MCP) server that provides isolated terminal environments, specialized tools for agent interaction, and a structured workflow for reinforcement learning. The changes also include necessary adjustments to ensure seamless compatibility with the underlying OpenAI Agents SDK, making the entire system more robust and functional for terminal-based AI development.

Highlights

New Terminal Agent Training Support: Introduced a comprehensive framework for training terminal agents using the OpenAI Agents SDK, enabling agents to execute tasks within isolated Docker containers managed by an MCP server. This includes a new workflow, environment setup, and dataset handling for Terminal-Bench tasks.
OpenAI Agents SDK Compatibility Fixes: Resolved compatibility issues with the OpenAI Agents SDK by implementing a utility function to correctly handle NOT_GIVEN and Omit types, ensuring robust parameter passing in API calls.
Terminal-Bench Dataset Integration: Added support for the 'terminal_bench' dataset, including a new dataset loader and a converter script to transform Terminal-Bench tasks into a parquet format suitable for Reinforcement Learning (RL) training.
Comprehensive Documentation and Examples: Provided detailed guidance and examples for setting up the MCP server, preparing datasets, and initiating the training process for terminal agents, complete with example commands and an experiment record.
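
The NOT_GIVEN and Omit handling mentioned in the highlights can be illustrated with a small sketch. This is an assumption about the general technique, not the helper added in this PR: `NotGiven`, `Omit`, `NOT_GIVEN`, `is_omitted`, and `drop_omitted` below are local stand-ins defined so the example runs without the SDK installed. In real code the sentinel types would come from the client library.

```python
from typing import Any, Dict


class NotGiven:
    """Local stand-in for a 'parameter was not provided' sentinel type."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


class Omit:
    """Local stand-in for a 'remove this parameter' sentinel type."""

    def __repr__(self) -> str:
        return "Omit()"


NOT_GIVEN = NotGiven()


def is_omitted(value: Any) -> bool:
    """Return True when the value is one of the sentinel types."""
    return isinstance(value, (NotGiven, Omit))


def drop_omitted(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` without sentinel-valued entries.

    An explicit None is kept, because None can be a meaningful value.
    """
    return {key: value for key, value in params.items() if not is_omitted(value)}


if __name__ == "__main__":
    request = {"temperature": 0.7, "top_p": NOT_GIVEN, "stop": Omit(), "max_tokens": None}
    print(drop_omitted(request))
```

Filtering by type rather than by truthiness keeps legitimate falsy values such as `0`, `""`, or `None` in the request.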

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant new feature: a terminal agent with its environment and training workflow, built on the OpenAI Agents SDK. The implementation is comprehensive, including a dedicated server for managing terminal environments, dataset converters, and the agent logic itself. The compatibility fixes for the OpenAI SDK are also a good improvement. My review focuses on improving correctness, maintainability, and efficiency in a few areas. Key suggestions include fixing a potential crash due to unhandled None rewards, resolving duplicate logging, improving dataset processing efficiency, and making logging paths consistent.
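
The inline comment about unhandled None rewards is not preserved in this conversation, so the following is only a sketch of the usual guard, under the assumption that a reward function may return None when a task cannot be scored. The name `safe_reward` and the default of 0.0 are hypothetical.

```python
from typing import Optional


def safe_reward(reward: Optional[float], default: float = 0.0) -> float:
    """Coerce a possibly missing reward into a float.

    A None reward (for example, from a failed or timed-out evaluation)
    is replaced by `default` so that later arithmetic does not raise.
    """
    if reward is None:
        return default
    return float(reward)


if __name__ == "__main__":
    rewards = [1.0, None, 0.5]
    total = sum(safe_reward(r) for r in rewards)
    print(total)
```

Whether a missing reward should count as zero or the sample should be dropped from the batch is a training design decision; the sketch only shows the crash-avoidance part.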

examples/openai-agents/agent_terminal_workflow.py

examples/openai-agents/terminal/logging_config.py

areal/dataset/terminal_bench.py

areal/experimental/openai/client.py

examples/openai-agents/terminal/README.md

examples/openai-agents/terminal/env.py

examples/openai-agents/terminal/server.py
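
The review summary above also mentions duplicate logging. The specific cause in this PR is not shown here; a common one in Python is attaching a handler every time a logger is configured, or letting records propagate to a root logger that has its own handler. A minimal sketch of a guard against both, with the hypothetical helper name `get_logger`:

```python
import logging


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that has exactly one stream handler attached."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only on the first call for this logger name.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    # Stop records from also being emitted by ancestor loggers' handlers.
    logger.propagate = False
    return logger


if __name__ == "__main__":
    first = get_logger("terminal.example")
    second = get_logger("terminal.example")  # does not add a second handler
    second.info("logged once")
```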

CormickKneey · 2025-10-29T08:13:13Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a comprehensive terminal agent training feature, including a new agent workflow, a dedicated MCP server for managing terminal environments, and data processing scripts for the Terminal-Bench dataset. It also includes compatibility fixes for the OpenAI Agents SDK. The implementation is extensive and well-documented. My review focuses on improving code maintainability, fixing some minor bugs in documentation and imports, and adhering to Python best practices. Key suggestions include refactoring a large server file, correcting import patterns, and improving logging practices.

examples/openai-agents/terminal/server.py

areal/dataset/terminal_bench.py

examples/openai-agents/agent_terminal_workflow.py

examples/openai-agents/terminal/README.md

examples/openai-agents/terminal/env.py

examples/openai-agents/terminal/server.py

examples/openai-agents/terminal/env.py

Signed-off-by: CormickKneey <[email protected]>

github-actions · 2025-11-29T01:16:28Z

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

CormickKneey had a problem deploying to AReaL-unittests October 29, 2025 07:44 — with GitHub Actions Failure

fishcrap requested a review from garrett4wade October 29, 2025 07:47

gemini-code-assist bot reviewed Oct 29, 2025

CormickKneey had a problem deploying to AReaL-unittests October 29, 2025 08:05 — with GitHub Actions Failure

gemini-code-assist bot reviewed Oct 29, 2025

CormickKneey had a problem deploying to AReaL-unittests October 29, 2025 08:43 — with GitHub Actions Failure

CormickKneey had a problem deploying to AReaL-unittests October 29, 2025 10:26 — with GitHub Actions Failure

CormickKneey force-pushed the tb-agent branch 3 times, most recently from aecf41f to 1e6c0f5 Compare October 30, 2025 12:27

fishcrap reviewed Oct 31, 2025

examples/openai-agents/terminal/env.py (outdated)

CormickKneey mentioned this pull request Oct 31, 2025

feat: extract tool output from openai-agents sdk #507

Merged

16 tasks

CormickKneey marked this pull request as draft October 31, 2025 05:44

CormickKneey force-pushed the tb-agent branch from cba52ee to 53b11ed Compare November 4, 2025 12:39

add terminal agent with openai-agents

ad21378

Signed-off-by: CormickKneey <[email protected]>

CormickKneey force-pushed the tb-agent branch from 53b11ed to ad21378 Compare November 4, 2025 12:49

github-actions bot added the stale label Nov 29, 2025

2 participants