@CormickKneey
Collaborator

Description

This PR adds terminal agent training support and fixes compatibility issues with the OpenAI Agents SDK.

Main Changes:

  1. New Terminal Agent Example: Implements a complete terminal task execution workflow using the Terminal-Bench dataset

    • Agent workflow for executing terminal commands via OpenAI Agents SDK
    • MCP server for managing isolated Docker containers with tmux sessions
    • Dataset loader and converter for Terminal-Bench tasks
    • Complete setup guide and training example
  2. OpenAI Agents SDK Compatibility Fixes:

    • Avoid passing the SDK's Omit sentinel type through as a parameter value

Related Issue

Fixes #(issue)

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Bug fix (non-breaking change that fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have updated documentation if needed
  • My branch is up to date with main

Additional Context

Example Training Command:

export MCP_SERVER_URL=http://$MCP_SERVER_ADDRESS
python3 -m areal.launcher.local examples/openai-agents/train_agents.py \
    --config examples/openai-agents/config.yaml \
    actor.path=/storage/models/Qwen3-4B-Thinking-2507 \
    train_dataset.path=/tmp/terminal_bench/easy-data/train.parquet \
    agent_type=multi_agent_terminal

Experiment results with Qwen3-4B-Thinking-2507 are included in assets/qwen3_4b_thinking_terminal.png.
(Not so good, but they can serve as a reference.)

@gemini-code-assist
Contributor

Summary of Changes

Hello @CormickKneey, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a new terminal agent training pipeline. Its purpose is to let AI agents learn and perform tasks directly within a terminal environment, using the OpenAI Agents SDK. This is achieved through a new Model Context Protocol (MCP) server that provides isolated terminal environments, specialized tools for agent interaction, and a structured workflow for reinforcement learning. The changes also include adjustments that keep the system compatible with the underlying OpenAI Agents SDK.

Highlights

  • New Terminal Agent Training Support: Introduced a comprehensive framework for training terminal agents using the OpenAI Agents SDK, enabling agents to execute tasks within isolated Docker containers managed by an MCP server. This includes a new workflow, environment setup, and dataset handling for Terminal-Bench tasks.
  • OpenAI Agents SDK Compatibility Fixes: Resolved compatibility issues with the OpenAI Agents SDK by implementing a utility function to correctly handle NOT_GIVEN and Omit types, ensuring robust parameter passing in API calls.
  • Terminal-Bench Dataset Integration: Added support for the 'terminal_bench' dataset, including a new dataset loader and a converter script to transform Terminal-Bench tasks into a parquet format suitable for Reinforcement Learning (RL) training.
  • Comprehensive Documentation and Examples: Provided detailed guidance and examples for setting up the MCP server, preparing datasets, and initiating the training process for terminal agents, complete with example commands and an experiment record.
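
The sentinel handling mentioned in the compatibility highlight can be illustrated with a minimal sketch. This is not the PR's actual utility function: the `_NotGiven` and `_Omit` classes below are hypothetical stand-ins for the SDK's sentinel types, and `drop_sentinels` is an illustrative name. The idea is to strip parameters whose value is a sentinel before forwarding them, while keeping legitimate falsy values such as `0`.

```python
from typing import Any, Dict


class _NotGiven:
    """Hypothetical stand-in for the SDK's NOT_GIVEN sentinel type."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


class _Omit:
    """Hypothetical stand-in for the SDK's Omit sentinel type."""

    def __repr__(self) -> str:
        return "Omit()"


NOT_GIVEN = _NotGiven()


def drop_sentinels(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params without entries whose value is a sentinel.

    Only sentinel instances are removed; None, 0, and "" are kept because
    they are real values the caller chose to pass.
    """
    return {
        key: value
        for key, value in params.items()
        if not isinstance(value, (_NotGiven, _Omit))
    }


cleaned = drop_sentinels(
    {
        "model": "demo-model",      # illustrative value
        "temperature": NOT_GIVEN,   # dropped
        "top_p": _Omit(),           # dropped
        "max_tokens": 0,            # kept: falsy but explicitly set
    }
)
print(cleaned)
```

In real code the `isinstance` check would presumably target the sentinel types exported by the installed SDK version rather than these stand-ins.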

@fishcrap requested a review from @garrett4wade on October 29, 2025 07:47
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a significant new feature: a terminal agent with its environment and training workflow, built on the OpenAI Agents SDK. The implementation is comprehensive, including a dedicated server for managing terminal environments, dataset converters, and the agent logic itself. The compatibility fixes for the OpenAI SDK are also a good improvement. My review focuses on improving correctness, maintainability, and efficiency in a few areas. Key suggestions include fixing a potential crash due to unhandled None rewards, resolving duplicate logging, improving dataset processing efficiency, and making logging paths consistent.
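
As an illustration of the None-reward concern, here is a minimal sketch of one way to guard reward aggregation. The function names `safe_reward` and `mean_reward` are hypothetical and do not come from the PR; treating a missing reward as `0.0` is an assumption, and a real workflow might prefer to skip or flag such episodes instead.

```python
from typing import Optional, Sequence


def safe_reward(reward: Optional[float], default: float = 0.0) -> float:
    """Map a missing (None) reward to a default so arithmetic cannot fail."""
    return default if reward is None else float(reward)


def mean_reward(rewards: Sequence[Optional[float]]) -> float:
    """Average rewards, counting None entries as the default value."""
    if not rewards:
        return 0.0
    return sum(safe_reward(r) for r in rewards) / len(rewards)


# Without the guard, sum([1.0, None, 0.5]) raises TypeError.
print(mean_reward([1.0, None, 0.5]))
```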

@CormickKneey
Collaborator Author

/gemini review

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a comprehensive terminal agent training feature, including a new agent workflow, a dedicated MCP server for managing terminal environments, and data processing scripts for the Terminal-Bench dataset. It also includes compatibility fixes for the OpenAI Agents SDK. The implementation is extensive and well-documented. My review focuses on improving code maintainability, fixing some minor bugs in documentation and imports, and adhering to Python best practices. Key suggestions include refactoring a large server file, correcting import patterns, and improving logging practices.
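
The review summary does not spell out its logging suggestions. One common practice, sketched below as an assumption rather than as what the review requested, is to create loggers through a helper that attaches a handler only once, so repeated calls do not produce duplicate log lines. The helper name `get_logger` and the logger name used in the example are hypothetical.

```python
import logging


def get_logger(name: str) -> logging.Logger:
    """Return a named logger, attaching a stream handler only on first use."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        # Stop records from also reaching the root logger's handlers,
        # which is another frequent source of duplicated output.
        logger.propagate = False
    return logger


first = get_logger("terminal_agent_demo")
second = get_logger("terminal_agent_demo")  # reuses the same logger and handler
second.info("container session started")
```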

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

@github-actions (bot) added the stale label on Nov 29, 2025