@agent-infra/browser is dedicated to building a comprehensive browser infrastructure SDK specifically designed for AI Agents.
This toolkit is specifically designed for:
- GUI AI Agent that needs to interact with web browsers
- Browser screen casting in non-VNC or headless scenarios
- MCP service for browser automation control
Core Browser Control Library. Abstracts and encapsulates the fundamental capabilities required to manipulate browsers.
Browser Screen Casting UI Components. Can connect to remote browsers via CDP and then display their screen casting content.
Cross-Platform Browser Detection. Automatically locate installed browsers (Chrome, Edge, Firefox) on Windows, macOS, and Linux systems.
Smart Web Content Extraction. Extract clean, readable content from web pages and convert to Markdown format with advanced algorithms and browser automation support.
Media Processing Utilities. Media tools for handling browser-related tasks, such as high-performance base64 image parsing and media resource processing.
This is a monorepo managed with pnpm. To get started:
# Install dependencies
pnpm install
# Build all packages
pnpm run build
# Run tests
pnpm run test
# Lint code
pnpm run lint- Node.js >= 20.x
- pnpm for package management
- Chrome/Chromium browser for browser automation features
This project is licensed under the Apache License 2.0.
Special thanks to the open source projects that inspired this toolkit:
- puppeteer - The underlying browser automation library
- Chrome DevTools Protocol - Chrome DevTools Protocol
- ChatWise - Browser detection functionality reference
- readability - A standalone version of the readability lib
- edge-paths - Edge browser path finder
