ianscrivener/mcp-tts-mlx-audio

mcp-tts-mlx-audio

An MCP server for Text-to-Speech using MLX and Kokoro on macOS. The server keeps the Kokoro model loaded in RAM for low-latency speech synthesis.

Features

  • Fast TTS using the Kokoro-82M model via MLX
  • Model stays loaded in RAM for minimal latency
  • Two MCP tools: speak and list_voices
  • 50+ Kokoro voices available (28 typically cached locally)
  • Optional voice selection (defaults to af_heart)
  • Only uses locally cached voices to avoid download delays

Installation

  1. Install dependencies:
uv sync
  2. Install the package in development mode:
uv pip install -e .
  3. Configure Claude Desktop to use the MCP server:

Edit your Claude Desktop configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Add the mcp-tts-kokoro entry below to the mcpServers section:

{
  "mcpServers": {
    "mcp-tts-kokoro": {
      "command": "[FILEPATH]/mcp-tts-mlx-audio/.venv/bin/python",
      "args": [
        "[FILEPATH]/mcp-tts-mlx-audio/mcp_server.py"
      ],
      "env": {
        "HF_HUB_CACHE": "/path/to/your/huggingface/cache"
      }
    }
  }
}

Important:

  • Replace [FILEPATH] with the directory that contains your mcp-tts-mlx-audio checkout
  • Use absolute paths (no ~ shorthand and no relative paths like ./)

Optional: If you use a custom HuggingFace cache location (via HF_HUB_CACHE environment variable), include it in the env section. Otherwise, you can omit the env section entirely.

Example complete config file:

{
  "globalShortcut": "",
  "mcpServers": {
    "mcp-tts-kokoro": {
      "command": "/Users/ianscrivener/_⭐️Code_2025_M4/mcp-tts-mlx-audio/.venv/bin/python",
      "args": [
        "/Users/ianscrivener/_⭐️Code_2025_M4/mcp-tts-mlx-audio/mcp_server.py"
      ],
      "env": {
        "HF_HUB_CACHE": "/Volumes/Crucial500Gb/HUGGINGFACE_HUB_ACTIVE"
      }
    }
  }
}

If you have other MCP servers already configured, just add the mcp-tts-kokoro entry to your existing mcpServers object.

  4. Restart Claude Desktop completely (quit and reopen) for the changes to take effect.
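
The config merge described in step 3 can be sketched in Python. This is only an illustration: the helper name add_tts_server and the example checkout path are assumptions, not part of this project, and you would still write the result back to claude_desktop_config.json yourself.

```python
import json
from typing import Optional


def add_tts_server(config: dict, repo_dir: str, hf_cache: Optional[str] = None) -> dict:
    """Return a copy of `config` with the mcp-tts-kokoro entry added.

    `repo_dir` must be the absolute path of the mcp-tts-mlx-audio checkout.
    Existing entries under mcpServers are preserved.
    """
    entry = {
        "command": f"{repo_dir}/.venv/bin/python",
        "args": [f"{repo_dir}/mcp_server.py"],
    }
    if hf_cache:
        # Only needed when a custom HuggingFace cache location is used.
        entry["env"] = {"HF_HUB_CACHE": hf_cache}

    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["mcp-tts-kokoro"] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Hypothetical existing config and checkout path; substitute your own.
    existing = {"mcpServers": {"other-server": {"command": "/usr/bin/true"}}}
    print(json.dumps(add_tts_server(existing, "/abs/path/to/mcp-tts-mlx-audio"), indent=2))
```

The function returns a new dict rather than editing in place, so the original config can be kept as a backup before anything is written to disk.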

Verifying Installation

After restarting Claude Desktop, you should see the MCP server tools available:

  1. Open Claude Desktop
  2. Look for the tools icon or MCP indicator
  3. You should see two tools available:
    • speak - Convert text to speech
    • list_voices - List available voices

If you don't see the tools, check:

  • Claude Desktop logs for errors (usually in ~/Library/Logs/Claude/ on macOS)
  • Config file syntax - ensure valid JSON (no trailing commas, proper quotes)
  • Paths are correct - use absolute paths, verify they exist
  • Virtual environment exists at the specified path
  • Python executable - run the command path manually to test:
    /path/to/.venv/bin/python /path/to/mcp_server.py
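
Several of the checks above (valid JSON, absolute paths, paths that exist) can be automated. The following is a minimal sketch, assuming the entry is named mcp-tts-kokoro as in the example config; check_server_entry is a hypothetical helper, not something shipped with this repository.

```python
import json
import os


def check_server_entry(config_text: str, name: str = "mcp-tts-kokoro") -> list:
    """Return a list of problems found in the config; empty means it looks sane."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        # Trailing commas and bad quotes end up here.
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level JSON value is not an object"]

    entry = config.get("mcpServers", {}).get(name)
    if entry is None:
        return [f"no '{name}' entry under mcpServers"]

    problems = []
    # For this server both the command and its single argument are file paths.
    for path in [entry.get("command", "")] + list(entry.get("args", [])):
        if not os.path.isabs(path):
            problems.append(f"not an absolute path: {path!r}")
        elif not os.path.exists(path):
            problems.append(f"path does not exist: {path}")
    return problems
```

To use it, read the config file with open(...).read() and print whatever the function returns.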

Common issues:

  • "No module named 'mcp'" - Run uv sync in the project directory
  • "Model not found" - Ensure the Kokoro model has been downloaded (run the server once manually)
  • Server crashes on startup - Check that all dependencies are installed with uv sync

Usage

MCP Server

The MCP server starts automatically when Claude Desktop starts. It loads the Kokoro model into memory at startup (this may take a few seconds).

To test the server manually outside of Claude Desktop:

source .venv/bin/activate && python mcp_server.py

Note: When running manually, you'll need to send MCP protocol messages. This is mainly useful for debugging.
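
For reference, the protocol messages mentioned above are JSON-RPC 2.0 requests, and the stdio transport sends each one as a single line of JSON. The sketch below only builds the messages; it assumes the server uses the stdio transport (as the Claude Desktop config implies), and the protocolVersion and clientInfo values are placeholders that may need adjusting for your MCP SDK version.

```python
import json


def jsonrpc_request(request_id: int, method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return json.dumps(message) + "\n"


# Handshake request that a client sends first.
# ASSUMPTION: protocolVersion and clientInfo are illustrative values.
initialize = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "manual-test", "version": "0.0.1"},
})

# Notification (no id) that a client sends once the server has answered initialize.
initialized = json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n"

# Tool invocation, sent after the handshake completes.
speak = jsonrpc_request(2, "tools/call", {
    "name": "speak",
    "arguments": {"text": "Hello world", "voice": "af_heart"},
})

if __name__ == "__main__":
    # Print the lines so they can be pasted into the server's stdin.
    print(initialize + initialized + speak, end="")
```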

Testing All Voices

Use the test_voices.py script to test all locally downloaded voices with custom text:

source .venv/bin/activate && python test_voices.py "Mary had a little lamb"

This will iterate through all locally cached voices (typically 28), speaking:

  • "Voice model A F heart. Mary had a little lamb"
  • "Voice model A F nova. Mary had a little lamb"
  • etc.

Note: The script only tests voices that have been downloaded to your local HuggingFace cache. It will skip any voices that aren't locally available. This is useful for finding your preferred voice without downloading all 50+ voices.
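
The spoken announcement shown above can be derived from the voice name alone. This is a guess at the formatting, reconstructed from the two example lines; the actual test_voices.py may build the string differently, and announcement is a hypothetical name.

```python
def announcement(voice: str, text: str) -> str:
    """Build the phrase spoken for one voice, e.g. for 'af_heart'.

    The two-letter prefix is spelled out letter by letter so the TTS
    engine reads "A F" rather than trying to pronounce "af".
    """
    prefix, _, name = voice.partition("_")
    spelled = " ".join(prefix.upper())
    return f"Voice model {spelled} {name}. {text}"


if __name__ == "__main__":
    for voice in ["af_heart", "af_nova"]:
        print(announcement(voice, "Mary had a little lamb"))
```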

MCP Tools

speak

Convert text to speech and play it immediately.

Parameters:

  • text (required): The text to convert to speech
  • voice (optional): The Kokoro voice name (defaults to "af_heart")

Example usage:

{
  "name": "speak",
  "arguments": {
    "text": "Hello world",
    "voice": "af_heart"
  }
}

list_voices

List all locally cached Kokoro voice models.

Parameters: None

Example usage:

{
  "name": "list_voices",
  "arguments": {}
}

Returns a formatted list of locally downloaded voices (typically 28) including:

  • American Female voices (af_*): nova, heart, bella, sarah, etc.
  • American Male voices (am_*): adam, echo, michael, etc.
  • British voices (bf_*, bm_*): alice, emma, george, etc.

Note: Only shows voices that have been downloaded to your local HuggingFace cache. The full Kokoro model includes 50+ voices, but they are downloaded on-demand.
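
One way to find locally cached voices is to scan the HuggingFace hub cache directly. The sketch below is not the server's actual implementation. The models--<org>--<name>/snapshots/<revision>/ layout and the HF_HUB_CACHE / HF_HOME lookup follow huggingface_hub conventions, but the voices/ subdirectory is an assumption about how this model repository is organised.

```python
import os
from pathlib import Path

# Cache folder name that huggingface_hub derives from mlx-community/Kokoro-82M-bf16.
MODEL_DIR = "models--mlx-community--Kokoro-82M-bf16"


def hub_cache_dir(env=os.environ) -> Path:
    """Resolve the hub cache: HF_HUB_CACHE, else $HF_HOME/hub, else the default.

    Simplification: XDG_CACHE_HOME is ignored here.
    """
    if "HF_HUB_CACHE" in env:
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def cached_voices(cache: Path) -> list:
    """Return sorted voice names found in any downloaded snapshot.

    ASSUMPTION: voice files live under snapshots/<revision>/voices/ and the
    file stem is the voice name (e.g. af_heart).
    """
    files = (cache / MODEL_DIR).glob("snapshots/*/voices/*")
    return sorted({f.stem for f in files})


if __name__ == "__main__":
    print(cached_voices(hub_cache_dir()))
```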

Configuration

  • Model: mlx-community/Kokoro-82M-bf16
  • Default Voice: af_heart

Language Support

The server automatically detects the correct language from the voice name:

  • a prefix = American English (af_, am_)
  • b prefix = British English (bf_, bm_)
  • e prefix = Spanish (ef_, em_)
  • f prefix = French (ff_, fm_)
  • h prefix = Hindi (hf_, hm_)
  • i prefix = Italian (if_, im_)
  • j prefix = Japanese (jf_, jm_)
  • p prefix = Portuguese (pf_, pm_)
  • z prefix = Mandarin Chinese (zf_, zm_)

No need to specify language codes manually - the correct G2P pipeline is selected automatically based on the voice.
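
The prefix rule above amounts to a lookup on the first letter of the voice name. A minimal sketch follows; the table mirrors the list in this section, while the function name and error handling are illustrative rather than taken from mcp_server.py.

```python
# First letter of the voice name -> language, as listed above.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Mandarin Chinese",
}


def language_for_voice(voice: str) -> str:
    """Return the language implied by a voice name such as 'af_heart'."""
    code = voice[:1].lower()
    if code not in LANGUAGES:
        raise ValueError(f"unrecognised voice prefix in {voice!r}")
    return LANGUAGES[code]


if __name__ == "__main__":
    print(language_for_voice("af_heart"))   # American English
    print(language_for_voice("bm_george"))  # British English
```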

Notes

  • The model is loaded once at startup and kept in RAM for low-latency inference
  • To free the RAM, quit Claude Desktop when you are not using the TTS feature
  • Each language creates a separate pipeline on first use, cached for subsequent requests
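
The per-language caching in the last note can be pictured as a lazily filled dictionary. This sketch assumes pipelines are keyed by the first letter of the voice name; PipelineCache and its factory argument are hypothetical stand-ins for whatever constructs the real pipeline.

```python
from typing import Callable, Dict


class PipelineCache:
    """Create one pipeline per language on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[str], object]):
        self._factory = factory  # called with the language code, e.g. "a"
        self._pipelines: Dict[str, object] = {}

    def get(self, voice: str) -> object:
        lang = voice[0]  # language code is the first letter of the voice name
        if lang not in self._pipelines:
            # First request for this language: pay the construction cost once.
            self._pipelines[lang] = self._factory(lang)
        return self._pipelines[lang]


if __name__ == "__main__":
    created = []
    cache = PipelineCache(lambda lang: created.append(lang) or f"pipeline-{lang}")
    for voice in ["af_heart", "am_adam", "bf_emma"]:
        cache.get(voice)
    print(created)  # the factory ran once per distinct language
```

With this shape, switching between two voices of the same language never rebuilds a pipeline; only the first voice of a new language does.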

About

A TTS MCP server for macOS leveraging MLX and mlx-audio
