An MCP server for Text-to-Speech using MLX and Kokoro on macOS. The server keeps the Kokoro model loaded in RAM for low-latency speech synthesis.
- Fast TTS using Kokoro-82M model via MLX
- Model stays loaded in RAM for minimal latency
- Two MCP tools: `speak` and `list_voices`
- 50+ Kokoro voices available (28 typically cached locally)
- Optional voice selection (defaults to af_heart)
- Only uses locally cached voices to avoid download delays
- Install dependencies:

  ```bash
  uv sync
  ```

- Install the package in development mode:

  ```bash
  uv pip install -e .
  ```

- Configure Claude Desktop to use the MCP server:
Edit your Claude Desktop configuration file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
Add the following to the `mcpServers` section:

```json
{
  "mcpServers": {
    "mcp-tts-kokoro": {
      "command": "[FILEPATH]/mcp-tts-mlx-audio/.venv/bin/python",
      "args": [
        "[FILEPATH]/mcp-tts-mlx-audio/mcp_server.py"
      ],
      "env": {
        "HF_HUB_CACHE": "/path/to/your/huggingface/cache"
      }
    }
  }
}
```

Important:
- Replace `[FILEPATH]` with your actual filepath
- Update the paths to match your installation directory
- Use absolute paths (not relative paths like `~` or `./`)
Optional: If you use a custom HuggingFace cache location (via the `HF_HUB_CACHE` environment variable), include it in the `env` section. Otherwise, you can omit the `env` section entirely.
Example complete config file:

```json
{
  "globalShortcut": "",
  "mcpServers": {
    "mcp-tts-kokoro": {
      "command": "/Users/ianscrivener/_⭐️Code_2025_M4/mcp-tts-mlx-audio/.venv/bin/python",
      "args": [
        "/Users/ianscrivener/_⭐️Code_2025_M4/mcp-tts-mlx-audio/mcp_server.py"
      ],
      "env": {
        "HF_HUB_CACHE": "/Volumes/Crucial500Gb/HUGGINGFACE_HUB_ACTIVE"
      }
    }
  }
}
```

If you have other MCP servers already configured, just add the `mcp-tts-kokoro` entry to your existing `mcpServers` object.
- Restart Claude Desktop completely (quit and reopen) for the changes to take effect.
After restarting Claude Desktop, you should see the MCP server tools available:
- Open Claude Desktop
- Look for the tools icon or MCP indicator
- You should see two tools available:
speak- Convert text to speechlist_voices- List available voices
If you don't see the tools, check:
- Claude Desktop logs for errors (usually in `~/Library/Logs/Claude/` on macOS)
- Config file syntax - ensure valid JSON (no trailing commas, proper quotes)
- Paths are correct - use absolute paths, verify they exist
- Virtual environment exists at the specified path
- Python executable - run the command path manually to test:
  ```bash
  /path/to/.venv/bin/python /path/to/mcp_server.py
  ```
Common issues:
- "No module named 'mcp'" - Run
uv syncin the project directory - "Model not found" - Ensure the Kokoro model has been downloaded (run the server once manually)
- Server crashes on startup - Check that all dependencies are installed with
uv sync
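If you prefer to pre-fetch the model rather than letting the first launch trigger the download, a minimal sketch using the `huggingface_hub` library could look like this (illustrative, not part of the project):

```python
# Illustrative: pre-download the Kokoro model into the local HuggingFace cache
# so the MCP server doesn't block on its first request. If HF_HUB_CACHE is set
# (see the "env" entry in the Claude Desktop config), the files land there.
from huggingface_hub import snapshot_download

snapshot_download("mlx-community/Kokoro-82M-bf16")
```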
The MCP server runs automatically when Claude Desktop starts. It will load the Kokoro model into memory on first launch (this may take a few seconds).
To test the server manually outside of Claude Desktop:
```bash
source .venv/bin/activate && python mcp_server.py
```

Note: When running manually, you'll need to send MCP protocol messages. This is mainly useful for debugging.
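If you'd rather not hand-craft JSON-RPC messages, a small test client built on the MCP Python SDK is another option. The sketch below is illustrative rather than part of this repo, and the interpreter/script paths are placeholders:

```python
# Illustrative sketch: connect to the server over stdio and exercise its tools.
# Adjust the command/args paths to your installation before running.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(
        command="/path/to/.venv/bin/python",
        args=["/path/to/mcp_server.py"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # expect: speak, list_voices
            await session.call_tool("speak", {"text": "Hello from the test client"})

asyncio.run(main())
```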
Use the `test_voices.py` script to test all locally downloaded voices with custom text:

```bash
source .venv/bin/activate && python test_voices.py "Mary had a little lamb"
```

This will iterate through all locally cached voices (typically 28), speaking:
- "Voice model A F heart. Mary had a little lamb"
- "Voice model A F nova. Mary had a little lamb"
- etc.
Note: The script only tests voices that have been downloaded to your local HuggingFace cache. It will skip any voices that aren't locally available. This is useful for finding your preferred voice without downloading all 50+ voices.
The `speak` tool converts text to speech and plays it immediately.
Parameters:
- `text` (required): The text to convert to speech
- `voice` (optional): The Kokoro voice name (defaults to "af_heart")
Example usage:

```json
{
  "name": "speak",
  "arguments": {
    "text": "Hello world",
    "voice": "af_heart"
  }
}
```

The `list_voices` tool lists all locally cached Kokoro voice models.
Parameters: None
Example usage:

```json
{
  "name": "list_voices",
  "arguments": {}
}
```

Returns a formatted list of locally downloaded voices (typically 28) including:
- American Female voices (af_*): nova, heart, bella, sarah, etc.
- American Male voices (am_*): adam, echo, michael, etc.
- British voices (bf_, bm_): alice, emma, george, etc.
Note: Only shows voices that have been downloaded to your local HuggingFace cache. The full Kokoro model includes 50+ voices, but they are downloaded on-demand.
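For reference, one way a script could enumerate the locally cached voices without triggering any downloads is sketched below. This is not the server's implementation; it assumes the voice tensors live in a `voices/` folder of the cached snapshot, as in the upstream Kokoro repositories:

```python
# Illustrative sketch: list cached Kokoro voices by scanning the local
# HuggingFace snapshot. local_files_only=True resolves the cached path and
# raises if the model has not been downloaded yet.
from pathlib import Path

from huggingface_hub import snapshot_download

snapshot = Path(snapshot_download("mlx-community/Kokoro-82M-bf16", local_files_only=True))
voices = sorted(path.stem for path in (snapshot / "voices").glob("*.pt"))
print(f"{len(voices)} voices cached locally: {voices}")
```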
- Model: mlx-community/Kokoro-82M-bf16
- Default Voice: af_heart
The server automatically detects the correct language from the voice name:
- `a` prefix = American English (af_, am_)
- `b` prefix = British English (bf_, bm_)
- `e` prefix = Spanish (ef_, em_)
- `f` prefix = French (ff_, fm_)
- `h` prefix = Hindi (hf_, hm_)
- `i` prefix = Italian (if_, im_)
- `j` prefix = Japanese (jf_, jm_)
- `p` prefix = Portuguese (pf_, pm_)
- `z` prefix = Mandarin Chinese (zf_, zm_)
No need to specify language codes manually - the correct G2P pipeline is selected automatically based on the voice.
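As a rough illustration (this is not the server's actual code), the detection can be a one-character lookup on the voice name:

```python
# Illustrative sketch of voice-prefix -> language detection. Kokoro voice
# names start with a language letter followed by f/m for female/male,
# e.g. "af_heart", "bm_george".
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Mandarin Chinese",
}

def detect_language(voice: str) -> str:
    """Return the language implied by a Kokoro voice name such as 'af_heart'."""
    if not voice or voice[0] not in LANGUAGES:
        raise ValueError(f"Unrecognized voice prefix: {voice!r}")
    return LANGUAGES[voice[0]]

print(detect_language("af_heart"))   # American English
print(detect_language("bm_george"))  # British English
```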
- The model is loaded once at startup and kept in RAM for low-latency inference
- To preserve RAM, simply quit Claude Desktop when not using the TTS feature
- Each language creates a separate pipeline on first use, cached for subsequent requests
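A minimal sketch of that per-language caching pattern (illustrative only; the real Kokoro pipeline object is much heavier than the stand-in class used here):

```python
# Illustrative sketch: build one pipeline per language on first use and
# reuse it for every later request in the same language.
from functools import lru_cache

class Pipeline:
    """Stand-in for a Kokoro G2P/TTS pipeline."""
    def __init__(self, lang_code: str) -> None:
        print(f"building pipeline for language {lang_code!r}")
        self.lang_code = lang_code

@lru_cache(maxsize=None)
def get_pipeline(lang_code: str) -> Pipeline:
    # lru_cache guarantees the constructor runs only once per language code.
    return Pipeline(lang_code)

get_pipeline("a")  # built on first use (e.g. for af_heart)
get_pipeline("a")  # returned from cache, no rebuild
get_pipeline("b")  # separate pipeline for British English voices
```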