Add RimeArcanaTTSService to Support Non-JSON WebSocket Streaming #3063

@gokuljs

Description

Problem Statement

Pipecat’s current TTS services only support JSON-based WebSocket protocols. The Arcana (Rime) TTS model streams raw binary audio frames instead of JSON messages, so the existing TTS service infrastructure cannot process Arcana output. This prevents users from using Arcana for real-time speech generation within the Pipecat pipeline.

Proposed Solution

Implement a new RimeArcanaTTSService that adapts Pipecat’s TTS interface to Arcana’s non-JSON WebSocket protocol.

The service should:

  • Establish a WebSocket connection using Arcana’s query-parameter configuration.
  • Send plain text input (no JSON envelopes).
  • Receive binary audio frames and forward them directly to the audio pipeline.
  • Support Arcana control commands: <EOS>, <CLEAR>, <FLUSH>.
  • Implement the required lifecycle and utility methods:
    __init__, start, stop, _connect, _disconnect, _receive_messages, _get_websocket,
    run_tts, _handle_interruption, flush_audio, language_to_service_language,
    can_generate_metrics, _update_settings.
  • Reconnect when settings change (since model and voice parameters affect the WebSocket URL).
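The send and receive behavior listed above could look roughly like the sketch below. It is not Pipecat code: `FakeArcanaSocket` is a hypothetical in-memory stand-in for a WebSocket client, the free functions only borrow names from the method list, and the choice of which control command goes with which step (<FLUSH> after text, <CLEAR> on interruption) is an assumption that should be checked against the Arcana documentation.

```python
import asyncio
from typing import AsyncIterator, List, Union

# Control commands named in this issue; they travel as plain text, not JSON.
EOS, CLEAR, FLUSH = "<EOS>", "<CLEAR>", "<FLUSH>"


class FakeArcanaSocket:
    """Hypothetical in-memory socket used only to exercise the sketch."""

    def __init__(self, incoming: List[Union[bytes, str]]):
        self.sent: List[str] = []          # everything the client sent
        self._incoming = list(incoming)    # messages the "server" will deliver

    async def send(self, message: str) -> None:
        self.sent.append(message)

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        for message in self._incoming:
            yield message


async def run_tts(ws, text: str) -> None:
    """Send the text as-is (no JSON envelope), then request a flush (assumed usage)."""
    await ws.send(text)
    await ws.send(FLUSH)


async def handle_interruption(ws) -> None:
    """Ask the server to drop pending text (assumed meaning of <CLEAR>)."""
    await ws.send(CLEAR)


async def receive_audio(ws) -> AsyncIterator[bytes]:
    """Yield binary frames unchanged; non-binary messages are skipped in this sketch."""
    async for message in ws:
        if isinstance(message, (bytes, bytearray)):
            yield bytes(message)


async def demo():
    ws = FakeArcanaSocket([b"\x00\x01", "not audio", b"\x02"])
    await run_tts(ws, "Hello")
    await handle_interruption(ws)
    chunks = [chunk async for chunk in receive_audio(ws)]
    return ws.sent, chunks
```

In a real service the yielded bytes would be wrapped in whatever audio frame type the pipeline expects; that wrapping is left out here because it depends on Pipecat internals not described in this issue.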

This enables full Arcana compatibility, consistent service behavior, and real-time speech synthesis support across Pipecat.
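The reconnect requirement follows from the configuration living in the URL: if a changed setting changes the URL, the existing connection no longer matches it. A minimal sketch, assuming a placeholder endpoint and illustrative query-parameter names (`speaker`, `modelId`, `samplingRate`); none of these are confirmed by this issue, and `ArcanaSettings`, `build_url`, and `needs_reconnect` are hypothetical helpers:

```python
from dataclasses import dataclass, replace
from urllib.parse import urlencode

# Placeholder endpoint (assumption); the real Arcana URL is not given in this issue.
BASE_URL = "wss://example.invalid/ws"


@dataclass(frozen=True)
class ArcanaSettings:
    """Hypothetical settings container; field names are illustrative."""
    speaker: str
    model_id: str = "arcana"
    sample_rate: int = 24000


def build_url(settings: ArcanaSettings, base_url: str = BASE_URL) -> str:
    """Encode the settings as query parameters on the WebSocket URL."""
    query = urlencode(
        {
            "speaker": settings.speaker,
            "modelId": settings.model_id,
            "samplingRate": settings.sample_rate,
        }
    )
    return f"{base_url}?{query}"


def needs_reconnect(old: ArcanaSettings, new: ArcanaSettings) -> bool:
    """A reconnect is needed exactly when the resulting URL differs."""
    return build_url(old) != build_url(new)


current = ArcanaSettings(speaker="a voice")
updated = replace(current, speaker="other")
```

An `_update_settings` implementation could call `needs_reconnect(current, updated)` and, when it returns true, run `_disconnect` followed by `_connect` with the new URL.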

Alternative Solutions

No response

Additional Context

No response

Would you be willing to help implement this feature?

  • Yes, I'd like to contribute
  • No, I'm just suggesting
