
Realtime voice assistant

Stream audio in, synthesize audio out, and handle turn taking.

Intermediate · 16 min read · Mar 27, 2025
Audio · Realtime · Streaming
Key takeaways
  • Stream audio in small, consistent chunks.
  • Keep per-session state so you can handle interruptions.
  • Configure speech synthesis for low latency.

Audio ingestion

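Stream audio input over a WebSocket and keep buffers small: audio waiting in a buffer adds its full duration to end-to-end latency, so a 200 ms buffer costs at least 200 ms before any processing can start.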
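A minimal sketch of keeping chunks small and consistent, assuming 16 kHz, 16-bit little-endian mono PCM and a 20 ms chunk length (hypothetical choices, not requirements). Capture callbacks often deliver buffers of irregular size, so the hypothetical `Rechunker` class regroups whatever bytes arrive into fixed-size chunks. The WebSocket client that would send each chunk is not shown.

```python
from typing import List

# Assumed audio format (hypothetical; match whatever your capture device produces).
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1          # mono


def chunk_size_bytes(chunk_ms: int) -> int:
    """Bytes needed to hold `chunk_ms` milliseconds of audio in the assumed format."""
    samples_per_channel = SAMPLE_RATE_HZ * chunk_ms // 1000
    return samples_per_channel * BYTES_PER_SAMPLE * CHANNELS


class Rechunker:
    """Regroups irregular capture buffers into fixed-size chunks for streaming."""

    def __init__(self, chunk_ms: int = 20) -> None:
        self.size = chunk_size_bytes(chunk_ms)
        self._pending = bytearray()

    def push(self, data: bytes) -> List[bytes]:
        """Add captured bytes; return every complete chunk now available."""
        self._pending.extend(data)
        chunks = []
        while len(self._pending) >= self.size:
            chunks.append(bytes(self._pending[: self.size]))
            del self._pending[: self.size]
        return chunks

    def flush(self) -> bytes:
        """At end of input, pad leftover bytes with silence (zeros) into one last chunk."""
        if not self._pending:
            return b""
        last = bytes(self._pending) + b"\x00" * (self.size - len(self._pending))
        self._pending.clear()
        return last


if __name__ == "__main__":
    rechunker = Rechunker(chunk_ms=20)
    # Irregular buffer sizes, as a capture callback might deliver them.
    for buf in (b"\x01" * 1000, b"\x02" * 300, b"\x03" * 50):
        for chunk in rechunker.push(buf):
            # Each chunk would be sent as one binary WebSocket message.
            print(len(chunk))
```

At these settings each chunk is 640 bytes (320 samples × 2 bytes), and less than one chunk (20 ms) of audio ever waits in the buffer before it can be sent.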

Turn taking

Detect when the user speaks over the assistant (barge-in) and decide whether to cancel the remaining speech output or resume it once the user falls silent.
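One way to make the cancel-or-resume decision is a small state machine fed one voice-activity frame at a time. This sketch assumes an upstream voice activity detector (not shown) that labels each 20 ms frame as speech or silence; the `TurnTaker` class and its thresholds are hypothetical. A short blip of user speech, such as a backchannel "mm-hm", pauses playback and then resumes it, while sustained speech cancels the rest of the response.

```python
from enum import Enum, auto


class Action(Enum):
    NONE = auto()    # no change to playback
    PAUSE = auto()   # user started talking over the assistant
    RESUME = auto()  # user stopped quickly; continue the same response
    CANCEL = auto()  # user is taking the turn; drop the rest of the response


class TurnTaker:
    """Barge-in handling driven by per-frame voice activity (hypothetical thresholds)."""

    def __init__(self, cancel_after_frames: int = 15, resume_after_frames: int = 10) -> None:
        # With 20 ms frames: 15 frames = 300 ms of speech, 10 frames = 200 ms of silence.
        self.cancel_after_frames = cancel_after_frames
        self.resume_after_frames = resume_after_frames
        self._reset(assistant_speaking=False)

    def _reset(self, assistant_speaking: bool) -> None:
        self.assistant_speaking = assistant_speaking
        self.paused = False
        self._speech_frames = 0   # user speech frames since playback was paused
        self._silence_frames = 0  # consecutive silent frames while paused

    def start_playback(self) -> None:
        self._reset(assistant_speaking=True)

    def finish_playback(self) -> None:
        self._reset(assistant_speaking=False)

    def on_frame(self, user_is_speaking: bool) -> Action:
        if not self.assistant_speaking:
            return Action.NONE  # nothing to interrupt; the user already has the turn
        if user_is_speaking:
            self._speech_frames += 1
            self._silence_frames = 0
            if self._speech_frames >= self.cancel_after_frames:
                # Sustained speech: treat it as a real interruption.
                self._reset(assistant_speaking=False)
                return Action.CANCEL
            if not self.paused:
                self.paused = True
                return Action.PAUSE
            return Action.NONE
        if self.paused:
            self._silence_frames += 1
            if self._silence_frames >= self.resume_after_frames:
                # Speech ended before the cancel threshold: treat it as a backchannel.
                self._reset(assistant_speaking=True)
                return Action.RESUME
        return Action.NONE
```

Pausing on the first speech frame, rather than cancelling, keeps coughs and backchannels from discarding a response at the cost of briefly silencing the assistant. On `CANCEL`, the session would typically also stop any pending synthesis for that response; that wiring is not shown.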

Latency budget

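Measure the latency of every hop in the pipeline (for example audio transport, speech recognition, response generation, and speech synthesis) and keep the total under your UX target.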
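A minimal sketch of per-hop measurement inside a single process, using `time.perf_counter` and a `contextlib.contextmanager` helper. The `LatencyBudget` class, the hop names, the stand-in `time.sleep` calls, and the 800 ms target are placeholders for illustration, not recommended values.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class LatencyBudget:
    """Records how long each pipeline hop takes for one turn and checks the total."""

    def __init__(self, target_ms: float) -> None:
        self.target_ms = target_ms
        self.hops_ms: Dict[str, float] = {}

    @contextmanager
    def hop(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the hop raises; accumulated if a hop runs more than once.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.hops_ms[name] = self.hops_ms.get(name, 0.0) + elapsed_ms

    def total_ms(self) -> float:
        return sum(self.hops_ms.values())

    def within_target(self) -> bool:
        return self.total_ms() <= self.target_ms

    def report(self) -> str:
        lines = [f"{name:>12}: {ms:7.1f} ms" for name, ms in self.hops_ms.items()]
        lines.append(f"{'total':>12}: {self.total_ms():7.1f} ms (target {self.target_ms:.0f} ms)")
        return "\n".join(lines)


if __name__ == "__main__":
    budget = LatencyBudget(target_ms=800)  # placeholder target
    with budget.hop("recognition"):
        time.sleep(0.05)  # stand-in for real work
    with budget.hop("generation"):
        time.sleep(0.10)
    with budget.hop("synthesis"):
        time.sleep(0.05)
    print(budget.report())
    print("within target:", budget.within_target())
```

Summing hop durations matches what the user experiences only when hops run one after another. In a streaming pipeline the hops overlap, so it is worth also recording an end-to-end figure, for example the time from the end of the user's speech to the first synthesized audio played back, and checking that against the target as well.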