Avatar streaming guide

Build real-time avatar experiences with synchronized audio and facial animation.

Advanced · 20 min read · Sep 15, 2025
Tags: Avatar, Streaming, Multimodal
Key takeaways
  • Stream synchronized audio and blendshape frames over WebSocket.
  • Handle emotion parameters for expressive avatar responses.
  • Implement robust reconnection and buffering strategies.

Avatar streaming overview

The Disruptive Rain Avatar API generates synchronized audio and facial animation in real time. Its output is a stream of ARKit-compatible blendshape weights that can drive a 3D avatar model.
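As a rough illustration of how ARKit-style weights might drive a model, the sketch below copies named weights onto a mesh's morph targets. It assumes the weights arrive as an object keyed by ARKit blendshape names (for example `jawOpen`) and that the mesh exposes `morphTargetDictionary` and `morphTargetInfluences`, as three.js meshes do. This guide does not specify the wire format of the weights, and `applyBlendshapes` is a hypothetical helper, so adapt it to what your session actually receives.

```javascript
// Hypothetical helper: apply named blendshape weights to a mesh-like object.
// Assumption: `weights` has the shape { blendshapeName: number }.
function applyBlendshapes(mesh, weights) {
  const applied = [];
  for (const [name, value] of Object.entries(weights)) {
    const index = mesh.morphTargetDictionary[name];
    if (index === undefined) continue; // the model has no such morph target
    // Clamp to the 0..1 range used by ARKit blendshape coefficients.
    mesh.morphTargetInfluences[index] = Math.min(1, Math.max(0, value));
    applied.push(name);
  }
  return applied; // names that were actually applied
}

// Usage with a stand-in mesh (a real three.js mesh has the same two fields).
const mesh = {
  morphTargetDictionary: { jawOpen: 0, eyeBlinkLeft: 1 },
  morphTargetInfluences: [0, 0],
};
applyBlendshapes(mesh, { jawOpen: 0.6, eyeBlinkLeft: 1.4, browInnerUp: 0.2 });
```

Weights for morph targets the model lacks are skipped rather than treated as errors, since avatar models often implement only a subset of the ARKit blendshape set.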

Audio and animation frames stream together over a single WebSocket connection for tight synchronization.

  • Audio and blendshapes stream over the same WebSocket connection.
  • Frame synchronization uses timestamp-based alignment.
  • Emotion parameters control facial expressions.

WebSocket connection setup

Connect to the avatar streaming endpoint and send the session configuration as the first message once the socket opens. The server then streams audio chunks and blendshape frames in lockstep.

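// Audio chunks waiting for playback, in arrival order.
const audioQueue = [];

// Replace <gateway-host> with your gateway's hostname.
const ws = new WebSocket('wss://<gateway-host>/v1/avatar/stream');

ws.onopen = () => {
  // Send the session configuration as the first message.
  ws.send(JSON.stringify({
    sessionId: 'avatar_' + Date.now(),
    voiceId: 'default',
    emotion: 'neutral',
    speed: 1.0,
  }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === 'audio') {
    audioQueue.push(msg.audio);
  } else if (msg.type === 'blendshape') {
    // renderBlendshapes is your application's function that applies
    // the weights to the 3D model for the given frame.
    renderBlendshapes(msg.weights, msg.frameIndex);
  }
};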
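For reconnection, one common approach is to reopen the socket after an exponentially growing delay. The sketch below is a minimal version under stated assumptions: `backoffDelay` and `connectWithRetry` are hypothetical helpers, the delay values are illustrative, and this guide does not say whether the server can resume an interrupted session, so the example simply starts a fresh connection each time.

```javascript
// Exponential backoff delay in milliseconds, capped at maxMs.
// `attempt` starts at 0. Default values are illustrative assumptions.
function backoffDelay(attempt, baseMs = 500, maxMs = 10000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper. `createSocket` returns a new WebSocket-like object;
// `onSocket` attaches your onopen/onmessage handlers to each new socket.
function connectWithRetry(createSocket, onSocket, maxAttempts = 5) {
  let attempt = 0;
  let stopped = false;

  function connect() {
    const ws = createSocket();
    onSocket(ws);
    // A successful open resets the backoff sequence.
    ws.addEventListener('open', () => { attempt = 0; });
    ws.addEventListener('close', () => {
      if (stopped || attempt >= maxAttempts) return;
      setTimeout(connect, backoffDelay(attempt));
      attempt += 1;
    });
  }

  connect();
  return () => { stopped = true; }; // call this to stop reconnecting
}
```

A call might look like `connectWithRetry(() => new WebSocket('wss://<gateway-host>/v1/avatar/stream'), attachHandlers)`, where `attachHandlers` sets the `onopen` and `onmessage` callbacks shown above. Adding a small random jitter to each delay is a common refinement when many clients may reconnect at once.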

Emotion and expression control

Control avatar expressions by setting emotion parameters. The system interpolates between emotion states smoothly.

  • Supported emotions: neutral, happy, sad, angry, surprised, fearful.
  • Blend multiple emotions with weighted parameters.
  • Lip sync is derived automatically from the speech content.
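Weighted blending and interpolation can be illustrated with a small sketch. This guide does not show the request shape for weighted blends, so the mix object used here (emotion name mapped to weight) and both helper functions are assumptions for illustration. The interpolation itself happens on the server according to the text above; `lerpEmotions` only shows what a linear blend between two states looks like.

```javascript
const SUPPORTED_EMOTIONS = ['neutral', 'happy', 'sad', 'angry', 'surprised', 'fearful'];

// Normalize a weighted emotion mix so the weights sum to 1.
// Unsupported emotions and non-positive weights are dropped.
function normalizeEmotions(mix) {
  const entries = Object.entries(mix)
    .filter(([name, w]) => SUPPORTED_EMOTIONS.includes(name) && w > 0);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  if (total === 0) return { neutral: 1 }; // nothing usable: fall back to neutral
  return Object.fromEntries(entries.map(([name, w]) => [name, w / total]));
}

// Linear interpolation between two normalized mixes, with t in [0, 1].
function lerpEmotions(from, to, t) {
  const out = {};
  for (const name of SUPPORTED_EMOTIONS) {
    const v = (from[name] ?? 0) * (1 - t) + (to[name] ?? 0) * t;
    if (v > 0) out[name] = v;
  }
  return out;
}
```

For example, `normalizeEmotions({ happy: 3, surprised: 1 })` yields a mix that is three quarters happy and one quarter surprised, and `lerpEmotions` at `t = 0.5` gives the halfway point between two such mixes.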

Latency optimization

Avatar streaming is latency-sensitive. Buffer a small number of frames and use adaptive playback to handle network jitter.

  • Target 100-200 ms end-to-end latency for conversational UX.
  • Pre-buffer 3-5 frames before starting playback.
  • Implement frame skip logic for sustained network issues.
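The pre-buffering and frame-skip ideas above can be combined in a small jitter buffer. The sketch below is one possible shape, not a prescribed implementation: the `FrameBuffer` class, its option names, and the backlog threshold are all assumptions chosen for illustration.

```javascript
// Hypothetical jitter buffer for blendshape frames.
class FrameBuffer {
  constructor({ prebuffer = 4, maxBacklog = 10 } = {}) {
    this.prebuffer = prebuffer;   // frames to collect before playback starts
    this.maxBacklog = maxBacklog; // backlog size above which old frames are skipped
    this.frames = [];
    this.playing = false;
    this.skipped = 0;
  }

  // Call for every incoming blendshape frame.
  push(frame) {
    this.frames.push(frame);
    if (!this.playing && this.frames.length >= this.prebuffer) {
      this.playing = true;
    }
  }

  // Call once per render tick; returns the next frame, or null if none is due.
  next() {
    if (!this.playing) return null;
    if (this.frames.length === 0) {
      this.playing = false; // underrun: wait for the pre-buffer to refill
      return null;
    }
    // Frame skip: if playback has fallen behind, drop the oldest frames.
    while (this.frames.length > this.maxBacklog) {
      this.frames.shift();
      this.skipped += 1;
    }
    return this.frames.shift();
  }
}
```

Pausing on underrun and refilling the pre-buffer trades a brief freeze for smoother playback afterwards. A smaller `maxBacklog` keeps latency closer to the conversational target at the cost of more visible skips.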