# Avatar streaming guide
Build real-time avatar experiences with synchronized audio and facial animation.
- Date: Sep 15, 2025
- Reading time: 20 min
- Level: Advanced
- Tags: Avatar, Streaming, Multimodal
## Takeaways
- Stream synchronized audio and blendshape frames over WebSocket.
- Handle emotion parameters for expressive avatar responses.
- Implement robust reconnection and buffering strategies.
## Avatar streaming overview
The Disruptive Rain Avatar API generates synchronized audio and facial animation in real time. It outputs ARKit-compatible blendshapes that can drive 3D avatar models.
- Audio and blendshape frames stream together over a single WebSocket connection for tight synchronization.
- Frame synchronization uses timestamp-based alignment.
- Emotion parameters control facial expressions.
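Timestamp-based alignment can be sketched as follows: the renderer treats the audio playback clock as the source of truth and shows the most recent blendshape frame that is due at that time. This is a minimal sketch, not the API's documented behavior. The `BlendshapeFrame` shape, the `timestampMs` field, and the `frameForAudioTime` helper are assumptions made for illustration; the guide does not specify the field names used on the wire.

```typescript
// Hypothetical frame shape; the real message fields may differ.
interface BlendshapeFrame {
  timestampMs: number;              // assumed presentation time of the frame
  weights: Record<string, number>;  // blendshape name -> weight
}

// Return the latest frame whose timestamp is at or before the audio clock,
// or null if no frame is due yet. `frames` must be sorted by timestampMs.
function frameForAudioTime(
  frames: BlendshapeFrame[],
  audioTimeMs: number,
): BlendshapeFrame | null {
  let best: BlendshapeFrame | null = null;
  for (const frame of frames) {
    if (frame.timestampMs > audioTimeMs) break; // later frames are not due yet
    best = frame;
  }
  return best;
}
```

Driving the animation from the audio clock, rather than from message arrival time, keeps lip movement aligned with sound even when frames arrive in bursts.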
## WebSocket connection setup
Connect to the avatar streaming endpoint and send the session configuration once the socket opens. The server then streams audio chunks and blendshape frames in lockstep.
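```ts
// Placeholders: the playback queue and renderer are application-specific.
const audioQueue: unknown[] = [];
function renderBlendshapes(weights: unknown, frameIndex: number): void {
  // Apply the weights to your 3D avatar model here.
}

// Replace <gateway-host> with your gateway's hostname.
const ws = new WebSocket('wss://<gateway-host>/v1/avatar/stream');

// Send the session configuration as soon as the socket opens.
ws.onopen = () => {
  ws.send(JSON.stringify({
    sessionId: 'avatar_' + Date.now(),
    voiceId: 'default',
    emotion: 'neutral',
    speed: 1.0,
  }));
};

// Route each message by type: queue audio chunks, render blendshape frames.
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === 'audio') {
    audioQueue.push(msg.audio);
  } else if (msg.type === 'blendshape') {
    renderBlendshapes(msg.weights, msg.frameIndex);
  }
};
```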
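The takeaways call for robust reconnection, but the guide does not prescribe a strategy. One common approach is exponential backoff with jitter, sketched below. The `backoffDelayMs` helper, its parameter names, and the default values of 500 ms and 10 s are illustrative assumptions, not part of the Avatar API.

```typescript
// Compute how long to wait before reconnect attempt number `attempt` (0-based).
// The delay ceiling doubles on each attempt, is capped at maxMs, and is scaled
// by a random factor so many clients do not reconnect at the same instant.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,               // assumed starting delay
  maxMs: number = 10_000,             // assumed upper bound
  random: () => number = Math.random, // injectable for testing
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return random() * ceiling;
}
```

A caller would typically schedule the next connection attempt with `setTimeout` from the socket's `onclose` handler, increment `attempt` on each failure, and reset it to zero after a successful `onopen`.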
## Emotion and expression control
Control avatar expressions by setting emotion parameters. The system interpolates between emotion states smoothly.
- Supported emotions: neutral, happy, sad, angry, surprised, fearful.
- Blend multiple emotions with weighted parameters.
- Speech content automatically influences lip sync.
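Weighted blending and smooth interpolation between emotion states can be illustrated client-side as follows. This is a sketch under stated assumptions: the guide does not document the wire format for weighted emotions, so the `EmotionWeights` type and the `normalizeEmotions` and `lerpEmotions` helpers are hypothetical, and linear interpolation is only one possible way to transition between states.

```typescript
type Emotion = 'neutral' | 'happy' | 'sad' | 'angry' | 'surprised' | 'fearful';
type EmotionWeights = Partial<Record<Emotion, number>>;

const EMOTIONS: Emotion[] = ['neutral', 'happy', 'sad', 'angry', 'surprised', 'fearful'];

// Clamp negative weights to 0 and scale so all weights sum to 1.
// Falls back to fully neutral when no positive weight is given.
function normalizeEmotions(input: EmotionWeights): Record<Emotion, number> {
  const out: Record<Emotion, number> = {
    neutral: 0, happy: 0, sad: 0, angry: 0, surprised: 0, fearful: 0,
  };
  let total = 0;
  for (const e of EMOTIONS) {
    const w = Math.max(0, input[e] ?? 0);
    out[e] = w;
    total += w;
  }
  if (total === 0) {
    out.neutral = 1;
    return out;
  }
  for (const e of EMOTIONS) out[e] /= total;
  return out;
}

// Linearly interpolate between two emotion states; t is clamped to [0, 1].
function lerpEmotions(
  from: Record<Emotion, number>,
  to: Record<Emotion, number>,
  t: number,
): Record<Emotion, number> {
  const k = Math.min(1, Math.max(0, t));
  const out = normalizeEmotions({});
  for (const e of EMOTIONS) out[e] = from[e] + (to[e] - from[e]) * k;
  return out;
}
```

For example, `normalizeEmotions({ happy: 3, surprised: 1 })` gives a state that is three parts happy to one part surprised, and stepping `t` from 0 to 1 over several frames moves the face gradually from one state to the next.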
## Latency optimization
Avatar streaming is latency-sensitive. Buffer a small number of frames and use adaptive playback to handle network jitter.
- Target 100-200 ms end-to-end latency for conversational UX.
- Pre-buffer 3-5 frames before starting playback.
- Skip frames to catch up when network problems persist, rather than letting playback fall further behind.
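The pre-buffer and frame-skip guidance above can be sketched as a small jitter buffer. This is a minimal illustration, not a reference implementation: the `FrameBuffer` class and its `prebuffer` and `maxDepth` parameters are hypothetical names, and the default of 10 for `maxDepth` is an arbitrary assumption.

```typescript
// Jitter buffer: holds frames until enough are queued, then releases them
// one per tick, dropping the oldest frames if the backlog grows too large.
class FrameBuffer<T> {
  private queue: T[] = [];
  private started = false;
  private prebuffer: number; // frames required before playback starts
  private maxDepth: number;  // backlog size that triggers frame skipping

  constructor(prebuffer: number = 3, maxDepth: number = 10) {
    this.prebuffer = prebuffer;
    this.maxDepth = maxDepth;
  }

  push(frame: T): void {
    this.queue.push(frame);
  }

  // Return the next frame to render, or undefined while pre-buffering
  // or after the buffer has run dry.
  next(): T | undefined {
    if (!this.started) {
      if (this.queue.length < this.prebuffer) return undefined;
      this.started = true;
    }
    // Frame skip: drop the oldest frames until the backlog fits maxDepth.
    while (this.queue.length > this.maxDepth) this.queue.shift();
    const frame = this.queue.shift();
    if (frame === undefined) this.started = false; // underrun: pre-buffer again
    return frame;
  }
}
```

Call `push` from the message handler and `next` once per render tick. Pre-buffering adds delay in proportion to the frame interval; at an assumed 30 frames per second, 3 frames is roughly 100 ms, so the pre-buffer size should be weighed against the latency target.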