Audio Streaming Guide

Real-time bidirectional audio streaming enables Voice AI applications, live transcription, voice assistants, and custom audio processing on Plivo calls.

Prerequisites

1. Plivo Account

Credential	Where to Find
Auth ID	Plivo Console
Auth Token	Plivo Console

2. Phone Number

You need a voice-enabled Plivo number to make or receive calls.

Call Type	Number Requirement
Inbound	Callers dial your Plivo number, which triggers your Answer URL and starts the stream
Outbound	Your Plivo number is the Caller ID when making calls via API

Get a number:

1. Go to Phone Numbers > Buy Numbers
2. Select country and type (local, toll-free, mobile)
3. Filter by voice_enabled = true
4. Purchase

India Numbers (Additional Requirements)

Indian phone numbers require KYC compliance:

Requirement	Details
Account currency	Must be INR
KYC documents	Certificate of Incorporation (COI) + GST Certificate
Business registration	India-registered businesses only

Submit your compliance documents through the Compliance Application before purchasing. See Rent India Numbers for details.

3. WebSocket Server

Your server must:

Accept WebSocket connections over wss://
Be publicly accessible (use ngrok for local development)
Handle Plivo’s stream events (start, media, dtmf, stop)

4. AI Service Credentials (Optional)

For voice AI applications, you’ll typically need:

Speech-to-Text: Deepgram, Google Speech, AWS Transcribe
LLM: OpenAI, Anthropic, Google Gemini
Text-to-Speech: ElevenLabs, Google TTS, Amazon Polly

How It Works

Plivo streams real-time audio between phone calls and your WebSocket server.

Phone Call <-> Plivo <-> WebSocket <-> Your Server <-> AI Services

Architecture

Step-by-Step Flow

1. Call Initiation: A caller dials your Plivo number, or your application initiates an outbound call.
2. Answer URL Request: Plivo makes an HTTP request to your configured Answer URL.
3. Stream XML Response: Your server responds with XML containing the <Stream> element, specifying the WebSocket URL and streaming parameters.
4. WebSocket Connection: Plivo establishes a WebSocket connection to your specified URL.
5. Start Event: Plivo sends a start event containing call metadata (call ID, stream ID, media format).
6. Media Streaming:
   - Inbound: Plivo continuously sends media events containing base64-encoded audio chunks from the caller.
   - Outbound: Your server sends playAudio events with base64-encoded audio to be played to the caller.
7. DTMF Events: When the caller presses keys, Plivo sends dtmf events with the digit information.
8. Control Events: Your server can send clearAudio to interrupt playback or checkpoint to track playback progress.
9. Connection Close: When the call ends or streaming stops, the WebSocket connection closes.
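
The receiving side of this flow can be sketched as a small dispatcher for the JSON messages Plivo sends. This is a minimal sketch, not the SDK's implementation: `handle_message` and the `state` dictionary are hypothetical names, and the field paths (`start.streamId`, `media.payload`, `dtmf.digit`) are the ones used in the Quick Example later in this guide.

```python
import base64
import json


def handle_message(raw: str, state: dict) -> None:
    """Route one JSON message from Plivo into a per-call state dict."""
    message = json.loads(raw)
    kind = message.get("event")

    if kind == "start":
        # Remember the stream ID; control events sent back to Plivo reference it.
        state["stream_id"] = message["start"]["streamId"]
    elif kind == "media":
        # Audio arrives base64-encoded; decode it before handing it to an STT service.
        state.setdefault("audio", bytearray()).extend(
            base64.b64decode(message["media"]["payload"])
        )
    elif kind == "dtmf":
        state.setdefault("digits", []).append(message["dtmf"]["digit"])
    # Other events (for example playedStream or clearedAudio) are ignored here.
```

Keeping the dispatcher free of network code makes it easy to unit test; a WebSocket library of your choice would call it once per received text frame.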

Stream XML

The <Stream> XML element initiates audio streaming for a call. Include it in your Answer URL response.

Basic Syntax

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">
        wss://your-server.com/stream
    </Stream>
</Response>

Parameters

Parameter	Type	Default	Description
`bidirectional`	boolean	`false`	Enable two-way audio streaming. When `true`, you can send audio back to the caller.
`keepCallAlive`	boolean	`false`	Keep the call active after the stream ends. When `false`, the call ends when streaming stops.
`contentType`	string	`audio/x-mulaw;rate=8000`	Audio codec and sample rate. See Supported Content Types.
`statusCallbackUrl`	string	—	URL for stream status callbacks (started, stopped, failed).
`statusCallbackMethod`	string	`POST`	HTTP method for status callbacks (`GET` or `POST`).
`extraHeaders`	string	—	Custom headers to include in the start event. Format: `key1=value1;key2=value2`
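
If your Answer URL is served from Python, the response can be generated rather than hand-written. The following is a minimal sketch using only the standard library; `build_stream_xml` is a hypothetical helper, not part of any Plivo SDK, and it passes attribute names through unchanged, so they should match the parameter names in the table above.

```python
import xml.etree.ElementTree as ET


def build_stream_xml(ws_url: str, **attributes: str) -> str:
    """Return an Answer URL response body containing one <Stream> element."""
    response = ET.Element("Response")
    stream = ET.SubElement(response, "Stream", attributes)
    stream.text = ws_url
    # ElementTree escapes special characters such as '&' in the URL.
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_body = build_stream_xml(
    "wss://your-server.com/stream",
    bidirectional="true",
    keepCallAlive="true",
    contentType="audio/x-mulaw;rate=8000",
)
print(xml_body)
```

Building the element tree instead of concatenating strings avoids malformed XML when the WebSocket URL carries query parameters.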

Supported Content Types

Content Type	Description	Use Case
`audio/x-mulaw;rate=8000`	mu-law codec at 8kHz	Recommended. Standard telephony, lowest latency, best compatibility.
`audio/x-l16;rate=8000`	Linear PCM 16-bit at 8kHz	Higher quality for speech processing.
`audio/x-l16;rate=16000`	Linear PCM 16-bit at 16kHz	High-quality speech recognition.
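
To get a feel for what each content type means on the wire, the raw size of one audio chunk can be worked out from the sample rate and sample width. The sketch below assumes mono audio and the roughly 20 ms chunk duration mentioned in the WebSocket Events section; `FORMATS` and `bytes_per_chunk` are illustrative names.

```python
# (sample rate in Hz, bytes per sample) for each supported content type.
FORMATS = {
    "audio/x-mulaw;rate=8000": (8000, 1),   # mu-law: 8 bits per sample
    "audio/x-l16;rate=8000": (8000, 2),     # linear PCM: 16 bits per sample
    "audio/x-l16;rate=16000": (16000, 2),
}


def bytes_per_chunk(content_type: str, chunk_ms: int = 20) -> int:
    """Raw (pre-base64) audio bytes in one chunk of mono audio."""
    sample_rate, sample_width = FORMATS[content_type]
    return sample_rate * sample_width * chunk_ms // 1000


for content_type in FORMATS:
    print(content_type, bytes_per_chunk(content_type))
```

Under these assumptions a 20 ms chunk is 160 bytes for mu-law at 8 kHz, 320 bytes for L16 at 8 kHz, and 640 bytes for L16 at 16 kHz, before base64 encoding adds about one third.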

Examples

Bidirectional Stream with mu-law Codec

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak>Hello! I'm connecting you to our AI assistant.</Speak>
    <Stream bidirectional="true"
            keepCallAlive="true"
            contentType="audio/x-mulaw;rate=8000">
        wss://your-server.com/stream
    </Stream>
</Response>

Stream with Status Callbacks and Extra Headers

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Stream bidirectional="true"
            keepCallAlive="true"
            contentType="audio/x-mulaw;rate=8000"
            statusCallbackUrl="https://your-server.com/stream-status"
            statusCallbackMethod="POST"
            extraHeaders="userId=12345;sessionId=abc-xyz">
        wss://your-server.com/stream
    </Stream>
</Response>

Stream APIs

Control active streams programmatically via REST API calls.

Base URL

https://api.plivo.com/v1/Account/{auth_id}/Call/{call_uuid}/Stream/

Authentication

Use HTTP Basic Authentication with your Plivo Auth ID and Auth Token.

Stop a Stream

Endpoint: DELETE /v1/Account/{auth_id}/Call/{call_uuid}/Stream/

curl -X DELETE \
  https://api.plivo.com/v1/Account/YOUR_AUTH_ID/Call/CALL_UUID/Stream/ \
  -u YOUR_AUTH_ID:YOUR_AUTH_TOKEN

Get Stream Details

Endpoint: GET /v1/Account/{auth_id}/Call/{call_uuid}/Stream/

curl -X GET \
  https://api.plivo.com/v1/Account/YOUR_AUTH_ID/Call/CALL_UUID/Stream/ \
  -u YOUR_AUTH_ID:YOUR_AUTH_TOKEN
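
The same requests can be built without an SDK. This is a minimal sketch using Python's standard library that mirrors the curl commands above; `stream_request` is a hypothetical helper, and it only constructs the request so you can inspect it before sending.

```python
import base64
import urllib.request

API_ROOT = "https://api.plivo.com/v1/Account"


def stream_request(auth_id: str, auth_token: str, call_uuid: str,
                   method: str) -> urllib.request.Request:
    """Build (but do not send) a GET or DELETE request for a call's stream."""
    url = f"{API_ROOT}/{auth_id}/Call/{call_uuid}/Stream/"
    # HTTP Basic Authentication: base64 of "auth_id:auth_token".
    credentials = base64.b64encode(f"{auth_id}:{auth_token}".encode()).decode()
    request = urllib.request.Request(url, method=method)
    request.add_header("Authorization", f"Basic {credentials}")
    return request


request = stream_request("YOUR_AUTH_ID", "YOUR_AUTH_TOKEN", "CALL_UUID", "DELETE")
# To send it: urllib.request.urlopen(request)
```

Pass `"GET"` as the method to fetch stream details instead of stopping the stream.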

Using the Plivo SDK

Node.js

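const plivo = require('plivo');
const client = new plivo.Client('YOUR_AUTH_ID', 'YOUR_AUTH_TOKEN');

// Stop a stream. In a CommonJS module, `await` is only valid inside an
// async function, so wrap the call.
async function stopStream(callUuid) {
  await client.calls.stopStream(callUuid);
}

stopStream('CALL_UUID').catch(console.error);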

Python

import plivo

client = plivo.RestClient('YOUR_AUTH_ID', 'YOUR_AUTH_TOKEN')

# Stop a stream
client.calls.stop_stream(call_uuid='CALL_UUID')

Stream Status Callbacks

Configure a callback URL to receive notifications about stream lifecycle events.

Configuration

<Stream bidirectional="true"
        statusCallbackUrl="https://your-server.com/stream-status"
        statusCallbackMethod="POST">
    wss://your-server.com/stream
</Stream>

Callback Parameters

Parameter	Type	Description
`CallUUID`	string	The unique identifier for the call
`StreamID`	string	The unique identifier for the stream
`Event`	string	The event type: `started`, `stopped`, `failed`
`Timestamp`	string	ISO 8601 timestamp of the event
`From`	string	The caller’s phone number
`To`	string	The called phone number
`Direction`	string	Call direction: `inbound` or `outbound`
`StatusReason`	string	Reason for status (on `stopped` or `failed`)
`Duration`	number	Stream duration in seconds (on `stopped`)

Example Handler

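const express = require('express');

const app = express();
// Parse form-encoded and JSON bodies so that req.body is populated.
app.use(express.urlencoded({ extended: false }));
app.use(express.json());

app.post('/stream-status', (req, res) => {
  const { CallUUID, StreamID, Event, StatusReason, Duration } = req.body;

  switch (Event) {
    case 'started':
      console.log(`Stream ${StreamID} started for call ${CallUUID}`);
      break;
    case 'stopped':
      console.log(`Stream ${StreamID} stopped after ${Duration}s: ${StatusReason}`);
      break;
    case 'failed':
      console.error(`Stream ${StreamID} failed: ${StatusReason}`);
      break;
    default:
      console.warn(`Unhandled stream event: ${Event}`);
  }

  // Acknowledge the callback.
  res.sendStatus(200);
});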

Signature Validation

Plivo signs WebSocket connection requests to verify authenticity. Validate these signatures to ensure requests originate from Plivo.

V3 Signature Headers

Header	Description
`X-Plivo-Signature-V3`	The HMAC-SHA256 signature
`X-Plivo-Signature-V3-Nonce`	A unique nonce for this request

Using the Plivo SDK

import { validateV3Signature } from 'plivo';

const isValid = validateV3Signature(
  method,     // 'GET' for WebSocket upgrade requests
  uri,        // Full URI including protocol and path
  nonce,      // X-Plivo-Signature-V3-Nonce header value
  authToken,  // Your Plivo Auth Token
  signature,  // X-Plivo-Signature-V3 header value
);
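
For readers who want to see what such a check involves, here is a hedged sketch of the computation using only Python's standard library. It assumes the signature is the base64-encoded HMAC-SHA256 of `<uri>.<nonce>`, keyed with your Auth Token. That scheme is an assumption based on how Plivo's open-source server SDKs appear to compute V3 signatures, and this sketch ignores any query or POST parameters, so confirm the details against the official documentation and prefer the SDK helper in production. The function names are hypothetical.

```python
import base64
import hashlib
import hmac


def compute_v3_signature(uri: str, nonce: str, auth_token: str) -> str:
    """Assumed scheme: base64(HMAC-SHA256(auth_token, "<uri>.<nonce>"))."""
    message = f"{uri}.{nonce}".encode("utf-8")
    digest = hmac.new(auth_token.encode("utf-8"), message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_v3_signature(uri: str, nonce: str, auth_token: str, signature: str) -> bool:
    """Compare the received signature with the one computed locally."""
    expected = compute_v3_signature(uri, nonce, auth_token)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Whatever the exact signing input turns out to be, the constant-time comparison is worth keeping: a plain `==` can leak timing information to an attacker probing signatures.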

Using the Node.js Stream SDK

The plivo-stream-sdk-node handles signature validation automatically:

const plivoServer = new PlivoWebSocketServer({
  server,
  path: '/stream',
  validateSignature: true,
  authToken: process.env.PLIVO_AUTH_TOKEN,
});

When validateSignature is enabled, connections with invalid signatures are automatically rejected with a 1008 WebSocket close code.

WebSocket Events

All communication over the WebSocket uses JSON messages. Here are the essential events you need to handle.

Events from Plivo (Input)

Event	Description
`start`	Sent once when stream begins. Contains call metadata (callId, streamId, mediaFormat).
`media`	Sent continuously. Contains base64-encoded audio chunks (~20ms each).
`dtmf`	Sent when caller presses keys. Contains the digit pressed.
`playedStream`	Confirmation that audio with a checkpoint finished playing.
`clearedAudio`	Confirmation that the audio queue was cleared.

Events to Plivo (Output)

Event	Description
`playAudio`	Send audio to the caller. Include base64 payload matching stream contentType.
`checkpoint`	Mark a point in audio queue. Receive `playedStream` when reached.
`clearAudio`	Clear all queued audio. Use for interruption handling.

Quick Example

// Assumes `ws` is the WebSocket connection opened by Plivo, `sttClient` is
// your speech-to-text client, and `base64EncodedAudio` holds audio that
// matches the stream's contentType.

// Handle incoming events
ws.on('message', (data) => {
  const event = JSON.parse(data.toString());

  switch (event.event) {
    case 'start':
      console.log('Stream started:', event.start.streamId);
      break;
    case 'media': {
      // Forward audio to STT service
      const audio = Buffer.from(event.media.payload, 'base64');
      sttClient.send(audio);
      break;
    }
    case 'dtmf':
      console.log('DTMF pressed:', event.dtmf.digit);
      break;
  }
});

// Send audio to caller
ws.send(JSON.stringify({
  event: 'playAudio',
  media: {
    contentType: 'audio/x-mulaw',
    sampleRate: 8000,
    payload: base64EncodedAudio
  }
}));

For complete event schemas, TypeScript types, and detailed field documentation, see the Audio Streaming Protocol Reference.

X-Headers

Pass custom metadata from your Stream XML to your WebSocket server.

Usage

<Stream bidirectional="true"
        extraHeaders="userId=12345;sessionId=abc-xyz;tier=premium">
    wss://your-server.com/stream
</Stream>

Parsing

function parseExtraHeaders(extraHeaders) {
  const headers = {};
  if (!extraHeaders) return headers;

  for (const pair of extraHeaders.split(';')) {
    // Split on the first '=' only, so values that contain '=' stay intact.
    const separator = pair.indexOf('=');
    if (separator === -1) continue;

    const key = pair.slice(0, separator).trim();
    const value = pair.slice(separator + 1).trim();
    if (key && value) {
      headers[key] = decodeURIComponent(value);
    }
  }
  return headers;
}

// Usage
const headers = parseExtraHeaders(event.extra_headers);
console.log(headers.userId);     // "12345"
console.log(headers.sessionId);  // "abc-xyz"

Limits

WebSocket and Stream Limits

Limit	Value
Maximum WebSocket URL length	2048 characters
Maximum concurrent streams per call	1
Maximum stream duration	Same as call duration
Audio buffer size (playback queue)	~60 seconds of audio
Maximum WebSocket message size	64 KB
Recommended audio chunk size	16 KB base64-encoded or less
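
To stay within the recommended chunk size, longer audio can be split before it is sent. The sketch below reads "16 KB" as 16,384 base64 characters, which is an assumption, and reuses the `playAudio` message shape from the Quick Example; `play_audio_messages` is a hypothetical helper.

```python
import base64
import json
from typing import Iterator

MAX_BASE64_CHARS = 16 * 1024               # recommended chunk size from the table above
MAX_RAW_BYTES = MAX_BASE64_CHARS // 4 * 3  # base64 encodes every 3 bytes as 4 characters


def play_audio_messages(audio: bytes, content_type: str = "audio/x-mulaw",
                        sample_rate: int = 8000) -> Iterator[str]:
    """Yield playAudio JSON messages whose payloads stay within the recommended size."""
    for offset in range(0, len(audio), MAX_RAW_BYTES):
        chunk = audio[offset:offset + MAX_RAW_BYTES]
        yield json.dumps({
            "event": "playAudio",
            "media": {
                "contentType": content_type,
                "sampleRate": sample_rate,
                "payload": base64.b64encode(chunk).decode("ascii"),
            },
        })
```

Each chunk holds at most 12,288 raw bytes, which is about 1.5 seconds of mu-law audio at 8 kHz. The limit is an even number, so 16-bit samples are never split across messages.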

Best Practices

Use mu-law 8000Hz

Why mu-law at 8kHz is recommended:

Native Telephony Format: No transcoding required, lowest latency
Bandwidth Efficient: Compresses 16-bit audio to 8-bit while maintaining voice quality
Broad Compatibility: Most major STT/TTS services accept mu-law input and output
Sufficient for Voice: An 8kHz sample rate captures frequencies up to 4kHz, which covers the telephone voice band

<!-- Recommended configuration -->
<Stream bidirectional="true"
        contentType="audio/x-mulaw;rate=8000">
    wss://your-server.com/stream
</Stream>

Minimize Latency

For a responsive Voice AI experience, aim for under 1 second total response time:

Component	Target Latency
Speech-to-Text	< 200ms
LLM Processing	< 500ms
Text-to-Speech	< 200ms
Network (round trip)	< 100ms
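
The targets in the table add up to the 1 second budget, so any stage that overshoots has to be offset elsewhere. As a rough illustration, the sketch below compares measured per-stage latencies with these targets; the stage keys and the `over_budget` helper are hypothetical, and the measurements in the example are made up.

```python
# Upper bounds from the table above, in milliseconds.
BUDGET_MS = {"stt": 200, "llm": 500, "tts": 200, "network": 100}


def over_budget(measured_ms: dict) -> dict:
    """Return the stages whose measured latency exceeds the target, with the overshoot."""
    return {
        stage: measured_ms[stage] - limit
        for stage, limit in BUDGET_MS.items()
        if measured_ms.get(stage, 0) > limit
    }


print(sum(BUDGET_MS.values()))  # 1000
print(over_budget({"stt": 180, "llm": 650, "tts": 150, "network": 90}))  # {'llm': 150}
```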

Server Location: Deploy your WebSocket server close to your expected caller locations. Plivo routes calls through the edge location closest to the caller.

Traffic Source	Recommended Server Location
US-focused	US East (Virginia) or US West (Oregon)
Europe-focused	Frankfurt or London
Asia-Pacific	Singapore or Mumbai
Global	Deploy in multiple regions with geographic routing

Handle Interruptions

Always support user interruption using clearAudio:

// When user speaks while AI is playing
if (userSpeaking && aiPlaying) {
  ws.send(JSON.stringify({
    event: 'clearAudio',
    streamId: streamId
  }));
}
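
The `userSpeaking` and `aiPlaying` flags have to come from somewhere. One way to maintain the playback flag, sketched here in Python under stated assumptions, is to set it when audio is sent and clear it when Plivo confirms the queue is empty. `BargeInTracker` is a hypothetical class; it treats `playedStream` as end of playback, which only holds if you send a checkpoint after the last queued chunk.

```python
import json
from typing import Optional


class BargeInTracker:
    """Tracks whether assistant audio is queued and builds clearAudio messages."""

    def __init__(self, stream_id: str) -> None:
        self.stream_id = stream_id
        self.ai_playing = False

    def on_audio_sent(self) -> None:
        # Call this after sending a playAudio event.
        self.ai_playing = True

    def on_plivo_event(self, event_name: str) -> None:
        # Assumption: a checkpoint follows the last chunk, so playedStream
        # (like clearedAudio) means nothing is left to play.
        if event_name in ("playedStream", "clearedAudio"):
            self.ai_playing = False

    def on_user_speech(self) -> Optional[str]:
        """Return a clearAudio message if the user interrupted playback, else None."""
        if not self.ai_playing:
            return None
        return json.dumps({"event": "clearAudio", "streamId": self.stream_id})
```

Call `on_user_speech()` when your speech detector fires, and send the returned message over the WebSocket if it is not `None`.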

Integration Guides

For complete code examples and step-by-step tutorials:

Plivo Stream SDK

Official SDKs for Python, Node.js, and Java with full examples using Deepgram, OpenAI, and ElevenLabs

Pipecat

Build with the Pipecat framework for simplified voice AI pipelines

Next Steps

Protocol Reference: Complete JSON schemas, TypeScript types, and advanced patterns
Plivo Stream SDK: Production-ready SDKs with examples

Support

For questions, issues, or feature requests:

Documentation: https://www.plivo.com/docs/
Support: [email protected]
GitHub Issues: For SDK-specific issues

Last updated: January 2026
