Real-time bidirectional audio streaming enables Voice AI applications, live transcription, voice assistants, and custom audio processing on Plivo calls.

Prerequisites

1. Plivo Account

Sign up for Plivo and get your credentials:
| Credential | Where to Find |
| --- | --- |
| Auth ID | Plivo Console |
| Auth Token | Plivo Console |

2. Phone Number

You need a voice-enabled Plivo number to make or receive calls.
| Call Type | Number Requirement |
| --- | --- |
| Inbound | Callers dial your Plivo number, triggers your Answer URL, starts stream |
| Outbound | Your Plivo number is the Caller ID when making calls via API |

Get a number:
  1. Go to Phone Numbers > Buy Numbers
  2. Select country and type (local, toll-free, mobile)
  3. Filter by voice_enabled = true
  4. Purchase
Indian phone numbers require KYC compliance:
| Requirement | Details |
| --- | --- |
| Account currency | Must be INR |
| KYC documents | Certificate of Incorporation (COI) + GST Certificate |
| Business registration | India-registered businesses only |

Submit compliance at Compliance Application before purchasing. See Rent India Numbers for details.

3. WebSocket Server

Your server must:
  • Accept WebSocket connections over wss://
  • Be publicly accessible (use ngrok for local development)
  • Handle Plivo’s stream events (start, media, dtmf, stop)

4. AI Service Credentials (Optional)

For voice AI applications, you’ll typically need:
  • Speech-to-Text: Deepgram, Google Speech, AWS Transcribe
  • LLM: OpenAI, Anthropic, Google Gemini
  • Text-to-Speech: ElevenLabs, Google TTS, Amazon Polly

How It Works

Plivo streams real-time audio between phone calls and your WebSocket server.
Phone Call <-> Plivo <-> WebSocket <-> Your Server <-> AI Services

Step-by-Step Flow

  1. Call Initiation: A caller dials your Plivo number, or your application initiates an outbound call.
  2. Answer URL Request: Plivo makes an HTTP request to your configured Answer URL.
  3. Stream XML Response: Your server responds with XML containing the <Stream> element, specifying the WebSocket URL and streaming parameters.
  4. WebSocket Connection: Plivo establishes a WebSocket connection to your specified URL.
  5. Start Event: Plivo sends a start event containing call metadata (call ID, stream ID, media format).
  6. Media Streaming:
    • Inbound: Plivo continuously sends media events containing base64-encoded audio chunks from the caller.
    • Outbound: Your server sends playAudio events with base64-encoded audio to be played to the caller.
  7. DTMF Events: When the caller presses keys, Plivo sends dtmf events with the digit information.
  8. Control Events: Your server can send clearAudio to interrupt playback or checkpoint to track playback progress.
  9. Connection Close: When the call ends or streaming stops, the WebSocket connection closes.

Stream XML

The <Stream> XML element initiates audio streaming for a call. Include it in your Answer URL response.

Basic Syntax

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">
        wss://your-server.com/stream
    </Stream>
</Response>

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bidirectional | boolean | false | Enable two-way audio streaming. When true, you can send audio back to the caller. |
| keepCallAlive | boolean | false | Keep the call active after the stream ends. When false, the call ends when streaming stops. |
| contentType | string | audio/x-mulaw;rate=8000 | Audio codec and sample rate. See Supported Content Types. |
| statusCallbackUrl | string | - | URL for stream status callbacks (started, stopped, failed). |
| statusCallbackMethod | string | POST | HTTP method for status callbacks (GET or POST). |
| extraHeaders | string | - | Custom headers to include in the start event. Format: key1=value1;key2=value2 |
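
The parameters above map directly onto attributes of the <Stream> element, so an Answer URL handler can assemble the response from a plain options object. The following is a minimal sketch rather than an official helper: `buildStreamXml` and `escapeXml` are hypothetical names, and the sketch assumes that omitting an attribute leaves Plivo's documented default in effect.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: builds the Answer URL response body.
// Only attributes that are explicitly set are emitted.
function buildStreamXml(wsUrl, options = {}) {
  const allowed = [
    'bidirectional', 'keepCallAlive', 'contentType',
    'statusCallbackUrl', 'statusCallbackMethod', 'extraHeaders',
  ];
  const attrs = allowed
    .filter((name) => options[name] !== undefined)
    .map((name) => `${name}="${escapeXml(options[name])}"`)
    .join(' ');
  const open = attrs ? `<Stream ${attrs}>` : '<Stream>';
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    `    ${open}${escapeXml(wsUrl)}</Stream>`,
    '</Response>',
  ].join('\n');
}

const xml = buildStreamXml('wss://your-server.com/stream', {
  bidirectional: true,
  keepCallAlive: true,
  contentType: 'audio/x-mulaw;rate=8000',
});
console.log(xml);
```

Escaping matters mainly when the WebSocket URL carries a query string, since a raw `&` would make the XML invalid.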

Supported Content Types

| Content Type | Description | Use Case |
| --- | --- | --- |
| audio/x-mulaw;rate=8000 | mu-law codec at 8kHz | Recommended. Standard telephony, lowest latency, best compatibility. |
| audio/x-l16;rate=8000 | Linear PCM 16-bit at 8kHz | Higher quality for speech processing. |
| audio/x-l16;rate=16000 | Linear PCM 16-bit at 16kHz | High-quality speech recognition. |
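
The content type determines how many bytes each audio chunk carries, which is useful when sizing buffers. The sketch below works through that arithmetic; `chunkBytes` is a hypothetical helper, and it assumes mono audio and a chunk duration of about 20 ms.

```javascript
// Bytes per sample for each codec in the table above.
const BYTES_PER_SAMPLE = {
  'audio/x-mulaw': 1, // 8-bit mu-law
  'audio/x-l16': 2,   // 16-bit linear PCM
};

// Hypothetical helper: parse a value such as "audio/x-mulaw;rate=8000" and
// return the raw (pre-base64) byte size of `durationMs` of mono audio.
function chunkBytes(contentType, durationMs = 20) {
  const [codec, ...params] = contentType.split(';').map((s) => s.trim());
  const rateParam = params.find((p) => p.startsWith('rate='));
  const bytesPerSample = BYTES_PER_SAMPLE[codec];
  const sampleRate = rateParam ? Number(rateParam.slice('rate='.length)) : NaN;
  if (!bytesPerSample || !Number.isFinite(sampleRate)) {
    throw new Error(`Unsupported content type: ${contentType}`);
  }
  return (sampleRate * bytesPerSample * durationMs) / 1000;
}

console.log(chunkBytes('audio/x-mulaw;rate=8000')); // 160
console.log(chunkBytes('audio/x-l16;rate=8000'));   // 320
console.log(chunkBytes('audio/x-l16;rate=16000'));  // 640
```

Under these assumptions, 16 kHz linear PCM needs four times the bandwidth of 8 kHz mu-law for the same duration.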

Examples

Bidirectional Stream with mu-law Codec

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Speak>Hello! I'm connecting you to our AI assistant.</Speak>
    <Stream bidirectional="true"
            keepCallAlive="true"
            contentType="audio/x-mulaw;rate=8000">
        wss://your-server.com/stream
    </Stream>
</Response>

Stream with Status Callbacks and Extra Headers

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Stream bidirectional="true"
            keepCallAlive="true"
            contentType="audio/x-mulaw;rate=8000"
            statusCallbackUrl="https://your-server.com/stream-status"
            statusCallbackMethod="POST"
            extraHeaders="userId=12345;sessionId=abc-xyz">
        wss://your-server.com/stream
    </Stream>
</Response>

Stream APIs

Control active streams programmatically via REST API calls.

Base URL

https://api.plivo.com/v1/Account/{auth_id}/Call/{call_uuid}/Stream/

Authentication

Use HTTP Basic Authentication with your Plivo Auth ID and Auth Token.

Stop a Stream

Endpoint: DELETE /v1/Account/{auth_id}/Call/{call_uuid}/Stream/
curl -X DELETE \
  https://api.plivo.com/v1/Account/YOUR_AUTH_ID/Call/CALL_UUID/Stream/ \
  -u YOUR_AUTH_ID:YOUR_AUTH_TOKEN

Get Stream Details

Endpoint: GET /v1/Account/{auth_id}/Call/{call_uuid}/Stream/
curl -X GET \
  https://api.plivo.com/v1/Account/YOUR_AUTH_ID/Call/CALL_UUID/Stream/ \
  -u YOUR_AUTH_ID:YOUR_AUTH_TOKEN
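
The same two requests can be issued from Node.js without the SDK. The sketch below builds the URL and Basic Authentication header for the endpoints shown above; `buildStreamRequest` and `stopStream` are hypothetical helper names, and the usage assumes the global `fetch` available in Node.js 18 or later.

```javascript
const API_BASE = 'https://api.plivo.com/v1/Account';

// Hypothetical helper: builds the URL and fetch options for the Stream
// endpoints. `method` is 'GET' (stream details) or 'DELETE' (stop stream).
function buildStreamRequest(authId, authToken, callUuid, method) {
  // HTTP Basic Authentication: base64 of "authId:authToken".
  const credentials = Buffer.from(`${authId}:${authToken}`).toString('base64');
  return {
    url: `${API_BASE}/${authId}/Call/${callUuid}/Stream/`,
    options: {
      method,
      headers: { Authorization: `Basic ${credentials}` },
    },
  };
}

// Usage sketch: stop the stream on a call and surface HTTP errors.
async function stopStream(authId, authToken, callUuid) {
  const { url, options } = buildStreamRequest(authId, authToken, callUuid, 'DELETE');
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Stop stream failed with HTTP ${response.status}`);
  }
}
```

Keeping request construction separate from the network call makes the authentication logic easy to unit test.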

Using the Plivo SDK

Node.js

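const plivo = require('plivo');
const client = new plivo.Client('YOUR_AUTH_ID', 'YOUR_AUTH_TOKEN');

// Stop a stream (wrapped in an async function because CommonJS has no top-level await)
(async () => {
  await client.calls.stopStream('CALL_UUID');
})().catch(console.error);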

Python

import plivo

client = plivo.RestClient('YOUR_AUTH_ID', 'YOUR_AUTH_TOKEN')

# Stop a stream
client.calls.stop_stream(call_uuid='CALL_UUID')

Stream Status Callbacks

Configure a callback URL to receive notifications about stream lifecycle events.

Configuration

<Stream bidirectional="true"
        statusCallbackUrl="https://your-server.com/stream-status"
        statusCallbackMethod="POST">
    wss://your-server.com/stream
</Stream>

Callback Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| CallUUID | string | The unique identifier for the call |
| StreamID | string | The unique identifier for the stream |
| Event | string | The event type: started, stopped, failed |
| Timestamp | string | ISO 8601 timestamp of the event |
| From | string | The caller’s phone number |
| To | string | The called phone number |
| Direction | string | Call direction: inbound or outbound |
| StatusReason | string | Reason for status (on stopped or failed) |
| Duration | number | Stream duration in seconds (on stopped) |

Example Handler

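const express = require('express');
const app = express();

// Parse form-encoded and JSON callback bodies so that req.body is populated
app.use(express.urlencoded({ extended: true }));
app.use(express.json());

app.post('/stream-status', (req, res) => {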
  const { CallUUID, StreamID, Event, StatusReason, Duration } = req.body;

  switch (Event) {
    case 'started':
      console.log(`Stream ${StreamID} started for call ${CallUUID}`);
      break;
    case 'stopped':
      console.log(`Stream ${StreamID} stopped after ${Duration}s: ${StatusReason}`);
      break;
    case 'failed':
      console.error(`Stream ${StreamID} failed: ${StatusReason}`);
      break;
  }

  res.sendStatus(200);
});

Signature Validation

Plivo signs each WebSocket connection request so that your server can verify its authenticity. Validate these signatures to ensure requests originate from Plivo.

V3 Signature Headers

| Header | Description |
| --- | --- |
| X-Plivo-Signature-V3 | The HMAC-SHA256 signature |
| X-Plivo-Signature-V3-Nonce | A unique nonce for this request |

Using the Plivo SDK

import { validateV3Signature } from 'plivo';

const isValid = validateV3Signature(
  method,     // 'GET' for WebSocket upgrade requests
  uri,        // Full URI including protocol and path
  nonce,      // X-Plivo-Signature-V3-Nonce header value
  authToken,  // Your Plivo Auth Token
  signature,  // X-Plivo-Signature-V3 header value
);
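
Before calling the validator, the server has to gather its inputs from the WebSocket upgrade request. The sketch below shows one way to do that; `signatureInputs` and `publicBaseUrl` are hypothetical names. The exact URI form Plivo signs (scheme, host, query string) is an assumption here, so confirm it against a real request before relying on it.

```javascript
// Hypothetical helper: collects the values a V3 signature check needs from
// an HTTP upgrade request. `publicBaseUrl` is the externally visible origin
// (for example "wss://your-server.com"). It is passed in explicitly because
// a server behind a proxy or tunnel cannot reliably infer it.
function signatureInputs(req, publicBaseUrl) {
  // Node.js lowercases incoming header names.
  const signature = req.headers['x-plivo-signature-v3'];
  const nonce = req.headers['x-plivo-signature-v3-nonce'];
  if (!signature || !nonce) {
    return null; // Unsigned request: reject without validating.
  }
  return {
    method: req.method,
    // Join origin and request path, avoiding a doubled slash.
    uri: publicBaseUrl.replace(/\/+$/, '') + req.url,
    nonce,
    signature,
  };
}

// Usage sketch inside an upgrade handler:
//   const inputs = signatureInputs(req, 'wss://your-server.com');
//   if (!inputs) { socket.destroy(); return; }
//   then pass inputs.method, inputs.uri, inputs.nonce, the Auth Token,
//   and inputs.signature to validateV3Signature.
```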

Using the Node.js Stream SDK

The plivo-stream-sdk-node handles signature validation automatically:
const plivoServer = new PlivoWebSocketServer({
  server,
  path: '/stream',
  validateSignature: true,
  authToken: process.env.PLIVO_AUTH_TOKEN,
});
When validateSignature is enabled, connections with invalid signatures are automatically rejected with a 1008 WebSocket close code.

WebSocket Events

All communication over the WebSocket uses JSON messages. Here are the essential events you need to handle.

Events from Plivo (Input)

| Event | Description |
| --- | --- |
| start | Sent once when stream begins. Contains call metadata (callId, streamId, mediaFormat). |
| media | Sent continuously. Contains base64-encoded audio chunks (~20ms each). |
| dtmf | Sent when caller presses keys. Contains the digit pressed. |
| playedStream | Confirmation that audio with a checkpoint finished playing. |
| clearedAudio | Confirmation that the audio queue was cleared. |

Events to Plivo (Output)

| Event | Description |
| --- | --- |
| playAudio | Send audio to the caller. Include base64 payload matching stream contentType. |
| checkpoint | Mark a point in audio queue. Receive playedStream when reached. |
| clearAudio | Clear all queued audio. Use for interruption handling. |

Quick Example

// Handle incoming events
ws.on('message', (data) => {
  const event = JSON.parse(data);

  switch (event.event) {
    case 'start':
      console.log('Stream started:', event.start.streamId);
      break;
    case 'media': {
      // Forward audio to STT service
      const audio = Buffer.from(event.media.payload, 'base64');
      sttClient.send(audio);
      break;
    }
    case 'dtmf':
      console.log('DTMF pressed:', event.dtmf.digit);
      break;
  }
});

// Send audio to caller
ws.send(JSON.stringify({
  event: 'playAudio',
  media: {
    contentType: 'audio/x-mulaw',
    sampleRate: 8000,
    payload: base64EncodedAudio
  }
}));
For complete event schemas, TypeScript types, and detailed field documentation, see the Audio Streaming Protocol Reference.

X-Headers

Pass custom metadata from your Stream XML to your WebSocket server.

Usage

<Stream bidirectional="true"
        extraHeaders="userId=12345;sessionId=abc-xyz;tier=premium">
    wss://your-server.com/stream
</Stream>
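
When the metadata comes from application data, the attribute value can be assembled programmatically. The sketch below is one conservative way to do so. `buildExtraHeaders` is a hypothetical helper, and it rejects keys or values containing `;` or `=` because those characters delimit the format. Plivo may impose further restrictions on allowed characters or length that are not covered here.

```javascript
// Hypothetical helper: serialize an object into the
// key1=value1;key2=value2 format used by the extraHeaders attribute.
function buildExtraHeaders(metadata) {
  return Object.entries(metadata)
    .map(([key, value]) => {
      const text = String(value);
      // ';' separates pairs and '=' separates key from value, so neither
      // may appear inside a key or value.
      if (/[;=]/.test(key) || /[;=]/.test(text)) {
        throw new Error(`"${key}" contains a reserved ";" or "=" character`);
      }
      return `${key}=${text}`;
    })
    .join(';');
}

console.log(buildExtraHeaders({ userId: 12345, sessionId: 'abc-xyz', tier: 'premium' }));
// userId=12345;sessionId=abc-xyz;tier=premium
```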

Parsing

function parseExtraHeaders(extraHeaders) {
  const headers = {};
  if (!extraHeaders) return headers;

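  for (const pair of extraHeaders.split(';')) {
    // Split on the first '=' only, so values that contain '=' stay intact
    const separator = pair.indexOf('=');
    if (separator === -1) continue;
    const key = pair.slice(0, separator).trim();
    const value = pair.slice(separator + 1).trim();
    if (key && value) {
      headers[key] = decodeURIComponent(value);
    }
  }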
  return headers;
}

// Usage
const headers = parseExtraHeaders(event.extra_headers);
console.log(headers.userId);     // "12345"
console.log(headers.sessionId);  // "abc-xyz"

Limits

WebSocket and Stream Limits

| Limit | Value |
| --- | --- |
| Maximum WebSocket URL length | 2048 characters |
| Maximum concurrent streams per call | 1 |
| Maximum stream duration | Same as call duration |
| Audio buffer size (playback queue) | ~60 seconds of audio |
| Maximum WebSocket message size | 64 KB |
| Recommended audio chunk size | 16 KB base64-encoded or less |
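
To stay within the recommended chunk size, long TTS output can be split before sending. The sketch below is one way to do that, reusing the playAudio message shape from the Quick Example. `toPlayAudioMessages` is a hypothetical helper, and it reads "16 KB" as 16,384 base64 characters of payload, which is an assumption.

```javascript
// 16 KB of base64 text carries 12 KB of raw audio (4 characters per 3 bytes).
const MAX_BASE64_CHARS = 16 * 1024;
const MAX_RAW_BYTES = Math.floor(MAX_BASE64_CHARS / 4) * 3; // 12288

// Hypothetical helper: split a Buffer of raw audio into serialized playAudio
// messages whose base64 payloads never exceed MAX_BASE64_CHARS.
function toPlayAudioMessages(audio, contentType = 'audio/x-mulaw', sampleRate = 8000) {
  const messages = [];
  for (let offset = 0; offset < audio.length; offset += MAX_RAW_BYTES) {
    const chunk = audio.subarray(offset, offset + MAX_RAW_BYTES);
    messages.push(JSON.stringify({
      event: 'playAudio',
      media: { contentType, sampleRate, payload: chunk.toString('base64') },
    }));
  }
  return messages;
}

// Usage sketch: for (const message of toPlayAudioMessages(ttsAudio)) ws.send(message);
```

Because the raw chunk size is a multiple of 3 bytes, only the final chunk can need base64 padding. It is also an even number, so 16-bit samples are never split across chunks.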

Best Practices

Use mu-law 8000Hz

Why mu-law at 8kHz is recommended:
  1. Native Telephony Format: No transcoding required, lowest latency
  2. Bandwidth Efficient: Compresses 16-bit audio to 8-bit while maintaining voice quality
  3. Broad Compatibility: Most major STT/TTS services accept or produce 8 kHz mu-law audio
  4. Sufficient for Voice: 8 kHz sampling covers the telephone voice band (roughly 300 to 3400 Hz)
<!-- Recommended configuration -->
<Stream bidirectional="true"
        contentType="audio/x-mulaw;rate=8000">
    wss://your-server.com/stream
</Stream>
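
If a downstream service expects linear PCM instead of mu-law, the conversion can be done in your server. The sketch below implements the standard G.711 mu-law expansion to 16-bit samples. The function names are illustrative, and whether you need this step at all depends on the STT provider you choose.

```javascript
// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
function mulawByteToPcm16(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;  // 3-bit segment number
  const mantissa = u & 0x0f;         // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Decode a Buffer of mu-law audio into an Int16Array of PCM samples.
function mulawToPcm16(mulaw) {
  const pcm = new Int16Array(mulaw.length);
  for (let i = 0; i < mulaw.length; i++) {
    pcm[i] = mulawByteToPcm16(mulaw[i]);
  }
  return pcm;
}

// Usage sketch with a media event payload:
//   const pcm = mulawToPcm16(Buffer.from(event.media.payload, 'base64'));
```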

Minimize Latency

For a responsive Voice AI experience, aim for under 1 second total response time:
| Component | Target Latency |
| --- | --- |
| Speech-to-Text | < 200ms |
| LLM Processing | < 500ms |
| Text-to-Speech | < 200ms |
| Network (round trip) | < 100ms |

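
The per-component targets add up to the one-second goal (200 + 500 + 200 + 100 = 1000 ms), so measured timings can be checked against the same budget. The sketch below does that; `checkLatencyBudget` and the stage names are hypothetical, and how you measure each stage is left to your application.

```javascript
// Targets from the table above, in milliseconds.
const LATENCY_TARGETS_MS = {
  speechToText: 200,
  llm: 500,
  textToSpeech: 200,
  network: 100,
};

// Hypothetical helper: compare measured stage latencies with the targets.
// A stage is over budget when it is not strictly below its target.
function checkLatencyBudget(measuredMs) {
  const overBudget = Object.keys(LATENCY_TARGETS_MS)
    .filter((stage) => (measuredMs[stage] ?? 0) >= LATENCY_TARGETS_MS[stage]);
  const totalMs = Object.values(measuredMs).reduce((sum, ms) => sum + ms, 0);
  return { totalMs, withinOneSecond: totalMs < 1000, overBudget };
}

console.log(checkLatencyBudget({ speechToText: 150, llm: 620, textToSpeech: 180, network: 40 }));
// { totalMs: 990, withinOneSecond: true, overBudget: [ 'llm' ] }
```

As the example shows, a turn can meet the overall goal while one stage exceeds its own target, which is often the first place to look when tuning.
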
Server Location: Deploy your WebSocket server close to your expected caller locations. Plivo routes calls through the edge location closest to the caller.
| Traffic Source | Recommended Server Location |
| --- | --- |
| US-focused | US East (Virginia) or US West (Oregon) |
| Europe-focused | Frankfurt or London |
| Asia-Pacific | Singapore or Mumbai |
| Global | Deploy in multiple regions with geographic routing |

Handle Interruptions

Always support user interruption using clearAudio:
// When user speaks while AI is playing
if (userSpeaking && aiPlaying) {
  ws.send(JSON.stringify({
    event: 'clearAudio',
    streamId: streamId
  }));
}
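
The `aiPlaying` flag in the snippet above has to be maintained somewhere. The sketch below tracks it from the events described earlier; `PlaybackTracker` is a hypothetical class. It assumes you send a checkpoint after each response so that a playedStream event arrives when playback finishes.

```javascript
class PlaybackTracker {
  constructor(send, streamId) {
    this.send = send;         // function that writes a string to the WebSocket
    this.streamId = streamId; // taken from the start event
    this.aiPlaying = false;
  }

  // Call after sending the playAudio messages for a response.
  markPlaybackQueued() {
    this.aiPlaying = true;
  }

  // Call for every parsed event received from Plivo.
  onPlivoEvent(event) {
    if (event.event === 'playedStream' || event.event === 'clearedAudio') {
      this.aiPlaying = false;
    }
  }

  // Call when speech detection reports that the caller is talking.
  // Returns true if a clearAudio interruption was sent.
  onUserSpeech() {
    if (!this.aiPlaying) return false;
    this.send(JSON.stringify({ event: 'clearAudio', streamId: this.streamId }));
    this.aiPlaying = false;
    return true;
  }
}

// Usage sketch:
//   const tracker = new PlaybackTracker((message) => ws.send(message), streamId);
```

Resetting the flag as soon as clearAudio is sent prevents repeated interruptions while the caller keeps speaking.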

Integration Guides

For complete code examples and step-by-step tutorials:

Next Steps


Support

For questions, issues, or feature requests:
Last updated: January 2026