What You Can Build
- AI Voice Assistants - Natural conversations powered by speech-to-text, LLMs, and text-to-speech
- Real-time Transcription - Live call transcription with speech recognition services
- Voice Bots - Automated IVR systems with intelligent responses
- Call Analytics - Real-time audio analysis and sentiment detection
Get Started with Plivo
Before developing your AI voice agent, sign up for Plivo or sign in to your existing account, then purchase a voice-enabled number through the Plivo console.
Prerequisites
Required Accounts
- Plivo - Account with Auth ID and Auth Token
- Deepgram - Sign up for speech-to-text
- OpenAI - Sign up for conversational AI
- ElevenLabs - Sign up for text-to-speech
Language Requirements
The SDK is available for Python, Node.js, and Java. For Python you need:
- Python 3.8 or later
- pip package manager
Installation
Install the SDK for your language (Python, Node.js, or Java). For Python, choose a WebSocket server framework:
- FastAPI - For production applications using ASGI
- websockets - Lightweight option for simple use cases
Core Concepts
Audio Streaming Flow
- Caller dials your Plivo number
- Plivo connects to your WebSocket endpoint
- SDK receives START event with stream metadata
- Audio flows as MEDIA events (base64-encoded mu-law)
- Your app processes audio through AI services
- SDK sends audio back to the caller
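The flow above implies an ordering: only START is expected right after the WebSocket connects, MEDIA and DTMF follow it, and nothing arrives after STOP. The sketch below models that lifecycle as a small state machine. It assumes lowercase event names (`start`, `media`, `dtmf`, `stop`) on the wire. The class and state names are hypothetical and are not part of the SDK, and checkpoint or cleared-audio acknowledgements are left out.

```python
from enum import Enum


class StreamState(Enum):
    CONNECTED = "connected"  # WebSocket open, START not yet received
    STARTED = "started"      # START received, audio may flow
    STOPPED = "stopped"      # STOP received, stream is over


# Events accepted in each state (wire names are an assumption).
ALLOWED = {
    StreamState.CONNECTED: {"start"},
    StreamState.STARTED: {"media", "dtmf", "stop"},
    StreamState.STOPPED: set(),
}


class StreamLifecycle:
    """Tracks one call's stream and rejects events that arrive out of order."""

    def __init__(self) -> None:
        self.state = StreamState.CONNECTED
        self.media_chunks = 0

    def on_event(self, event: str) -> StreamState:
        if event not in ALLOWED[self.state]:
            raise ValueError(f"unexpected {event!r} while {self.state.value}")
        if event == "start":
            self.state = StreamState.STARTED
        elif event == "media":
            self.media_chunks += 1  # a real agent would forward this audio to STT
        elif event == "stop":
            self.state = StreamState.STOPPED
        return self.state
```

A MEDIA event before START raises `ValueError`, which mirrors the troubleshooting advice that the START event should fire first.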
Event Types
| Event | Description |
|---|---|
| START | Stream initialized with call metadata (stream ID, call UUID, from/to numbers) |
| MEDIA | Audio chunk received (base64-encoded, mu-law at 8 kHz or linear PCM at 16 kHz) |
| DTMF | Caller pressed a key on their phone |
| STOP | Stream ended |
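To make the event types concrete, the sketch below parses one WebSocket text frame into a small record and base64-decodes MEDIA payloads. The JSON field names (`event`, `streamId`, `media.payload`, `dtmf.digit`, and a `start` object holding `streamId`) are assumptions modelled on Plivo's stream messages. Check them against the Audio Streaming API reference before relying on them. `StreamEvent` and `parse_event` are hypothetical names.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamEvent:
    kind: str                    # "start", "media", "dtmf" or "stop"
    stream_id: Optional[str]
    audio: bytes = b""           # raw audio bytes for MEDIA events
    digit: Optional[str] = None  # key pressed for DTMF events


def parse_event(raw: str) -> StreamEvent:
    """Parse one text frame; field names are assumptions, not a confirmed schema."""
    msg = json.loads(raw)
    kind = msg.get("event", "").lower()
    # Assumed: START nests the stream ID in its own object, other events carry it at top level.
    stream_id = msg.get("streamId") or msg.get("start", {}).get("streamId")
    event = StreamEvent(kind=kind, stream_id=stream_id)
    if kind == "media":
        event.audio = base64.b64decode(msg["media"]["payload"])  # mu-law or L16 bytes
    elif kind == "dtmf":
        event.digit = msg["dtmf"]["digit"]
    return event
```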
Audio Formats
| Format | Encoding | Sample Rate | Use Case |
|---|---|---|---|
| audio/x-mulaw | mu-law | 8000 Hz | Standard telephony (default) |
| audio/x-l16 | Linear PCM | 16000 Hz | Higher quality for STT |
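Many speech-to-text services expect linear PCM, so a common step is to expand mu-law bytes into 16-bit samples. The sketch below implements standard G.711 mu-law decoding in pure Python. Older Python versions offered `audioop.ulaw2lin`, but that module was removed in Python 3.13. The helper names are hypothetical. `bytes_per_chunk` shows the size arithmetic behind the two formats in the table.

```python
import struct
from typing import List


def mulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF            # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07  # segment number
    mantissa = u & 0x0F         # step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # remove the encoder bias
    return -magnitude if sign else magnitude


def mulaw_to_pcm16(data: bytes) -> bytes:
    """Convert decoded MEDIA payload bytes to little-endian 16-bit PCM."""
    samples: List[int] = [mulaw_to_linear(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def bytes_per_chunk(sample_rate: int, bytes_per_sample: int, ms: int) -> int:
    """Bytes needed for `ms` milliseconds of audio in a given format."""
    return sample_rate * bytes_per_sample * ms // 1000
```

For 20 ms of audio, `bytes_per_chunk(8000, 1, 20)` gives 160 bytes of mu-law and `bytes_per_chunk(16000, 2, 20)` gives 640 bytes of L16. The 16 kHz format therefore carries four times as many bytes per second.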
Quick Start
Step 1: Create a WebSocket Handler
- Python (FastAPI)
- Node.js
- Java
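Below is a framework-neutral sketch of such a handler. It is not the SDK's own API. The loop takes an async iterator of text frames and a coroutine that sends one frame. With FastAPI you might pass `websocket.iter_text()` and `websocket.send_text`. The handler echoes each inbound MEDIA payload back as a stand-in for the speech-to-text, LLM, and text-to-speech pipeline. The JSON field names and the `playAudio` reply shape are assumptions, and `handle_stream` is a hypothetical name.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, List


async def handle_stream(
    receive: AsyncIterator[str],
    send: Callable[[str], Awaitable[None]],
) -> List[str]:
    """Consume one call's frames, echo inbound audio, and return the event names seen."""
    seen: List[str] = []
    async for raw in receive:
        msg = json.loads(raw)
        event = msg.get("event")
        seen.append(event)
        if event == "media":
            # Assumed reply shape; a real agent would send TTS audio here instead.
            reply = {
                "event": "playAudio",
                "media": {
                    "contentType": "audio/x-mulaw",
                    "sampleRate": 8000,
                    "payload": msg["media"]["payload"],
                },
            }
            await send(json.dumps(reply))
        elif event == "stop":
            break  # stream ended; stop reading
    return seen
```

Keeping the transport behind two callables makes the handler easy to test with fake frames. The same loop can also run under FastAPI or the `websockets` package.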
Step 2: Configure Plivo to Stream Audio
Create an XML application that routes calls to your WebSocket endpoint using a Stream element (see Plivo Stream XML Options for the supported attributes).
Step 3: Set Up Local Development
For local testing, use ngrok to expose your WebSocket endpoint at a public URL, then use that URL (with the wss:// scheme) in your XML application.
For complete AI voice agent examples with Deepgram, OpenAI, and ElevenLabs integration, see Clone the Example Repositories or the Deepgram + OpenAI + ElevenLabs Guide.
SDK Reference
Sending Audio to Caller
- Python
- Node.js
- Java
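Text-to-speech output often arrives as 16-bit PCM, so audio sent back on a standard telephony stream needs mu-law encoding first. The sketch below implements G.711 mu-law encoding and wraps the result in JSON frames of 20 ms each. Splitting into 20 ms frames is a common choice, not a stated requirement. The `playAudio` message shape and field names are assumptions, and `play_audio_messages` is a hypothetical helper. The input is assumed to be 8 kHz mono PCM already, since resampling is out of scope here.

```python
import base64
import json
import struct
from typing import List

BIAS = 0x84   # G.711 mu-law encoder bias (132)
CLIP = 32635  # largest magnitude before adding the bias


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment (exponent) is the position of the highest set bit among bits 7..14.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def play_audio_messages(pcm16: bytes, chunk_ms: int = 20) -> List[str]:
    """Turn 8 kHz little-endian 16-bit PCM into JSON frames of mu-law audio (assumed shape)."""
    count = len(pcm16) // 2
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    ulaw = bytes(linear_to_mulaw(s) for s in samples)
    step = 8000 * chunk_ms // 1000  # one mu-law byte per sample at 8 kHz
    frames = []
    for i in range(0, len(ulaw), step):
        frames.append(json.dumps({
            "event": "playAudio",
            "media": {
                "contentType": "audio/x-mulaw",
                "sampleRate": 8000,
                "payload": base64.b64encode(ulaw[i:i + step]).decode("ascii"),
            },
        }))
    return frames
```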
Event Handlers
| Event | Handler | Description |
|---|---|---|
| Connection | on_connected / onConnection | WebSocket connected (before START) |
| Start | on_start / onStart | Stream initialized, call metadata available |
| Media | on_media / onMedia | Audio chunk received |
| DTMF | on_dtmf / onDtmf | Keypad digit pressed |
| Stop | on_stop / onStop | Stream ended |
| Checkpoint | on_played_stream / onPlayedStream | Checkpoint reached (audio finished) |
| Audio Cleared | on_cleared_audio / onClearedAudio | Audio queue cleared |
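The handler table maps each event to a callback. The sketch below shows a hypothetical decorator-based registry in that style. It mirrors the `on_start` / `on_media` naming but is not the Plivo SDK's API, and the `event` field name is an assumption. A message with no registered handler is simply dropped. This behavior is one reason the troubleshooting section advises registering handlers before calling `start()`.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventRouter:
    """Hypothetical registry mapping event names to callbacks."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers `fn` for one event name."""
        def register(fn: Handler) -> Handler:
            self._handlers[event].append(fn)
            return fn
        return register

    def dispatch(self, message: Dict[str, Any]) -> int:
        """Call every handler for message["event"]; return how many ran."""
        handlers = self._handlers.get(message.get("event", ""), [])
        for fn in handlers:
            fn(message)
        return len(handlers)
```

Usage: decorate a function with `@router.on("dtmf")`, then pass each parsed message to `router.dispatch(...)`.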
Getting Stream Information
- Python
- Node.js
- Java
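The sketch below collects the START metadata into one immutable record. It assumes the START message nests `streamId`, `callId`, `tracks`, and a `mediaFormat` object (with `encoding` and `sampleRate`) under a `start` key. Those field names are assumptions, and the SDK's Python, Node.js, and Java accessors may expose them differently. Missing format fields fall back to the mu-law 8000 Hz default from the Audio Formats table. `StreamInfo` and `stream_info_from_start` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class StreamInfo:
    stream_id: str
    call_id: str
    tracks: List[str]
    encoding: str
    sample_rate: int


def stream_info_from_start(message: Dict[str, Any]) -> StreamInfo:
    """Build StreamInfo from a START message (assumed field names)."""
    start = message["start"]
    fmt = start.get("mediaFormat", {})
    return StreamInfo(
        stream_id=start["streamId"],
        call_id=start["callId"],
        tracks=list(start.get("tracks", [])),
        encoding=fmt.get("encoding", "audio/x-mulaw"),   # default telephony format
        sample_rate=int(fmt.get("sampleRate", 8000)),
    )
```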
Configuration Options
Environment Variables
Create a .env file with your credentials: your Plivo Auth ID and Auth Token, plus API keys for Deepgram, OpenAI, and ElevenLabs.
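The sketch below fails fast when a credential is missing, which helps catch configuration errors before a call arrives. The variable names are hypothetical, so use whatever names your .env file defines. If you use python-dotenv, call `load_dotenv()` before `load_config()` so the .env values land in `os.environ`.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical variable names; match them to your own .env file.
REQUIRED = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required credentials, raising if any are missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED if not source.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED}
```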
Plivo Stream XML Options
| Attribute | Description |
|---|---|
| keepCallAlive | Keep the call active after the stream ends (true/false) |
| audioTrack | Audio direction: inbound, outbound, or both |
| contentType | Audio format: audio/x-mulaw;rate=8000 or audio/x-l16;rate=16000 |
| statusCallbackUrl | URL for stream status webhooks |
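The sketch below renders a Stream element using the attributes from the table and only the standard library. The WebSocket URL goes in the element body. The function name and the default values are illustrative choices, not documented Plivo defaults. The sketch also rejects `audioTrack` values outside the three listed above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def stream_xml(
    wss_url: str,
    keep_call_alive: bool = True,
    audio_track: str = "inbound",
    content_type: str = "audio/x-mulaw;rate=8000",
    status_callback_url: Optional[str] = None,
) -> str:
    """Return <Response><Stream ...>wss_url</Stream></Response> as a string."""
    if audio_track not in ("inbound", "outbound", "both"):
        raise ValueError(f"audioTrack must be inbound, outbound or both, not {audio_track!r}")
    response = ET.Element("Response")
    stream = ET.SubElement(response, "Stream", {
        "keepCallAlive": "true" if keep_call_alive else "false",
        "audioTrack": audio_track,
        "contentType": content_type,
    })
    if status_callback_url:
        stream.set("statusCallbackUrl", status_callback_url)
    stream.text = wss_url  # the WebSocket endpoint Plivo connects to
    return ET.tostring(response, encoding="unicode")
```

Building the XML with `ElementTree` escapes attribute values, such as a callback URL containing `&`, which hand-written string templates can get wrong.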
Troubleshooting
WebSocket Connection Issues
- Verify ngrok is running and the URL matches your XML configuration
- Check firewall rules allow WebSocket connections on your server
- Validate SSL certificates if using custom domains
Audio Quality Issues
- Use the correct audio format - mu-law at 8 kHz for standard telephony
- Check that the sample rate matches between incoming and outgoing audio
- Monitor latency - keep processing under 200 ms for natural conversation
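To check the 200 ms guidance, the sketch below times each stage of one conversational turn with `time.perf_counter()` and compares the total against a budget. `TurnTimer` and the stage names are hypothetical. Whether the budget applies per stage or per turn is a judgment call, and this sketch uses the per-turn total.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class TurnTimer:
    """Records how long each stage of one turn takes, in milliseconds."""

    def __init__(self, budget_ms: float = 200.0) -> None:
        self.budget_ms = budget_ms
        self.stages: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = (time.perf_counter() - start) * 1000.0

    @property
    def total_ms(self) -> float:
        return sum(self.stages.values())

    def within_budget(self) -> bool:
        return self.total_ms <= self.budget_ms
```

Usage: wrap each step of a turn, for example `with timer.stage("stt"): ...`, then log `timer.stages` whenever `within_budget()` returns False. This shows which stage to optimize.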
No Audio Received
- Verify audioTrack is set to both or inbound in your XML
- Check that handlers are registered before calling start()
- Confirm the call is connected - the START event should fire first
Clone the Example Repositories
Full working examples are available in the SDK repositories:
- Python
- Node.js
- Java
Related
- Deepgram + OpenAI + ElevenLabs Guide - Complete AI voice agent tutorial with full code examples
- Audio Streaming API - API reference for audio streaming
- Stream XML Element - XML configuration reference
- OpenAI Realtime Integration - Alternative integration using OpenAI’s native voice API
Support
- Plivo Documentation
- Plivo Support for technical assistance
- GitHub Issues for SDK bugs