Overview
The Sarvam Conv AI SDK enables developers to create applications that can:
- Build real-time voice-to-voice conversational experiences on mobile
- Create text-based chat applications using Sarvam agents
- Handle audio capture from microphone and playback to speakers automatically
- Manage conversation lifecycle with robust event handling
- Support multiple Indian languages for conversational AI
Installation
Add the SDK to your pubspec.yaml:
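A sketch of the dependency entry; the package name and version below are placeholders, so check pub.dev for the actual published name:

```yaml
dependencies:
  sarvam_convai: ^1.0.0  # placeholder package name and version
```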
Platform Setup
iOS
Add the microphone permission to ios/Runner/Info.plist:
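For example:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone for voice conversations.</string>
```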
Android
Add the permission to android/app/src/main/AndroidManifest.xml:
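For example:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<!-- INTERNET is needed for the WebSocket and is usually already present -->
<uses-permission android:name="android.permission.INTERNET" />
```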
Quick Start
Voice-to-Voice Conversation
Here’s a complete example for voice interaction:
Text-Based Conversation
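A minimal text-chat sketch. The package import path and the callback parameter types are assumptions; the constructor parameters and config fields follow the reference tables below:

```dart
// Sketch of a text-based chat session (InteractionType.chat).
import 'package:sarvam_convai/sarvam_convai.dart'; // package name assumed

Future<void> runTextChat() async {
  final agent = SamvaadAgent(
    apiKey: 'YOUR_API_KEY',
    config: InteractionConfig(
      userIdentifierType: UserIdentifierType.custom,
      userIdentifier: 'user-123',
      orgId: 'YOUR_ORG_ID',
      workspaceId: 'YOUR_WORKSPACE_ID',
      appId: 'YOUR_APP_ID',
      interactionType: InteractionType.chat,
      sampleRate: 16000,
    ),
    textCallback: (chunk) {
      // Streaming text chunks from the agent.
      print('Agent: $chunk');
    },
  );

  await agent.start();
  if (await agent.waitForConnect(timeout: const Duration(seconds: 10))) {
    await agent.sendText('Hello!');
  }
  // ... later, when the conversation is over:
  await agent.stop();
}
```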
SamvaadAgent
The main class for managing conversational AI sessions. It automatically handles voice or text modes based on the interactionType configuration.
Constructor Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| apiKey | String? | No* | API key for authentication. Optional when using proxy server. |
| config | InteractionConfig | Yes | Interaction configuration (see below) |
| audioInterface | AudioInterface? | No | Audio interface for mic/speaker (required for voice interactions) |
| audioCallback | AudioCallback? | No | Receives audio chunks (if not using audioInterface) |
| textCallback | TextCallback? | No | Receives streaming text chunks from the agent |
| eventCallback | EventCallback? | No | Receives events like user_interrupt, interaction_end |
| baseUrl | String | No | Base URL for API. Defaults to Sarvam API. Set to proxy URL to hide API keys. |
| headers | Map<String, String>? | No | Custom headers for HTTP requests (e.g., proxy authentication) |
Methods
Future<void> start()
Start the conversation session and establish WebSocket connection.
Future<void> stop()
Stop the conversation session and cleanup resources.
Future<bool> waitForConnect({Duration? timeout})
Wait until the WebSocket connection is established. Returns true if connected, or false if the timeout elapses first.
Future<void> waitForDisconnect()
Wait until the WebSocket disconnects or the agent is stopped.
bool get isConnected
Check if the WebSocket is currently connected.
String? get interactionId
Get the current interaction identifier.
String? get referenceId
Get the reference ID for this session (useful for telephony integrations).
Future<void> sendAudio(Uint8List audioData)
Send raw audio data (only for voice interactions). Audio must be 16-bit PCM mono at the configured sample rate.
Future<void> sendText(String text)
Send a text message (available for both voice and text interactions).
Future<void> sendVoiceNote(Uint8List audioData, {bool transcribe = true})
Send a complete voice note for transcription.
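The methods above compose into a typical session lifecycle. A sketch, assuming an already-constructed agent (signatures as listed above):

```dart
// Typical session lifecycle: start, wait for the connection, interact,
// then wait for the session to end and release resources.
Future<void> runSession(SamvaadAgent agent) async {
  await agent.start();
  final connected = await agent.waitForConnect(
    timeout: const Duration(seconds: 10),
  );
  if (!connected) {
    await agent.stop();
    return;
  }
  print('interaction: ${agent.interactionId}');
  await agent.sendText('Hi there');
  await agent.waitForDisconnect(); // resolves when the session ends
  await agent.stop();
}
```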
Configuration
InteractionConfig
The configuration object that defines the conversation parameters.
Required Fields
| Field | Type | Description |
|---|---|---|
| userIdentifierType | UserIdentifierType | One of: .custom, .email, .phoneNumber, .unknown |
| userIdentifier | String | User identifier value (email, phone, or custom ID) |
| orgId | String | Your organization ID |
| workspaceId | String | Your workspace ID |
| appId | String | The target application ID |
| interactionType | InteractionType | InteractionType.call (voice) or InteractionType.chat (text) |
| sampleRate | int | Audio sample rate: 8000, 16000, or 48000 (16-bit PCM mono) |
Optional Fields
| Field | Type | Description |
|---|---|---|
| version | int? | App version number. If not provided, uses latest committed version |
| agentVariables | Map<String, dynamic>? | Key-value pairs to seed the agent context |
| initialLanguageName | SarvamToolLanguageName? | Starting language (e.g., .hindi, .english) |
| initialStateName | String? | Starting state name (if your app uses states) |
| initialBotMessage | String? | First message from the agent |
Important
If version is not provided, the SDK uses the latest committed version of the app.
The connection will fail if the provided appId has no committed version.
Example Configuration
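A sample configuration built from the field tables above; the IDs are placeholders:

```dart
// Example InteractionConfig for a Hindi voice call. Field and enum names
// follow the tables above; the ID values are placeholders.
final config = InteractionConfig(
  userIdentifierType: UserIdentifierType.phoneNumber,
  userIdentifier: '+919999999999',
  orgId: 'YOUR_ORG_ID',
  workspaceId: 'YOUR_WORKSPACE_ID',
  appId: 'YOUR_APP_ID',
  interactionType: InteractionType.call,
  sampleRate: 16000, // 16-bit PCM mono
  // Optional fields:
  version: 3, // omit to use the latest committed version
  agentVariables: {'customer_name': 'Asha'},
  initialLanguageName: SarvamToolLanguageName.hindi,
  initialBotMessage: 'Namaste! How can I help you today?',
);
```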
Audio Interfaces
DefaultAudioInterface
Handles microphone capture and speaker playback with platform-specific implementations.
Features:
- Automatic microphone access and audio capture
- Real-time audio streaming at 8kHz, 16kHz, or 48kHz
- Automatic speaker playback of agent responses
- Handles user interruptions
- Platform-specific optimizations (iOS: flutter_sound, Android: native AudioTrack)
Requirements:
- iOS: Microphone usage description in Info.plist
- Android: RECORD_AUDIO permission in AndroidManifest.xml
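A voice-call sketch using DefaultAudioInterface. The import path and the DefaultAudioInterface constructor shape are assumptions:

```dart
// Voice conversation: DefaultAudioInterface handles mic capture and speaker
// playback, so no manual sendAudio calls are needed.
import 'package:sarvam_convai/sarvam_convai.dart'; // package name assumed

Future<void> runVoiceCall(InteractionConfig config) async {
  final agent = SamvaadAgent(
    apiKey: 'YOUR_API_KEY',
    config: config, // must use InteractionType.call
    audioInterface: DefaultAudioInterface(),
    eventCallback: (event) => print('event: $event'),
  );
  await agent.start();
  await agent.waitForConnect(timeout: const Duration(seconds: 10));
  // Audio now flows automatically: mic -> agent, agent -> speaker.
  await agent.waitForDisconnect();
  await agent.stop();
}
```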
Custom Audio Interface
Implement your own audio interface by implementing the AudioInterface abstract class:
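A skeleton sketch; the method names below are illustrative assumptions, so match them to the real AudioInterface contract:

```dart
// Hypothetical custom AudioInterface. Method names are assumptions; the
// responsibilities (capture, playback, interrupt, teardown) follow the
// DefaultAudioInterface description above.
import 'dart:typed_data';

class MyAudioInterface implements AudioInterface {
  @override
  Future<void> start(void Function(Uint8List chunk) onCapturedAudio) async {
    // Begin capturing 16-bit PCM mono audio at the configured sample rate
    // and feed each chunk to onCapturedAudio.
  }

  @override
  Future<void> play(Uint8List audioData) async {
    // Queue agent audio for speaker playback.
  }

  @override
  Future<void> interrupt() async {
    // Flush queued playback when the user interrupts the agent.
  }

  @override
  Future<void> stop() async {
    // Release recorder and player resources.
  }
}
```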
Event Handling
The SDK provides callbacks for different types of events during the conversation.
Text Callback
Receives streaming text chunks from the agent:
Audio Callback
Receives raw audio chunks (if not using DefaultAudioInterface):
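Both callbacks are plain functions passed to the constructor. A sketch; the parameter types (a String chunk for text, raw PCM bytes for audio) are assumptions:

```dart
import 'dart:typed_data';
import 'package:sarvam_convai/sarvam_convai.dart'; // package name assumed

final agent = SamvaadAgent(
  apiKey: 'YOUR_API_KEY',
  config: config, // an InteractionConfig built as shown earlier
  textCallback: (String chunk) {
    print('agent text: $chunk'); // streaming text chunks
  },
  audioCallback: (Uint8List chunk) {
    // Raw 16-bit PCM audio; route it to your own playback pipeline.
  },
);
```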
Event Callback
Receives various conversation events:
Supported Languages
The SDK supports multilingual conversations using the SarvamToolLanguageName enum:
- SarvamToolLanguageName.bengali - Bengali
- SarvamToolLanguageName.gujarati - Gujarati
- SarvamToolLanguageName.kannada - Kannada
- SarvamToolLanguageName.malayalam - Malayalam
- SarvamToolLanguageName.tamil - Tamil
- SarvamToolLanguageName.telugu - Telugu
- SarvamToolLanguageName.punjabi - Punjabi
- SarvamToolLanguageName.odia - Odia
- SarvamToolLanguageName.marathi - Marathi
- SarvamToolLanguageName.hindi - Hindi
- SarvamToolLanguageName.english - English
Proxy Server Setup
For production apps, hide your Sarvam API key from client devices by routing requests through a proxy server. This prevents the key from being extracted from your mobile app binary.
Architecture
Flutter App → Your Proxy Server (injects API key) → Sarvam API
Flutter Client Setup (with Proxy)
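With a proxy, the client omits apiKey and points baseUrl at your server. A sketch; the proxy URL and the Authorization header name are placeholders:

```dart
// No Sarvam API key is embedded in the app; the proxy injects it server-side.
final agent = SamvaadAgent(
  config: config,
  baseUrl: 'wss://your-proxy.example.com', // placeholder proxy URL
  headers: {
    // Authenticate the user to YOUR proxy, e.g. with a session token.
    'Authorization': 'Bearer $userSessionToken',
  },
  audioInterface: DefaultAudioInterface(),
);
```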
Proxy Server Example (Python FastAPI)
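A hypothetical sketch of such a proxy: it relays the SDK's WebSocket traffic to the Sarvam API while injecting the secret key server-side. The upstream URL, path, and header names are assumptions; the `additional_headers` keyword matches recent versions of the `websockets` library (older versions use `extra_headers`):

```python
# Hypothetical FastAPI WebSocket proxy for the Sarvam Conv AI SDK.
import asyncio
import os

import websockets
from fastapi import FastAPI, WebSocket

app = FastAPI()

SARVAM_WS_URL = "wss://api.sarvam.ai/..."  # placeholder upstream endpoint
API_KEY = os.environ.get("SARVAM_API_KEY", "")  # never ship this in the app


@app.websocket("/proxy")
async def proxy(client_ws: WebSocket) -> None:
    await client_ws.accept()
    # TODO: authenticate the client here (e.g. validate a session token).
    async with websockets.connect(
        SARVAM_WS_URL,
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as upstream:

        async def client_to_upstream() -> None:
            while True:
                message = await client_ws.receive()
                if message["type"] == "websocket.disconnect":
                    return
                # Forward text (JSON) or binary (audio) frames unchanged.
                await upstream.send(message.get("text") or message["bytes"])

        async def upstream_to_client() -> None:
            async for message in upstream:  # str or bytes frames
                if isinstance(message, bytes):
                    await client_ws.send_bytes(message)
                else:
                    await client_ws.send_text(message)

        # Relay both directions; stop as soon as either side closes.
        tasks = [
            asyncio.create_task(client_to_upstream()),
            asyncio.create_task(upstream_to_client()),
        ]
        _, pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
```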
Security Best Practices
1. Always Authenticate Users
Verify a session token or user credential on every proxy connection before opening the upstream WebSocket.
2. Rate Limiting
Cap connections and messages per user so a single client cannot exhaust your API quota.
3. Request Validation
Validate that incoming orgId, workspaceId, and appId values match what your app expects, so clients cannot target arbitrary agents.
4. Environment Variables
Store your API key securely on the server, for example in an environment variable, and never in the client bundle.
Message Types
Server Message Types
The SDK handles various message types from the server:
| Event Type | Description |
|---|---|
| server.media.text_chunk | Streaming text response from agent |
| server.media.text | Complete text response from agent |
| server.media.audio_chunk | Streaming audio response from agent |
| server.action.interaction_connected | Conversation session established |
| server.action.interaction_end | Conversation session ended |
| server.event.user_speech_start | User started speaking |
| server.event.user_speech_end | User stopped speaking |
| server.event.user_interrupt | User interrupted the agent |
| server.event.variable_update | Agent variables updated |
| server.event.language_change | Conversation language changed |
| server.event.state_transition | Agent state transitioned |
| server.event.tool_call | Agent called a tool/function |
Client Message Types
Messages sent from the SDK to the server:
| Message Type | Description |
|---|---|
| client.action.interaction_start | Start conversation with configuration |
| client.media.audio_chunk | Send audio data to agent |
| client.media.text | Send text message to agent |
| client.action.interaction_end | End conversation session |
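An eventCallback can branch on the server event types listed above. A sketch; the payload shape (a map with a 'type' key) is an assumption:

```dart
// Hypothetical eventCallback body; adapt the payload access to the real
// event object delivered by the SDK.
void onEvent(Map<String, dynamic> event) {
  switch (event['type']) {
    case 'server.event.user_interrupt':
      // Stop rendering the agent's reply; the user started talking.
      break;
    case 'server.event.language_change':
      print('language changed: ${event['data']}');
      break;
    case 'server.action.interaction_end':
      // Clean up UI state for the finished session.
      break;
    default:
      break;
  }
}
```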
Best Practices
1. Resource Cleanup
Always clean up resources when done:
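For example, in a StatefulWidget's dispose (a sketch; `agent` is assumed to be a field on the State):

```dart
// Stop the agent when the widget goes away so the WebSocket and audio
// resources are released.
@override
void dispose() {
  agent.stop();
  super.dispose();
}
```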
2. Connection Timeout
Always specify a timeout when waiting for connection:
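A sketch:

```dart
// Bound the wait so the UI can react if the connection never comes up.
final connected = await agent.waitForConnect(
  timeout: const Duration(seconds: 10),
);
if (!connected) {
  await agent.stop();
  // Surface a "could not connect" error to the user here.
}
```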
3. Error Handling
Implement comprehensive error handling:
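A sketch wrapping the session so resources are released on any failure path:

```dart
try {
  await agent.start();
  final connected = await agent.waitForConnect(
    timeout: const Duration(seconds: 10),
  );
  if (!connected) {
    throw Exception('connection timed out');
  }
  await agent.waitForDisconnect();
} on Exception catch (e) {
  debugPrint('conversation failed: $e');
} finally {
  await agent.stop(); // always release the connection and audio resources
}
```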
4. State Management
Use proper state management for UI updates:
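A minimal sketch using a ValueNotifier so only the transcript widget rebuilds as text streams in:

```dart
import 'package:flutter/widgets.dart';

// Holds the streaming transcript.
final transcript = ValueNotifier<String>('');

// In the agent constructor, append each streamed chunk:
//   textCallback: (chunk) => transcript.value = transcript.value + chunk,

Widget buildTranscript() {
  return ValueListenableBuilder<String>(
    valueListenable: transcript,
    builder: (context, text, _) => Text(text),
  );
}
```

Any state-management approach (Provider, Riverpod, Bloc) works the same way: push callback data into observable state and let the UI react.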
5. Handle App Lifecycle
Stop the agent when the app goes to the background:
Troubleshooting
Common Issues
1. Microphone Permission Denied
- Verify the Info.plist and AndroidManifest.xml entries from Platform Setup
- On Android 6+, request the RECORD_AUDIO runtime permission before starting the agent
- Check the app's permission settings on the device
2. Connection Timeout
- Check internet connectivity
- Verify API key is valid
- Ensure org_id, workspace_id, and app_id are correct
- Make sure the app has a committed version
3. No Audio Output
- Check device volume
- Verify audio interface is properly initialized
- Ensure sample rate matches server configuration
4. WebSocket Disconnects
- Implement reconnection logic
- Handle app lifecycle (pause/resume)
- Check for network changes