The Sarvam Conversational AI SDK for Flutter is a Dart library that helps developers build real-time voice-to-voice and text-based conversational AI applications. It provides a unified interface for managing conversation flow, handling audio streams, and processing real-time messages on both iOS and Android platforms.

Overview

With the Sarvam Conv AI SDK, developers can:
  • Build real-time voice-to-voice conversational experiences on mobile
  • Create text-based chat applications using Sarvam agents
  • Handle audio capture from microphone and playback to speakers automatically
  • Manage conversation lifecycle with robust event handling
  • Support multiple Indian languages for conversational AI

Installation

Add the SDK to your pubspec.yaml:
dependencies:
  sarvamconv_ai_sdk: ^1.0.0
Then run:
flutter pub get

Platform Setup

iOS

Add microphone permission to ios/Runner/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access for voice conversations</string>

Android

Add permissions to android/app/src/main/AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
Note that on Android 6.0 (API 23) and above, RECORD_AUDIO must also be requested at runtime; see Troubleshooting below.

Quick Start

Voice-to-Voice Conversation

Here’s a complete example for voice interaction:
import 'package:flutter/material.dart';
import 'package:sarvamconv_ai_sdk/sarvamconv_ai_sdk.dart';

class VoiceChat extends StatefulWidget {
  const VoiceChat({super.key});

  @override
  State<VoiceChat> createState() => _VoiceChatState();
}

class _VoiceChatState extends State<VoiceChat> {
  SamvaadAgent? _agent;
  DefaultAudioInterface? _audioInterface;
  bool _isConnected = false;
  String _transcript = '';

  Future<void> startConversation() async {
    try {
      // Create audio interface for mic/speaker handling
      _audioInterface = DefaultAudioInterface(
        inputSampleRate: 16000,
      );

      // Create configuration
      final config = InteractionConfig(
        orgId: 'your_org_id',
        workspaceId: 'your_workspace_id',
        appId: 'your_app_id',
        userIdentifier: 'user123',
        userIdentifierType: UserIdentifierType.custom,
        interactionType: InteractionType.call,
        sampleRate: 16000,
      );

      // Create agent
      _agent = SamvaadAgent(
        apiKey: 'your_api_key',
        config: config,
        audioInterface: _audioInterface,
        textCallback: (msg) async {
          if (msg is ServerTextChunkMsg && mounted) {
            setState(() => _transcript += msg.text);
          }
        },
        eventCallback: (event) async {
          if (!mounted) return; // callbacks can fire after the widget is gone
          if (event is ServerInteractionConnectedEvent) {
            setState(() => _isConnected = true);
          } else if (event is ServerInteractionEndEvent) {
            setState(() => _isConnected = false);
          }
        },
      );

      // Start the conversation
      await _agent!.start();

      // Wait for connection with timeout
      final connected = await _agent!.waitForConnect(
        timeout: const Duration(seconds: 10),
      );

      if (!connected) {
        throw Exception('Connection timeout');
      }
    } catch (e) {
      print('Error: $e');
    }
  }

  Future<void> stopConversation() async {
    await _agent?.stop();
    _agent = null;
    _audioInterface = null;
    if (mounted) setState(() => _isConnected = false);
  }

  @override
  void dispose() {
    stopConversation();
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: const Text('Voice Chat')),
      body: Column(
        children: [
          ElevatedButton(
            onPressed: _isConnected ? stopConversation : startConversation,
            child: Text(_isConnected ? 'Stop Voice Chat' : 'Start Voice Chat'),
          ),
          Expanded(
            child: SingleChildScrollView(
              child: Text('Transcript: $_transcript'),
            ),
          ),
        ],
      ),
    );
  }
}

Text-Based Conversation

import 'package:sarvamconv_ai_sdk/sarvamconv_ai_sdk.dart';

Future<void> main() async {
  final config = InteractionConfig(
    orgId: 'your_org_id',
    workspaceId: 'your_workspace_id',
    appId: 'your_app_id',
    userIdentifier: 'user@example.com',
    userIdentifierType: UserIdentifierType.email,
    interactionType: InteractionType.chat, // Text mode
    sampleRate: 16000, // Required but not used for text-only
  );

  final agent = SamvaadAgent(
    apiKey: 'your_api_key',
    config: config,
    textCallback: (msg) async {
      if (msg is ServerTextMsg) {
        print('Agent: ${msg.text}');
      } else if (msg is ServerTextChunkMsg) {
        print('Agent (chunk): ${msg.text}');
      }
    },
    eventCallback: (event) async {
      print('Event: ${event.type}');
    },
  );

  // Start the conversation
  await agent.start();

  // Wait for connection
  final connected = await agent.waitForConnect(
    timeout: const Duration(seconds: 10),
  );

  if (!connected) {
    print('Failed to connect');
    return;
  }

  // Send a text message
  await agent.sendText('Hello, how are you?');

  // Wait for conversation to complete
  await agent.waitForDisconnect();
}

SamvaadAgent

The main class for managing conversational AI sessions. It automatically handles voice or text modes based on the interactionType configuration.

Constructor Parameters

Parameter      | Type                 | Required | Description
apiKey         | String?              | No*      | API key for authentication. Optional when using a proxy server.
config         | InteractionConfig    | Yes      | Interaction configuration (see below)
audioInterface | AudioInterface?      | No       | Audio interface for mic/speaker (required for voice interactions)
audioCallback  | AudioCallback?       | No       | Receives audio chunks (if not using audioInterface)
textCallback   | TextCallback?        | No       | Receives streaming text chunks from the agent
eventCallback  | EventCallback?       | No       | Receives events like user_interrupt and interaction_end
baseUrl        | String               | No       | Base URL for the API. Defaults to the Sarvam API; set to your proxy URL to hide API keys.
headers        | Map<String, String>? | No       | Custom headers for HTTP requests (e.g., proxy authentication)

* Required when not using a proxy server.
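
For example, if you want the raw agent audio instead of automatic playback, pass audioCallback in place of audioInterface. A minimal sketch, assuming config is an InteractionConfig as shown in Quick Start (the buffer here stands in for your own playback pipeline):
import 'dart:typed_data';
import 'package:sarvamconv_ai_sdk/sarvamconv_ai_sdk.dart';

final received = <Uint8List>[];

final agent = SamvaadAgent(
  apiKey: 'your_api_key',
  config: config, // an InteractionConfig, as shown in Quick Start
  audioCallback: (msg) async {
    // msg.audioBytes is 16-bit PCM mono at msg.sampleRate (see Audio Callback).
    received.add(msg.audioBytes); // feed your own player here instead
  },
);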

Methods

Future<void> start()

Start the conversation session and establish the WebSocket connection.
await agent.start();

Future<void> stop()

Stop the conversation session and clean up resources.
await agent.stop();

Future<bool> waitForConnect({Duration? timeout})

Wait until the WebSocket connection is established. Returns true if connected, or false on timeout.
final connected = await agent.waitForConnect(
  timeout: const Duration(seconds: 10),
);
if (!connected) {
  print('Connection timeout');
}

Future<void> waitForDisconnect()

Wait until the WebSocket disconnects or the agent is stopped.
await agent.waitForDisconnect();

bool get isConnected

Check if the WebSocket is currently connected.
if (agent.isConnected) {
  print('Agent is connected');
}

String? get interactionId

Get the current interaction identifier.
final id = agent.interactionId;
print('Interaction ID: $id');

String? get referenceId

Get the reference ID for this session (useful for telephony integrations).
print('Reference ID: ${agent.referenceId}');

Future<void> sendAudio(Uint8List audioData)

Send raw audio data (only for voice interactions). Audio must be 16-bit PCM mono at the configured sample rate.
// Only for voice mode
await agent.sendAudio(audioBytes);
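If your samples arrive as signed 16-bit integers, they must be packed into bytes first. A minimal sketch, assuming little-endian sample order (the usual layout for LINEAR16):
import 'dart:typed_data';

/// Packs signed 16-bit PCM samples into the byte buffer sendAudio expects.
Uint8List packPcm16(Int16List samples) {
  final bytes = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    bytes.setInt16(i * 2, samples[i], Endian.little); // little-endian assumed
  }
  return bytes.buffer.asUint8List();
}

// Usage (voice mode only):
// await agent.sendAudio(packPcm16(samples));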

Future<void> sendText(String text)

Send a text message (available for both voice and text interactions).
await agent.sendText('Hello, how can you help me?');

Future<void> sendVoiceNote(Uint8List audioData, {bool transcribe = true})

Send a complete voice note for transcription.
await agent.sendVoiceNote(audioBytes, transcribe: true);

Configuration

InteractionConfig

The configuration object that defines the conversation parameters.

Required Fields

Field              | Type               | Description
userIdentifierType | UserIdentifierType | One of .custom, .email, .phoneNumber, .unknown
userIdentifier     | String             | User identifier value (email, phone, or custom ID)
orgId              | String             | Your organization ID
workspaceId        | String             | Your workspace ID
appId              | String             | The target application ID
interactionType    | InteractionType    | InteractionType.call (voice) or InteractionType.chat (text)
sampleRate         | int                | Audio sample rate: 8000, 16000, or 48000 (16-bit PCM mono)

Optional Fields

Field               | Type                    | Description
version             | int?                    | App version number. If not provided, the latest committed version is used.
agentVariables      | Map<String, dynamic>?   | Key-value pairs to seed the agent context
initialLanguageName | SarvamToolLanguageName? | Starting language (e.g., .hindi, .english)
initialStateName    | String?                 | Starting state name (if your app uses states)
initialBotMessage   | String?                 | First message from the agent

Important: If version is not provided, the SDK uses the latest committed version of the app. The connection will fail if the provided appId has no committed version.

Example Configuration

import 'package:sarvamconv_ai_sdk/sarvamconv_ai_sdk.dart';

final config = InteractionConfig(
  orgId: 'sarvamai',
  workspaceId: 'default',
  appId: 'your_app_id',
  userIdentifier: 'user123',
  userIdentifierType: UserIdentifierType.custom,
  interactionType: InteractionType.call,
  sampleRate: 16000,
  agentVariables: {
    'user_language': 'Hindi',
    'context': 'customer_support',
  },
  initialLanguageName: SarvamToolLanguageName.hindi,
  initialStateName: 'greeting',
  initialBotMessage: 'Hello! How can I help you today?',
  version: 1,
);

Audio Interfaces

DefaultAudioInterface

Handles microphone capture and speaker playback with platform-specific implementations.
import 'package:sarvamconv_ai_sdk/sarvamconv_ai_sdk.dart';

final audioInterface = DefaultAudioInterface(
  inputSampleRate: 16000,
  outputSampleRate: 16000,
);
Features:
  • Automatic microphone access and audio capture
  • Real-time audio streaming at 8kHz, 16kHz, or 48kHz
  • Automatic speaker playback of agent responses
  • Handles user interruptions
  • Platform-specific optimizations (iOS: flutter_sound, Android: native AudioTrack)
Audio Format: LINEAR16 (16-bit PCM mono)
Platform Requirements:
  • iOS: Microphone usage description in Info.plist
  • Android: RECORD_AUDIO permission in AndroidManifest.xml

Custom Audio Interface

Implement your own audio interface by implementing the AudioInterface abstract class:
abstract class AudioInterface {
  /// Start audio capture. Call inputCallback with audio chunks.
  Future<void> start(AudioInputCallback inputCallback);
  
  /// Output audio to the speaker.
  Future<void> output(Uint8List audio, {int? sampleRate});
  
  /// Interrupt any ongoing audio playback.
  void interrupt();
  
  /// Stop and cleanup resources.
  Future<void> stop();
}
Example custom implementation:
class CustomAudioInterface implements AudioInterface {
  @override
  Future<void> start(AudioInputCallback inputCallback) async {
    // Start capturing audio and call inputCallback with chunks
  }

  @override
  Future<void> output(Uint8List audio, {int? sampleRate}) async {
    // Play audio through speaker
  }

  @override
  void interrupt() {
    // Stop any queued audio playback
  }

  @override
  Future<void> stop() async {
    // Cleanup resources
  }
}
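
As a concrete (if artificial) reference, here is a self-contained test double built only from the contract above; it streams silence upstream and buffers playback. It assumes AudioInputCallback accepts a Uint8List chunk, matching how DefaultAudioInterface delivers audio:
import 'dart:async';
import 'dart:typed_data';

/// A loopback AudioInterface for tests: sends 100 ms chunks of silence
/// to the agent and collects whatever the agent plays back.
class LoopbackAudioInterface implements AudioInterface {
  Timer? _timer;
  final List<Uint8List> played = [];

  @override
  Future<void> start(AudioInputCallback inputCallback) async {
    // 16 kHz mono 16-bit PCM => 3200 bytes per 100 ms.
    final silence = Uint8List(3200);
    _timer = Timer.periodic(const Duration(milliseconds: 100), (_) {
      inputCallback(silence);
    });
  }

  @override
  Future<void> output(Uint8List audio, {int? sampleRate}) async {
    played.add(audio); // collect instead of playing through a speaker
  }

  @override
  void interrupt() {
    played.clear(); // drop queued playback on user interruption
  }

  @override
  Future<void> stop() async {
    _timer?.cancel();
    _timer = null;
  }
}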

Event Handling

The SDK provides callbacks for different types of events during the conversation.

Text Callback

Receives text from the agent, both complete messages (ServerTextMsg) and streaming chunks (ServerTextChunkMsg):
textCallback: (msg) async {
  if (msg is ServerTextMsg) {
    // Complete text message
    print('Agent says: ${msg.text}');
  } else if (msg is ServerTextChunkMsg) {
    // Streaming chunk
    print('Agent (streaming): ${msg.text}');
  }
}

Audio Callback

Receives raw audio chunks (if not using DefaultAudioInterface):
audioCallback: (msg) async {
  // Handle raw audio data
  final audioData = msg.audioBytes; // Uint8List
  final sampleRate = msg.sampleRate; // int?
  // Process or play the audio
}

Event Callback

Receives various conversation events:
eventCallback: (event) async {
  if (event is ServerInteractionConnectedEvent) {
    print('Connected! Interaction ID: ${event.interactionId}');
  } else if (event is ServerInteractionEndEvent) {
    print('Conversation ended');
  } else if (event is ServerUserInterruptEvent) {
    print('User interrupted the agent');
  } else if (event is ServerUserSpeechStartEvent) {
    print('User started speaking');
  } else if (event is ServerUserSpeechEndEvent) {
    print('User stopped speaking');
  } else if (event is ServerLanguageChangeEvent) {
    print('Language changed to: ${event.languageName}');
  } else if (event is ServerStateTransitionEvent) {
    print('State changed to: ${event.stateName}');
  } else if (event is ServerVariableUpdateEvent) {
    print('Variables updated: ${event.variables}');
  }
}

Supported Languages

The SDK supports multilingual conversations using the SarvamToolLanguageName enum:
import 'package:sarvamconv_ai_sdk/sarvamconv_ai_sdk.dart';
Available languages:
  • SarvamToolLanguageName.bengali - Bengali
  • SarvamToolLanguageName.gujarati - Gujarati
  • SarvamToolLanguageName.kannada - Kannada
  • SarvamToolLanguageName.malayalam - Malayalam
  • SarvamToolLanguageName.tamil - Tamil
  • SarvamToolLanguageName.telugu - Telugu
  • SarvamToolLanguageName.punjabi - Punjabi
  • SarvamToolLanguageName.odia - Odia
  • SarvamToolLanguageName.marathi - Marathi
  • SarvamToolLanguageName.hindi - Hindi
  • SarvamToolLanguageName.english - English
Note: The languages available at runtime are the subset you preselect when defining the agent configuration on the platform.

Example usage:
final config = InteractionConfig(
  // ... other config
  initialLanguageName: SarvamToolLanguageName.hindi,
);

Proxy Server Setup

For production apps, route all SDK traffic through a proxy server that adds your Sarvam API key. This keeps the key out of the mobile app binary, where it could otherwise be extracted.

Architecture

┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│   Flutter App   │ ──────▶ │  Proxy Server   │ ──────▶ │   Sarvam API    │
│  (No API Key)   │         │ (Adds API Key)  │         │                 │
└─────────────────┘         └─────────────────┘         └─────────────────┘

Flutter Client Setup (with Proxy)

import 'package:sarvamconv_ai_sdk/sarvamconv_ai_sdk.dart';

final config = InteractionConfig(
  orgId: 'your_org_id',
  workspaceId: 'your_workspace_id',
  appId: 'your_app_id',
  userIdentifier: 'user@example.com',
  userIdentifierType: UserIdentifierType.email,
  interactionType: InteractionType.call,
  sampleRate: 16000,
);

// No API key passed - proxy server will add it
final agent = SamvaadAgent(
  config: config,
  baseUrl: 'https://your-proxy-server.com/sarvam-proxy/',
  headers: {
    'Authorization': 'Bearer user_session_token', // Your app's auth
  },
  audioInterface: DefaultAudioInterface(),
);

await agent.start();

Proxy Server Example (Python FastAPI)

from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import StreamingResponse
import httpx
import os

app = FastAPI()

SARVAM_API_KEY = os.environ.get("SARVAM_API_KEY")
SARVAM_BASE_URL = "https://apps.sarvam.ai/api/app-runtime"

@app.api_route("/sarvam-proxy/{path:path}", methods=["GET", "POST", "PUT", "DELETE"])
async def proxy(request: Request, path: str):
    # Optional: Verify user authentication
    auth_header = request.headers.get("Authorization")
    if not auth_header:
        raise HTTPException(status_code=401, detail="Unauthorized")
    
    # Build target URL
    target_url = f"{SARVAM_BASE_URL}/{path}"
    if request.query_params:
        target_url += f"?{request.query_params}"
    
    # Forward request with API key
    async with httpx.AsyncClient() as client:
        headers = dict(request.headers)
        headers["X-API-Key"] = SARVAM_API_KEY
        headers.pop("host", None)
        
        response = await client.request(
            method=request.method,
            url=target_url,
            headers=headers,
            content=await request.body(),
        )
        
        return StreamingResponse(
            content=response.iter_bytes(),
            status_code=response.status_code,
            headers=dict(response.headers),
        )

Security Best Practices

1. Always Authenticate Users

Verify the caller's session before forwarding anything (Express example; verifyUserToken is your own token validator):

app.use('/sarvam-proxy', async (req, res, next) => {
  const token = req.headers['authorization']?.replace('Bearer ', '');
  
  if (!token) {
    return res.status(401).json({ error: 'No token provided' });
  }
  
  const user = await verifyUserToken(token);
  if (!user) {
    return res.status(401).json({ error: 'Invalid token' });
  }
  
  req.user = user;
  next();
});

2. Rate Limiting

const rateLimit = require('express-rate-limit');

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 60, // 60 requests per minute per IP
  message: { error: 'Too many requests' },
});

app.use('/sarvam-proxy', limiter);

3. Request Validation

app.use('/sarvam-proxy', (req, res, next) => {
  const allowedPaths = ['/orgs/', '/workspaces/', '/apps/'];
  const isAllowed = allowedPaths.some(p => req.path.includes(p));
  
  if (!isAllowed) {
    return res.status(403).json({ error: 'Path not allowed' });
  }
  
  next();
});

4. Environment Variables

Store your API key securely:
# .env (never commit this file!)
SARVAM_API_KEY=sk_samvaad_your_actual_api_key

Message Types

Server Message Types

The SDK handles various message types from the server:
Event Type                          | Description
server.media.text_chunk             | Streaming text response from the agent
server.media.text                   | Complete text response from the agent
server.media.audio_chunk            | Streaming audio response from the agent
server.action.interaction_connected | Conversation session established
server.action.interaction_end       | Conversation session ended
server.event.user_speech_start      | User started speaking
server.event.user_speech_end        | User stopped speaking
server.event.user_interrupt         | User interrupted the agent
server.event.variable_update        | Agent variables updated
server.event.language_change        | Conversation language changed
server.event.state_transition       | Agent state transitioned
server.event.tool_call              | Agent called a tool/function

Client Message Types

Messages sent from the SDK to the server:
Message Type                    | Description
client.action.interaction_start | Start conversation with configuration
client.media.audio_chunk        | Send audio data to the agent
client.media.text               | Send a text message to the agent
client.action.interaction_end   | End conversation session
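
You never construct these messages by hand; each SDK call emits the corresponding message (mapping per the tables above):
await agent.start();               // client.action.interaction_start
await agent.sendAudio(audioBytes); // client.media.audio_chunk (voice mode)
await agent.sendText('Hello');     // client.media.text
await agent.stop();                // client.action.interaction_end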

Best Practices

1. Resource Cleanup

Always clean up resources when done:
class _MyWidgetState extends State<MyWidget> {
  SamvaadAgent? _agent;

  @override
  void dispose() {
    _agent?.stop();
    super.dispose();
  }
}

2. Connection Timeout

Always specify a timeout when waiting for connection:
final connected = await agent.waitForConnect(
  timeout: const Duration(seconds: 10),
);
if (!connected) {
  // Handle connection failure
}

3. Error Handling

Implement comprehensive error handling:
try {
  await agent.start();
} catch (e) {
  if (e.toString().contains('permission')) {
    // Handle microphone permission error
  } else if (e.toString().contains('network')) {
    // Handle network error
  } else {
    // Handle other errors
  }
}

4. State Management

Use proper state management for UI updates:
// Using setState for simple cases
eventCallback: (event) async {
  if (event is ServerInteractionConnectedEvent) {
    setState(() => _isConnected = true);
  }
}

// Or use a state management solution like Provider/Riverpod/Bloc

5. Handle App Lifecycle

Stop the agent when the app goes to background:
class _MyWidgetState extends State<MyWidget> with WidgetsBindingObserver {
  @override
  void initState() {
    super.initState();
    WidgetsBinding.instance.addObserver(this);
  }

  @override
  void dispose() {
    WidgetsBinding.instance.removeObserver(this);
    super.dispose();
  }

  @override
  void didChangeAppLifecycleState(AppLifecycleState state) {
    if (state == AppLifecycleState.paused) {
      _agent?.stop();
    }
  }
}
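
If the session should come back automatically, a resumed branch can restart it. A sketch, where startConversation() is your own start method as in Quick Start:
@override
void didChangeAppLifecycleState(AppLifecycleState state) {
  if (state == AppLifecycleState.paused) {
    _agent?.stop();
  } else if (state == AppLifecycleState.resumed) {
    startConversation(); // re-create and start the agent
  }
}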

Troubleshooting

Common Issues

1. Microphone Permission Denied

// Check and request permission before starting
import 'package:permission_handler/permission_handler.dart';

final status = await Permission.microphone.request();
if (!status.isGranted) {
  // Show dialog asking user to enable permission
}

2. Connection Timeout

  • Check internet connectivity
  • Verify API key is valid
  • Ensure org_id, workspace_id, and app_id are correct
  • Make sure the app has a committed version

3. No Audio Output

  • Check device volume
  • Verify audio interface is properly initialized
  • Ensure the sample rate matches the server configuration (see the sketch below)
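A frequent culprit is a sample-rate mismatch between the audio interface and InteractionConfig; deriving both from one constant rules it out:
const sampleRate = 16000; // single source of truth for all audio rates

final audioInterface = DefaultAudioInterface(
  inputSampleRate: sampleRate,
  outputSampleRate: sampleRate,
);

final config = InteractionConfig(
  // ... other fields as in Quick Start
  sampleRate: sampleRate,
);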

4. WebSocket Disconnects

  • Implement reconnection logic
  • Handle app lifecycle (pause/resume)
  • Check for network changes
// Simple reconnection logic
eventCallback: (event) async {
  if (event is ServerInteractionEndEvent) {
    // Optionally reconnect
    await Future.delayed(const Duration(seconds: 2));
    await _startCall(); // your own method that re-creates and starts the agent
  }
}