Overview
The Sarvam Conv AI SDK enables developers to create applications that can:
- Build real-time voice-to-voice conversational experiences on mobile
- Create text-based chat applications using Sarvam agents
- Handle audio capture from microphone and playback to speakers automatically
- Manage conversation lifecycle with robust event handling
- Support multiple Indian languages for conversational AI
Installation
Add the SDK to your pubspec.yaml:
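A sketch of the dependency entry; the package name and version below are placeholders, so check pub.dev for the actual published name:

```yaml
dependencies:
  sarvam_convai: ^1.0.0  # placeholder package name and version
```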
Platform Setup
iOS
Add the microphone permission to ios/Runner/Info.plist:
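For example:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone for voice conversations.</string>
```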
Android
Add the permission to android/app/src/main/AndroidManifest.xml:
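For example:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<!-- INTERNET is needed for the WebSocket and is usually already present -->
<uses-permission android:name="android.permission.INTERNET" />
```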
Quick Start
Voice-to-Voice Conversation
Here’s a complete example for voice interaction:
Text-Based Conversation
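A minimal text-chat sketch. The package import path and the callback parameter types are assumptions; the constructor parameters and config fields follow the reference tables below:

```dart
// Sketch of a text-based chat session (InteractionType.chat).
import 'package:sarvam_convai/sarvam_convai.dart'; // package name assumed

Future<void> runTextChat() async {
  final agent = SamvaadAgent(
    apiKey: 'YOUR_API_KEY',
    config: InteractionConfig(
      userIdentifierType: UserIdentifierType.custom,
      userIdentifier: 'user-123',
      orgId: 'YOUR_ORG_ID',
      workspaceId: 'YOUR_WORKSPACE_ID',
      appId: 'YOUR_APP_ID',
      interactionType: InteractionType.chat,
      sampleRate: 16000,
    ),
    textCallback: (chunk) {
      // Streaming text chunks from the agent.
      print('Agent: $chunk');
    },
  );

  await agent.start();
  if (await agent.waitForConnect(timeout: const Duration(seconds: 10))) {
    await agent.sendText('Hello!');
  }
  // ... later, when the conversation is over:
  await agent.stop();
}
```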
SamvaadAgent
The main class for managing conversational AI sessions. It automatically handles voice or text modes based on the interactionType configuration.
Constructor Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| apiKey | String? | No* | API key for authentication. Optional when using proxy server. |
| config | InteractionConfig | Yes | Interaction configuration (see below) |
| audioInterface | AudioInterface? | No | Audio interface for mic/speaker (required for voice interactions) |
| audioCallback | AudioCallback? | No | Receives audio chunks (if not using audioInterface) |
| textCallback | TextCallback? | No | Receives streaming text chunks from the agent |
| eventCallback | EventCallback? | No | Receives events like user_interrupt, interaction_end |
| baseUrl | String | No | Base URL for API. Defaults to Sarvam API. Set to proxy URL to hide API keys. |
| headers | Map<String, String>? | No | Custom headers for HTTP requests (e.g., proxy authentication) |
Methods
Future<void> start()
Start the conversation session and establish WebSocket connection.
Future<void> stop()
Stop the conversation session and cleanup resources.
Future<bool> waitForConnect({Duration? timeout})
Wait until the WebSocket connection is established. Returns true if connected, or false if the timeout elapses first.
Future<void> waitForDisconnect()
Wait until the WebSocket disconnects or the agent is stopped.
bool get isConnected
Check if the WebSocket is currently connected.
String? get interactionId
Get the current interaction identifier.
String? get referenceId
Get the reference ID for this session (useful for telephony integrations).
Future<void> sendAudio(Uint8List audioData)
Send raw audio data (only for voice interactions). Audio must be 16-bit PCM mono at the configured sample rate.
Future<void> sendText(String text)
Send a text message (available for both voice and text interactions).
Future<void> sendVoiceNote(Uint8List audioData, {bool transcribe = true})
Send a complete voice note for transcription.
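The methods above compose into a typical session lifecycle. A sketch, assuming an already-constructed agent (signatures as listed above):

```dart
// Typical session lifecycle: start, wait for the connection, interact,
// then wait for the session to end and release resources.
Future<void> runSession(SamvaadAgent agent) async {
  await agent.start();
  final connected = await agent.waitForConnect(
    timeout: const Duration(seconds: 10),
  );
  if (!connected) {
    await agent.stop();
    return;
  }
  print('interaction: ${agent.interactionId}');
  await agent.sendText('Hi there');
  await agent.waitForDisconnect(); // resolves when the session ends
  await agent.stop();
}
```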
Configuration
InteractionConfig
The configuration object that defines the conversation parameters.
Required Fields
| Field | Type | Description |
|---|---|---|
| userIdentifierType | UserIdentifierType | One of: .custom, .email, .phoneNumber, .unknown |
| userIdentifier | String | User identifier value (email, phone, or custom ID) |
| orgId | String | Your organization ID |
| workspaceId | String | Your workspace ID |
| appId | String | The target application ID |
| interactionType | InteractionType | InteractionType.call (voice) or InteractionType.chat (text) |
| sampleRate | int | Audio sample rate: 8000, 16000, or 48000 (16-bit PCM mono) |
Optional Fields
| Field | Type | Description |
|---|---|---|
| version | int? | App version number. If not provided, uses latest committed version |
| agentVariables | Map<String, dynamic>? | Key-value pairs to seed the agent context |
| initialLanguageName | SarvamToolLanguageName? | Starting language (e.g., .hindi, .english) |
| initialStateName | String? | Starting state name (if your app uses states) |
| initialBotMessage | String? | First message from the agent |
Important
If version is not provided, the SDK uses the latest committed version of the app.
The connection will fail if the provided appId has no committed version.
Example Configuration
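A sample configuration built from the field tables above; the IDs are placeholders:

```dart
// Example InteractionConfig for a Hindi voice call. Field and enum names
// follow the tables above; the ID values are placeholders.
final config = InteractionConfig(
  userIdentifierType: UserIdentifierType.phoneNumber,
  userIdentifier: '+919999999999',
  orgId: 'YOUR_ORG_ID',
  workspaceId: 'YOUR_WORKSPACE_ID',
  appId: 'YOUR_APP_ID',
  interactionType: InteractionType.call,
  sampleRate: 16000, // 16-bit PCM mono
  // Optional fields:
  version: 3, // omit to use the latest committed version
  agentVariables: {'customer_name': 'Asha'},
  initialLanguageName: SarvamToolLanguageName.hindi,
  initialBotMessage: 'Namaste! How can I help you today?',
);
```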
Audio Interfaces
DefaultAudioInterface
Handles microphone capture and speaker playback with platform-specific implementations.
Features:
- Automatic microphone access and audio capture
- Real-time audio streaming at 8kHz, 16kHz, or 48kHz
- Automatic speaker playback of agent responses
- Handles user interruptions
- Platform-specific optimizations (iOS: flutter_sound, Android: native AudioTrack)
Requirements:
- iOS: Microphone usage description in Info.plist
- Android: RECORD_AUDIO permission in AndroidManifest.xml
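A voice-call sketch using DefaultAudioInterface. The import path and the DefaultAudioInterface constructor shape are assumptions:

```dart
// Voice conversation: DefaultAudioInterface handles mic capture and speaker
// playback, so no manual sendAudio calls are needed.
import 'package:sarvam_convai/sarvam_convai.dart'; // package name assumed

Future<void> runVoiceCall(InteractionConfig config) async {
  final agent = SamvaadAgent(
    apiKey: 'YOUR_API_KEY',
    config: config, // must use InteractionType.call
    audioInterface: DefaultAudioInterface(),
    eventCallback: (event) => print('event: $event'),
  );
  await agent.start();
  await agent.waitForConnect(timeout: const Duration(seconds: 10));
  // Audio now flows automatically: mic -> agent, agent -> speaker.
  await agent.waitForDisconnect();
  await agent.stop();
}
```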
Custom Audio Interface
Implement your own audio interface by implementing the AudioInterface abstract class:
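A skeleton sketch; the method names below are illustrative assumptions, so match them to the real AudioInterface contract:

```dart
// Hypothetical custom AudioInterface. Method names are assumptions; the
// responsibilities (capture, playback, interrupt, teardown) follow the
// DefaultAudioInterface description above.
import 'dart:typed_data';

class MyAudioInterface implements AudioInterface {
  @override
  Future<void> start(void Function(Uint8List chunk) onCapturedAudio) async {
    // Begin capturing 16-bit PCM mono audio at the configured sample rate
    // and feed each chunk to onCapturedAudio.
  }

  @override
  Future<void> play(Uint8List audioData) async {
    // Queue agent audio for speaker playback.
  }

  @override
  Future<void> interrupt() async {
    // Flush queued playback when the user interrupts the agent.
  }

  @override
  Future<void> stop() async {
    // Release recorder and player resources.
  }
}
```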
Event Handling
The SDK provides callbacks for different types of events during the conversation.
Text Callback
Receives streaming text chunks from the agent:
Audio Callback
Receives raw audio chunks (if not using DefaultAudioInterface):
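Both callbacks are plain functions passed to the constructor. A sketch; the parameter types (a String chunk for text, raw PCM bytes for audio) are assumptions:

```dart
import 'dart:typed_data';
import 'package:sarvam_convai/sarvam_convai.dart'; // package name assumed

final agent = SamvaadAgent(
  apiKey: 'YOUR_API_KEY',
  config: config, // an InteractionConfig built as shown earlier
  textCallback: (String chunk) {
    print('agent text: $chunk'); // streaming text chunks
  },
  audioCallback: (Uint8List chunk) {
    // Raw 16-bit PCM audio; route it to your own playback pipeline.
  },
);
```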
Event Callback
Receives various conversation events:
Supported Languages
The SDK supports multilingual conversations using the SarvamToolLanguageName enum:
- SarvamToolLanguageName.bengali - Bengali
- SarvamToolLanguageName.gujarati - Gujarati
- SarvamToolLanguageName.kannada - Kannada
- SarvamToolLanguageName.malayalam - Malayalam
- SarvamToolLanguageName.tamil - Tamil
- SarvamToolLanguageName.telugu - Telugu
- SarvamToolLanguageName.punjabi - Punjabi
- SarvamToolLanguageName.odia - Odia
- SarvamToolLanguageName.marathi - Marathi
- SarvamToolLanguageName.hindi - Hindi
- SarvamToolLanguageName.english - English
Proxy Server Setup
For production apps, hide your Sarvam API key from client devices by routing requests through a proxy server. This prevents the key from being extracted from your mobile app binary.
Architecture
Flutter App → Your Proxy Server (injects API key) → Sarvam API
Flutter Client Setup (with Proxy)
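With a proxy, the client omits apiKey and points baseUrl at your server. A sketch; the proxy URL and the Authorization header name are placeholders:

```dart
// No Sarvam API key is embedded in the app; the proxy injects it server-side.
final agent = SamvaadAgent(
  config: config,
  baseUrl: 'wss://your-proxy.example.com', // placeholder proxy URL
  headers: {
    // Authenticate the user to YOUR proxy, e.g. with a session token.
    'Authorization': 'Bearer $userSessionToken',
  },
  audioInterface: DefaultAudioInterface(),
);
```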
Proxy Server Example (Python FastAPI)
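A hypothetical sketch of such a proxy: it relays the SDK's WebSocket traffic to the Sarvam API while injecting the secret key server-side. The upstream URL, path, and header names are assumptions; the `additional_headers` keyword matches recent versions of the `websockets` library (older versions use `extra_headers`):

```python
# Hypothetical FastAPI WebSocket proxy for the Sarvam Conv AI SDK.
import asyncio
import os

import websockets
from fastapi import FastAPI, WebSocket

app = FastAPI()

SARVAM_WS_URL = "wss://api.sarvam.ai/..."  # placeholder upstream endpoint
API_KEY = os.environ.get("SARVAM_API_KEY", "")  # never ship this in the app


@app.websocket("/proxy")
async def proxy(client_ws: WebSocket) -> None:
    await client_ws.accept()
    # TODO: authenticate the client here (e.g. validate a session token).
    async with websockets.connect(
        SARVAM_WS_URL,
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as upstream:

        async def client_to_upstream() -> None:
            while True:
                message = await client_ws.receive()
                if message["type"] == "websocket.disconnect":
                    return
                # Forward text (JSON) or binary (audio) frames unchanged.
                await upstream.send(message.get("text") or message["bytes"])

        async def upstream_to_client() -> None:
            async for message in upstream:  # str or bytes frames
                if isinstance(message, bytes):
                    await client_ws.send_bytes(message)
                else:
                    await client_ws.send_text(message)

        # Relay both directions; stop as soon as either side closes.
        tasks = [
            asyncio.create_task(client_to_upstream()),
            asyncio.create_task(upstream_to_client()),
        ]
        _, pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
```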
Security Best Practices
1. Always Authenticate Users
Verify a session token or user credential on every proxy connection before opening the upstream WebSocket.
2. Rate Limiting
Cap connections and messages per user so a single client cannot exhaust your API quota.
3. Request Validation
Validate that incoming orgId, workspaceId, and appId values match what your app expects, so clients cannot target arbitrary agents.
4. Environment Variables
Store your API key securely on the server, for example in an environment variable, and never in the client bundle.
Message Types
Server Message Types
The SDK handles various message types from the server:
| Event Type | Description |
|---|---|
| server.media.text_chunk | Streaming text response from agent |
| server.media.text | Complete text response from agent |
| server.media.audio_chunk | Streaming audio response from agent |
| server.action.interaction_connected | Conversation session established |
| server.action.interaction_end | Conversation session ended |
| server.event.user_speech_start | User started speaking |
| server.event.user_speech_end | User stopped speaking |
| server.event.user_interrupt | User interrupted the agent |
| server.event.variable_update | Agent variables updated |
| server.event.language_change | Conversation language changed |
| server.event.state_transition | Agent state transitioned |
| server.event.tool_call | Agent called a tool/function |
Client Message Types
Messages sent from the SDK to the server:
| Message Type | Description |
|---|---|
| client.action.interaction_start | Start conversation with configuration |
| client.media.audio_chunk | Send audio data to agent |
| client.media.text | Send text message to agent |
| client.action.interaction_end | End conversation session |
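An eventCallback can branch on the server event types listed above. A sketch; the payload shape (a map with a 'type' key) is an assumption:

```dart
// Hypothetical eventCallback body; adapt the payload access to the real
// event object delivered by the SDK.
void onEvent(Map<String, dynamic> event) {
  switch (event['type']) {
    case 'server.event.user_interrupt':
      // Stop rendering the agent's reply; the user started talking.
      break;
    case 'server.event.language_change':
      print('language changed: ${event['data']}');
      break;
    case 'server.action.interaction_end':
      // Clean up UI state for the finished session.
      break;
    default:
      break;
  }
}
```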
Best Practices
1. Resource Cleanup
Always clean up resources when done:
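For example, in a StatefulWidget's dispose (a sketch; `agent` is assumed to be a field on the State):

```dart
// Stop the agent when the widget goes away so the WebSocket and audio
// resources are released.
@override
void dispose() {
  agent.stop();
  super.dispose();
}
```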
2. Connection Timeout
Always specify a timeout when waiting for connection:
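A sketch:

```dart
// Bound the wait so the UI can react if the connection never comes up.
final connected = await agent.waitForConnect(
  timeout: const Duration(seconds: 10),
);
if (!connected) {
  await agent.stop();
  // Surface a "could not connect" error to the user here.
}
```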
3. Error Handling
Implement comprehensive error handling:
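A sketch wrapping the session so resources are released on any failure path:

```dart
try {
  await agent.start();
  final connected = await agent.waitForConnect(
    timeout: const Duration(seconds: 10),
  );
  if (!connected) {
    throw Exception('connection timed out');
  }
  await agent.waitForDisconnect();
} on Exception catch (e) {
  debugPrint('conversation failed: $e');
} finally {
  await agent.stop(); // always release the connection and audio resources
}
```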
4. State Management
Use proper state management for UI updates:
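A minimal sketch using a ValueNotifier so only the transcript widget rebuilds as text streams in:

```dart
import 'package:flutter/widgets.dart';

// Holds the streaming transcript.
final transcript = ValueNotifier<String>('');

// In the agent constructor, append each streamed chunk:
//   textCallback: (chunk) => transcript.value = transcript.value + chunk,

Widget buildTranscript() {
  return ValueListenableBuilder<String>(
    valueListenable: transcript,
    builder: (context, text, _) => Text(text),
  );
}
```

Any state-management approach (Provider, Riverpod, Bloc) works the same way: push callback data into observable state and let the UI react.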
5. Handle App Lifecycle
Stop the agent when the app goes to the background:
Troubleshooting
Common Issues
1. Microphone Permission Denied
- Verify the Info.plist and AndroidManifest.xml entries from Platform Setup
- On Android 6+, request the RECORD_AUDIO runtime permission before starting the agent
- Check the app's permission settings on the device
2. Connection Timeout
- Check internet connectivity
- Verify API key is valid
- Ensure org_id, workspace_id, and app_id are correct
- Make sure the app has a committed version
3. No Audio Output
- Check device volume
- Verify audio interface is properly initialized
- Ensure sample rate matches server configuration
4. WebSocket Disconnects
- Implement reconnection logic
- Handle app lifecycle (pause/resume)
- Check for network changes