The Sarvam Client SDKs provide a lightweight, event-driven interface for building real-time voice and text conversations. They handle WebSocket connectivity, audio streaming, buffering, and state management out of the box — so you can focus on your application logic.
- Web: For browser-based web apps using React.
- React Native: For iOS and Android apps using React Native.
- Flutter: For cross-platform mobile apps using Flutter.
Configuration
The configuration and API reference on this page apply to the TypeScript SDK (Web and React Native). For Flutter/Dart equivalents, see the Flutter page.
Required Fields
| Field | Type | Description |
| --- | --- | --- |
| org_id | string | Your organization ID |
| workspace_id | string | Your workspace ID |
| app_id | string | The target application ID |
| user_identifier | string | User identifier (email, phone, or custom ID) |
| user_identifier_type | string | One of: 'custom', 'email', 'phone_number', 'unknown' |
| interaction_type | InteractionType | InteractionType.CALL (voice) or InteractionType.CHAT (text) |
| input_sample_rate | number | Audio input rate: 8000 or 16000 Hz |
| output_sample_rate | number | Audio output rate: 16000 or 22050 Hz |
Optional Fields
| Field | Type | Description |
| --- | --- | --- |
| version | number | App version. If omitted, uses the latest committed version |
| agent_variables | Record<string, any> | Key-value pairs to seed agent context |
| initial_language_name | SarvamToolLanguageName | Starting language |
| initial_state_name | string | Starting state name |
| initial_bot_message | string | First message from the agent |
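Putting the required and optional fields together, a complete config might look like the sketch below. All IDs are placeholders, and a local stand-in enum is used in place of the SDK's InteractionType so the shape is self-contained; in a real app, import InteractionType from the SDK.

```typescript
// Stand-in for the SDK's InteractionType enum (values are assumptions).
enum InteractionType {
  CALL = "call",
  CHAT = "chat",
}

const config = {
  // Required
  org_id: "org_123",
  workspace_id: "ws_456",
  app_id: "app_789",
  user_identifier: "user@example.com",
  user_identifier_type: "email",
  interaction_type: InteractionType.CALL,
  input_sample_rate: 16000,  // 8000 or 16000 Hz
  output_sample_rate: 22050, // 16000 or 22050 Hz
  // Optional
  agent_variables: { customer_tier: "gold" },
  initial_bot_message: "Hi! How can I help you today?",
};
```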
Using a Proxy Server
We strongly recommend using a proxy server for production deployments to keep your API key secure and enforce access control. See the Proxy Setup guide for server configuration, client examples for each platform, and security best practices.
Mute and Unmute
Control microphone input without disconnecting:
```typescript
// Mute - sends silence to keep the connection stable
agent.mute();

// Unmute - resumes sending real audio
agent.unmute();

// Check the current state
if (agent.isMuted()) {
  showMutedIcon();
}
```
When muted, the SDK sends continuous silence chunks to keep VAD (Voice Activity Detection) stable, preventing connection issues.
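If your UI has a single mic button, the three calls above combine naturally into a toggle. This helper is a sketch that assumes only the mute()/unmute()/isMuted() methods shown above; the Muteable interface is a structural type introduced here for illustration, not part of the SDK.

```typescript
// Minimal structural interface matching the mute API shown above.
interface Muteable {
  mute(): void;
  unmute(): void;
  isMuted(): boolean;
}

// Flips the mute state and returns the new state so the caller
// can update the mic icon accordingly.
function toggleMute(agent: Muteable): boolean {
  if (agent.isMuted()) {
    agent.unmute();
  } else {
    agent.mute();
  }
  return agent.isMuted();
}
```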
Custom Events via Transcript
Backend tools can send custom events to the frontend through the transcriptCallback. These messages appear as bot transcripts with special formats that your app can intercept and handle.
Backend Requirement: Tools must emit ServerToolEvent.NotifyUI events with message_to_user containing the message content. See your backend SDK documentation for implementation details.
Thinking State
The backend sends thinking_started and thinking_completed messages when tools are processing:
```typescript
const agent = new ConversationAgent({
  // ... config
  transcriptCallback: async (msg) => {
    // Handle thinking state
    if (msg.content === "thinking_started") {
      showLoadingSpinner();
      return;
    }
    if (msg.content === "thinking_completed") {
      hideLoadingSpinner();
      return;
    }
    // Regular transcript
    addToChat(msg.role, msg.content);
  },
});
```
MCP Apps
MCP (Model Context Protocol) apps enable interactive widgets in your conversation UI. The backend sends widget URLs with data, and your frontend renders them in iframes.
The backend sends messages in this format:
```
mcp::<widget_url>::<widget_data_json>
```
Example:
```
mcp::https://widgets.example.com/food-cart.html::{"items":[...],"total":450}
```
```typescript
interface CurrentWidget {
  url: string;
  data: Record<string, unknown>;
}

const [currentWidget, setCurrentWidget] = useState<CurrentWidget | null>(null);
const currentWidgetRef = useRef<CurrentWidget | null>(null);
const iframeRef = useRef<HTMLIFrameElement | null>(null);

const agent = new ConversationAgent({
  // ... config
  transcriptCallback: async (msg) => {
    // Handle MCP widget messages: mcp::<url>::<data>
    if (msg.role === "bot" && msg.content.startsWith("mcp::")) {
      const parts = msg.content.slice(5).split("::");
      if (parts.length >= 2) {
        const widgetUrl = parts[0];
        const widgetData = JSON.parse(parts.slice(1).join("::"));
        const widget = { url: widgetUrl, data: widgetData };
        currentWidgetRef.current = widget;
        setCurrentWidget(widget);
        return;
      }
    }
    // Regular transcript
    addToChat(msg.role, msg.content);
  },
});
```
Widgets expect initialization data via postMessage when the iframe loads:
```typescript
const handleIframeLoad = useCallback(() => {
  const widget = currentWidgetRef.current;
  if (!iframeRef.current?.contentWindow || !widget) return;
  iframeRef.current.contentWindow.postMessage(
    {
      jsonrpc: "2.0",
      method: "ui/initialize",
      params: {
        toolOutput: widget.data,
        data: widget.data,
        theme: "dark",
        hostContext: { theme: "dark" },
      },
    },
    "*",
  );
}, []);
```
Render the widget in your component when one is set:
```tsx
{currentWidget && (
  <iframe
    ref={iframeRef}
    src={currentWidget.url}
    sandbox="allow-scripts allow-same-origin"
    onLoad={handleIframeLoad}
  />
)}
```
Agent State Tracking
Monitor the agent’s state to update your UI:
```typescript
const agent = new ConversationAgent({
  // ... config
  stateCallback: (newState, previousState) => {
    switch (newState) {
      case AgentState.CONNECTING:
        showSpinner();
        break;
      case AgentState.LISTENING:
        showListeningIndicator();
        break;
      case AgentState.SPEAKING:
        showSpeakingAnimation();
        break;
      case AgentState.ERROR:
        showErrorMessage();
        break;
    }
  },
});

// Or check programmatically
const currentState = agent.getState();
```
| State | Description |
| --- | --- |
| IDLE | Not connected |
| CONNECTING | Establishing connection |
| CONNECTED | Connected, not yet ready |
| LISTENING | Listening for user input |
| SPEAKING | Agent is responding |
| ERROR | An error occurred |
Transcripts and Chat History
Use transcriptCallback for voice conversation transcripts:
```typescript
const agent = new ConversationAgent({
  // ... config (InteractionType.CALL)
  transcriptCallback: async (msg) => {
    if (msg.role === Role.USER) {
      addToChat("You", msg.content);
    } else if (msg.role === Role.BOT) {
      addToChat("Agent", msg.content);
    }
  },
});
```
Use textCallback for text chat agent responses:
```typescript
const agent = new ConversationAgent({
  // ... config (InteractionType.CHAT)
  textCallback: async (msg) => {
    addToChat("Agent", msg.text);
  },
});
```
| Callback | Use Case | Interaction Type |
| --- | --- | --- |
| transcriptCallback | Voice transcripts (user speech + agent speech) | CALL |
| textCallback | Text chat agent responses | CHAT |
Analytics and Telemetry
Track SDK events for monitoring and debugging. Here’s an example using New Relic:
```typescript
import newrelic from "newrelic";

const SKIP_EVENTS = new Set(["user_interrupted", "state_changed"]);

const agent = new ConversationAgent({
  // ... config
  telemetryCallback: (event) => {
    if (SKIP_EVENTS.has(event.name)) return;
    newrelic.recordCustomEvent(`SarvamSDK_${event.name}`, {
      ...event.properties,
      sessionId: event.sessionId,
      interactionId: event.interactionId,
      timestamp: event.timestamp,
    });
  },
});
```
| Event | Description |
| --- | --- |
| ws_connecting | WebSocket connection starting |
| ws_connected | Connected (includes durationMs) |
| ws_disconnected | Disconnected (includes code, reason) |
| ws_error | WebSocket error |
| session_started | Conversation began |
| session_ended | Session ended (includes durationMs, initiatedBy) |
| interaction_connected | Interaction established (includes interactionId) |
| first_audio_packet_sent | First audio sent (includes latencyMs) |
| first_audio_packet_received | First audio received (includes latencyMs) |
| user_muted | User muted microphone |
| user_unmuted | User unmuted |
| user_interrupted | User interrupted agent |
| state_changed | State changed (includes from, to) |
| network_offline | Network lost |
| network_online | Network restored |
| error | Error occurred (includes type, message) |
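Beyond forwarding events to a vendor, you can aggregate them locally. This sketch tracks average first-audio latency using only the name and properties fields shown in the New Relic example above; any event payload detail beyond those fields is an assumption.

```typescript
// Minimal shape of a telemetry event, per the fields used above.
interface TelemetryEvent {
  name: string;
  properties: Record<string, unknown>;
}

// Returns a tiny sink: pass `record` as (or call it from) telemetryCallback,
// then read `average()` to get mean first-audio latency in ms.
function makeLatencyTracker() {
  const latencies: number[] = [];
  return {
    record(event: TelemetryEvent): void {
      if (event.name === "first_audio_packet_received") {
        const ms = event.properties["latencyMs"];
        if (typeof ms === "number") latencies.push(ms);
      }
    },
    average(): number | null {
      if (latencies.length === 0) return null;
      return latencies.reduce((a, b) => a + b, 0) / latencies.length;
    },
  };
}
```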
Connection Lifecycle
```typescript
// Start the conversation
await agent.start();

// Wait for connection (timeout in seconds)
const connected = await agent.waitForConnect(10);
if (!connected) {
  console.error("Connection timeout");
  return;
}

// Check connection status
if (agent.isConnected()) {
  console.log("Connected");
}

// Get the interaction ID
const interactionId = agent.getInteractionId();

// Set a reference ID (e.g., telephony Call SID)
agent.reference_id = "CA1234567890abcdef";

// Wait for disconnection (blocks until the conversation ends)
await agent.waitForDisconnect();

// Stop and clean up
await agent.stop();
```
Reconnection is not supported. Each WebSocket URL is single-use and expires after the first connection. If the connection drops, you must call agent.stop() and create a new ConversationAgent instance to start a fresh session.
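Since a dropped connection cannot be resumed, a "reconnect" is really stop-then-recreate. A minimal sketch, assuming you keep a factory function that builds a fully configured ConversationAgent; the AgentLike interface is a structural stand-in for the start/stop methods documented below.

```typescript
// Structural stand-in for the subset of ConversationAgent used here.
type AgentLike = {
  start(): Promise<void>;
  stop(): Promise<void>;
};

// Stops the expired agent (if any), then builds and starts a fresh one.
// `createAgent` must return a brand-new instance, since each WebSocket
// URL is single-use.
async function restartSession(
  current: AgentLike | null,
  createAgent: () => AgentLike,
): Promise<AgentLike> {
  if (current) {
    await current.stop(); // release the old, expired session
  }
  const next = createAgent(); // fresh instance, fresh WebSocket URL
  await next.start();
  return next;
}
```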
Supported Languages
```typescript
const config = {
  // ... other config
  initial_language_name: SarvamToolLanguageName.HINDI,
};
```
Available languages:
BENGALI, GUJARATI, KANNADA, MALAYALAM, TAMIL
TELUGU, PUNJABI, ODIA, MARATHI, HINDI, ENGLISH
API Reference
ConversationAgent Constructor
```typescript
new ConversationAgent({
  apiKey: string;
  config: InteractionConfig;
  audioInterface?: AsyncAudioInterface; // Required for CALL mode
  textCallback?: (msg) => Promise<void>;
  transcriptCallback?: (msg) => Promise<void>;
  audioCallback?: (msg) => Promise<void>;
  audioLevelCallback?: (level) => void;
  eventCallback?: (event) => Promise<void>;
  stateCallback?: (newState, prevState) => void;
  telemetryCallback?: (event) => void;
  startCallback?: () => Promise<void>;
  endCallback?: () => Promise<void>;
  baseUrl?: string;
  platform?: 'browser' | 'react-native';
})
```
ConversationAgent Methods
| Method | Returns | Description |
| --- | --- | --- |
| start() | Promise<void> | Start the conversation |
| stop() | Promise<void> | Stop and clean up |
| waitForConnect(timeout?) | Promise<boolean> | Wait for connection |
| waitForDisconnect() | Promise<void> | Wait for disconnection |
| isConnected() | boolean | Check connection status |
| getInteractionId() | string \| undefined | Get interaction ID |
| getState() | AgentState | Get current agent state |
| getAgentType() | 'voice' \| 'text' | Get agent type |
| sendText(text) | Promise<void> | Send text (CHAT mode only) |
| sendAudio(data) | Promise<void> | Send audio (CALL mode only) |
| mute() | void | Mute microphone (CALL mode only) |
| unmute() | void | Unmute microphone (CALL mode only) |
| isMuted() | boolean | Check mute status |
| isVoiceAgent() | boolean | Type guard for voice agent |
| isTextAgent() | boolean | Type guard for text agent |
Error Handling
The SDK provides typed error classes for different failure scenarios:
```typescript
try {
  await agent.start();
} catch (err) {
  if (err instanceof RateLimitError) {
    console.log(`Rate limited. Retry after ${err.retryAfter} seconds`);
    await new Promise((r) => setTimeout(r, err.retryAfter * 1000));
    await agent.start();
  } else if (err instanceof AuthenticationError) {
    console.error("Invalid API key");
  } else if (err instanceof ForbiddenError) {
    console.error("Access denied - check your permissions");
  } else if (err instanceof NotFoundError) {
    console.error("App not found - check app_id");
  } else if (err instanceof ServerError) {
    console.error("Server error - try again later");
  } else if (err instanceof SDKError) {
    console.error(`SDK error [${err.code}]: ${err.message}`);
  } else {
    console.error("Unexpected error:", err);
  }
}
```
Error Types
| Error Class | Status Code | When It Occurs |
| --- | --- | --- |
| RateLimitError | 429 | Too many requests. Has retryAfter property (seconds) |
| AuthenticationError | 401 | Invalid or missing API key |
| ForbiddenError | 403 | Valid API key but insufficient permissions |
| NotFoundError | 404 | App, workspace, or org not found |
| ServerError | 5xx | Server-side error |
| SDKError | — | Base class for all SDK errors |
All errors extend SDKError, which has:
- message: Human-readable error description
- code: Programmatic error code (e.g., "RATE_LIMITED", "AUTHENTICATION_FAILED")
- statusCode: HTTP status code, if applicable
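The code field is handy for mapping errors to user-facing copy without chains of instanceof checks. A minimal sketch; the two codes shown are the examples mentioned above, and the full set of codes is an assumption to verify against the SDK.

```typescript
// Structural stand-in for the SDKError fields listed above.
interface SDKErrorLike {
  message: string;
  code: string;
  statusCode?: number;
}

// Maps a programmatic error code to a message suitable for end users.
function userMessage(err: SDKErrorLike): string {
  switch (err.code) {
    case "RATE_LIMITED":
      return "Too many requests. Please wait a moment and try again.";
    case "AUTHENTICATION_FAILED":
      return "Could not authenticate. Please check your configuration.";
    default:
      // Fall back to a generic message; avoid leaking internal details.
      return "Something went wrong. Please try again later.";
  }
}
```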
Handling Connection Errors
Use the telemetryCallback to monitor connection issues:
```typescript
const agent = new ConversationAgent({
  // ... config
  telemetryCallback: (event) => {
    if (event.name === "ws_error") {
      console.error("WebSocket error:", event.properties.error);
    }
    if (event.name === "ws_disconnected") {
      console.log("Disconnected:", event.properties.reason);
    }
    if (event.name === "network_offline") {
      showOfflineWarning();
    }
    if (event.name === "network_online") {
      hideOfflineWarning();
    }
  },
});
```
Troubleshooting
Connection timeout
Ensure your app_id has a committed version. If no version is specified, the SDK uses the latest committed version; if none exists, the connection fails.