The Sarvam Client SDKs provide a lightweight, event-driven interface for building real-time voice and text conversations. They handle WebSocket connectivity, audio streaming, buffering, and state management out of the box — so you can focus on your application logic.
Configuration
The configuration and API reference on this page apply to the TypeScript SDK (Web and React Native). For Flutter/Dart equivalents, see the Flutter page.
Required Fields
| Field | Type | Description |
|---|---|---|
| org_id | string | Your organization ID |
| workspace_id | string | Your workspace ID |
| app_id | string | The target application ID |
| user_identifier | string | User identifier (email, phone, or custom ID) |
| user_identifier_type | string | One of: 'custom', 'email', 'phone_number', 'unknown' |
| interaction_type | InteractionType | InteractionType.CALL (voice) or InteractionType.CHAT (text) |
| input_sample_rate | number | Audio input rate: 8000 or 16000 Hz |
| output_sample_rate | number | Audio output rate: 16000 or 22050 Hz |
Optional Fields
| Field | Type | Description |
|---|---|---|
| version | number | App version. If omitted, uses latest committed version |
| agent_variables | Record<string, any> | Key-value pairs to seed agent context |
| initial_language_name | SarvamToolLanguageName | Starting language |
| initial_state_name | string | Starting state name |
| initial_bot_message | string | First message from the agent |
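Putting the required and a few optional fields together, a config might look like the fragment below. The IDs, variable names, and values are placeholders for illustration, not real identifiers:

```typescript
const config = {
  // Required fields (placeholder IDs — substitute your own)
  org_id: "org_123",
  workspace_id: "ws_456",
  app_id: "app_789",
  user_identifier: "user@example.com",
  user_identifier_type: "email",
  interaction_type: InteractionType.CALL,
  input_sample_rate: 16000,
  output_sample_rate: 16000,
  // Optional: pin a committed version and seed agent context
  version: 3,
  agent_variables: { customer_tier: "gold" },
  initial_bot_message: "Hi! How can I help you today?",
};
```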
Using a Proxy Server
We strongly recommend using a proxy server for production deployments to keep your API key secure and enforce access control. See the Proxy Setup guide for server configuration, client examples for each platform, and security best practices.
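As a sketch of how a proxy might be wired up, the constructor's `baseUrl` option (see the API reference below) can point the SDK at your server instead of Sarvam's. The proxy URL here is hypothetical, and whether `apiKey` can be a placeholder depends on your proxy setup:

```typescript
// Illustrative fragment: route SDK traffic through your own proxy,
// which injects the real API key server-side.
const agent = new ConversationAgent({
  apiKey: "public-placeholder", // the real key stays on the proxy
  config,
  baseUrl: "https://voice-proxy.example.com", // hypothetical proxy URL
  // ... callbacks
});
```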
Mute and Unmute
Control microphone input without disconnecting:
```typescript
// Mute - sends silence to keep connection stable
agent.mute();

// Unmute - resumes sending real audio
agent.unmute();

// Check current state
if (agent.isMuted()) {
  showMutedIcon();
}
```
When muted, the SDK sends continuous silence chunks to keep VAD (Voice Activity Detection) stable, preventing connection issues.
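For a single mute button, the three calls above can be wrapped in a small toggle helper. `toggleMute` is our name, not an SDK method; `MuteControls` is a minimal interface matching the agent's mute API:

```typescript
interface MuteControls {
  mute(): void;
  unmute(): void;
  isMuted(): boolean;
}

// Toggle the microphone and return the new muted state,
// so the caller can update the button icon in one step.
function toggleMute(agent: MuteControls): boolean {
  if (agent.isMuted()) {
    agent.unmute();
  } else {
    agent.mute();
  }
  return agent.isMuted();
}
```

A button handler then becomes a one-liner, e.g. `muteButton.onclick = () => setIcon(toggleMute(agent));`.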
Custom Events via Transcript
Backend tools can send custom events to the frontend through the transcriptCallback. These messages appear as bot transcripts with special formats that your app can intercept and handle.
Backend Requirement: Tools must emit ServerToolEvent.NotifyUI events with message_to_user containing the message content. See your backend SDK documentation for implementation details.
Thinking State
The backend sends thinking_started and thinking_completed messages when tools are processing:
```typescript
const agent = new ConversationAgent({
  // ... config
  transcriptCallback: async (msg) => {
    // Handle thinking state
    if (msg.content === "thinking_started") {
      showLoadingSpinner();
      return;
    }
    if (msg.content === "thinking_completed") {
      hideLoadingSpinner();
      return;
    }
    // Regular transcript
    addToChat(msg.role, msg.content);
  },
});
```
MCP Widgets
MCP (Model Context Protocol) Apps enable interactive widgets in your conversation UI. The backend sends widget URLs with data, and your frontend renders them in iframes.
The backend sends messages in this format:
```
mcp::<widget_url>::<widget_data_json>
```
Example:
```
mcp::https://widgets.example.com/food-cart.html::{"items":[...],"total":450}
```
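This format can be parsed with a small standalone helper; `parseMcpMessage` is our name for it, not part of the SDK. Note that the JSON payload itself may contain `::`, so everything after the URL is rejoined before parsing:

```typescript
interface McpWidget {
  url: string;
  data: Record<string, unknown>;
}

// Parse "mcp::<url>::<json>" into a widget descriptor.
// Returns null for non-widget messages or malformed payloads.
function parseMcpMessage(content: string): McpWidget | null {
  if (!content.startsWith("mcp::")) return null;
  const parts = content.slice(5).split("::");
  if (parts.length < 2) return null;
  try {
    // Rejoin in case the JSON payload itself contains "::"
    const data = JSON.parse(parts.slice(1).join("::"));
    return { url: parts[0], data };
  } catch {
    return null; // malformed JSON: treat as a regular transcript
  }
}
```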
```typescript
interface CurrentWidget {
  url: string;
  data: Record<string, unknown>;
}

const [currentWidget, setCurrentWidget] = useState<CurrentWidget | null>(null);
const currentWidgetRef = useRef<CurrentWidget | null>(null);
const iframeRef = useRef<HTMLIFrameElement | null>(null);

const agent = new ConversationAgent({
  // ... config
  transcriptCallback: async (msg) => {
    // Handle MCP widget messages: mcp::<url>::<data>
    if (msg.role === "bot" && msg.content.startsWith("mcp::")) {
      const parts = msg.content.slice(5).split("::");
      if (parts.length >= 2) {
        try {
          // Rejoin in case the JSON payload itself contains "::"
          const widgetData = JSON.parse(parts.slice(1).join("::"));
          const widget = { url: parts[0], data: widgetData };
          currentWidgetRef.current = widget;
          setCurrentWidget(widget);
          return;
        } catch {
          // Malformed payload: fall through and show as a regular transcript
        }
      }
    }
    // Regular transcript
    addToChat(msg.role, msg.content);
  },
});
```
Widgets expect initialization data via postMessage when the iframe loads:
```typescript
const handleIframeLoad = useCallback(() => {
  const widget = currentWidgetRef.current;
  if (!iframeRef.current?.contentWindow || !widget) return;
  iframeRef.current.contentWindow.postMessage(
    {
      jsonrpc: "2.0",
      method: "ui/initialize",
      params: {
        toolOutput: widget.data,
        data: widget.data,
        theme: "dark",
        hostContext: { theme: "dark" },
      },
    },
    "*",
  );
}, []);
```
```tsx
{currentWidget && (
  <iframe
    ref={iframeRef}
    src={currentWidget.url}
    sandbox="allow-scripts allow-same-origin"
    onLoad={handleIframeLoad}
  />
)}
```
Agent State Tracking
Monitor the agent’s state to update your UI:
```typescript
const agent = new ConversationAgent({
  // ... config
  stateCallback: (newState, previousState) => {
    switch (newState) {
      case AgentState.CONNECTING:
        showSpinner();
        break;
      case AgentState.LISTENING:
        showListeningIndicator();
        break;
      case AgentState.SPEAKING:
        showSpeakingAnimation();
        break;
      case AgentState.ERROR:
        showErrorMessage();
        break;
    }
  },
});

// Or check programmatically
const currentState = agent.getState();
```
| State | Description |
|---|---|
| IDLE | Not connected |
| CONNECTING | Establishing connection |
| CONNECTED | Connected, not yet ready |
| LISTENING | Listening for user input |
| SPEAKING | Agent is responding |
| ERROR | An error occurred |
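If you prefer a lookup table over a `switch`, the states above map cleanly to UI labels. The label strings below are illustrative, not part of the SDK:

```typescript
// Illustrative UI labels for each agent state.
type AgentStateName =
  | "IDLE" | "CONNECTING" | "CONNECTED"
  | "LISTENING" | "SPEAKING" | "ERROR";

const STATE_LABELS: Record<AgentStateName, string> = {
  IDLE: "Not connected",
  CONNECTING: "Connecting...",
  CONNECTED: "Getting ready...",
  LISTENING: "Listening",
  SPEAKING: "Agent is speaking",
  ERROR: "Something went wrong",
};

function labelForState(state: AgentStateName): string {
  return STATE_LABELS[state];
}
```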
Transcripts and Chat History
Use transcriptCallback for voice conversation transcripts:
```typescript
const agent = new ConversationAgent({
  // ... config (InteractionType.CALL)
  transcriptCallback: async (msg) => {
    if (msg.role === Role.USER) {
      addToChat("You", msg.content);
    } else if (msg.role === Role.BOT) {
      addToChat("Agent", msg.content);
    }
  },
});
```
Use textCallback for text chat agent responses:
```typescript
const agent = new ConversationAgent({
  // ... config (InteractionType.CHAT)
  textCallback: async (msg) => {
    addToChat("Agent", msg.text);
  },
});
```
| Callback | Use Case | Interaction Type |
|---|---|---|
| transcriptCallback | Voice transcripts (user speech + agent speech) | CALL |
| textCallback | Text chat agent responses | CHAT |
Analytics and Telemetry
Track SDK events for monitoring and debugging. Here’s an example using New Relic:
```typescript
import newrelic from "newrelic";

const SKIP_EVENTS = new Set(["user_interrupted", "state_changed"]);

const agent = new ConversationAgent({
  // ... config
  telemetryCallback: (event) => {
    if (SKIP_EVENTS.has(event.name)) return;
    newrelic.recordCustomEvent(`SarvamSDK_${event.name}`, {
      ...event.properties,
      sessionId: event.sessionId,
      interactionId: event.interactionId,
      timestamp: event.timestamp,
    });
  },
});
```
| Event | Description |
|---|---|
| ws_connecting | WebSocket connection starting |
| ws_connected | Connected (includes durationMs) |
| ws_disconnected | Disconnected (includes code, reason) |
| ws_error | WebSocket error |
| session_started | Conversation began |
| session_ended | Session ended (includes durationMs, initiatedBy) |
| interaction_connected | Interaction established (includes interactionId) |
| first_audio_packet_sent | First audio sent (includes latencyMs) |
| first_audio_packet_received | First audio received (includes latencyMs) |
| user_muted | User muted microphone |
| user_unmuted | User unmuted |
| user_interrupted | User interrupted agent |
| state_changed | State changed (includes from, to) |
| network_offline | Network lost |
| network_online | Network restored |
| error | Error occurred (includes type, message) |
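As a lightweight alternative to a full APM integration, the latency-bearing events above can be collected into an in-memory summary for logging or debugging. The event shape below mirrors the telemetry payload as documented; `makeLatencyTracker` is our own helper name:

```typescript
interface TelemetryEvent {
  name: string;
  properties?: Record<string, unknown>;
}

// Collect connection and first-audio latencies from telemetry
// events into a simple summary object.
function makeLatencyTracker() {
  const latencies: Record<string, number> = {};
  return {
    record(event: TelemetryEvent) {
      const p = event.properties ?? {};
      if (event.name === "ws_connected" && typeof p.durationMs === "number") {
        latencies.connectMs = p.durationMs;
      }
      if (event.name === "first_audio_packet_received" && typeof p.latencyMs === "number") {
        latencies.firstAudioMs = p.latencyMs;
      }
    },
    summary() {
      return { ...latencies };
    },
  };
}
```

Wire it up by calling `tracker.record(event)` inside your `telemetryCallback`, then log `tracker.summary()` at session end.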
Connection Lifecycle
```typescript
// Start the conversation
await agent.start();

// Wait for connection (timeout in seconds)
const connected = await agent.waitForConnect(10);
if (!connected) {
  console.error("Connection timeout");
  return;
}

// Check connection status
if (agent.isConnected()) {
  console.log("Connected");
}

// Get the interaction ID
const interactionId = agent.getInteractionId();

// Set a reference ID (e.g., telephony Call SID)
agent.reference_id = "CA1234567890abcdef";

// Wait for disconnection (blocks until conversation ends)
await agent.waitForDisconnect();

// Stop and cleanup
await agent.stop();
```
Reconnection is not supported. Each WebSocket URL is single-use and
expires after the first connection. If the connection drops, you must call
agent.stop() and create a new ConversationAgent instance to start a fresh
session.
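Since sessions cannot be resumed, a dropped connection is handled by tearing down the old agent and starting a fresh one. A minimal sketch, assuming you keep a factory function that constructs a new ConversationAgent with a fresh config (`restartSession` and `RestartableAgent` are our names, not SDK exports):

```typescript
// Minimal shape we need from an agent for restart purposes.
interface RestartableAgent {
  stop(): Promise<void>;
  start(): Promise<void>;
}

// Tear down the dropped agent and start a brand-new session.
// `createAgent` must construct a fresh agent on every call,
// since each WebSocket URL is single-use.
async function restartSession(
  dropped: RestartableAgent,
  createAgent: () => RestartableAgent,
): Promise<RestartableAgent> {
  await dropped.stop(); // release audio resources from the old session
  const next = createAgent();
  await next.start();
  return next;
}
```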
Supported Languages
```typescript
const config = {
  // ... other config
  initial_language_name: SarvamToolLanguageName.HINDI,
};
```
Available languages:
BENGALI, GUJARATI, KANNADA, MALAYALAM, TAMIL
TELUGU, PUNJABI, ODIA, MARATHI, HINDI, ENGLISH
API Reference
ConversationAgent Constructor
```typescript
new ConversationAgent({
  apiKey: string;
  config: InteractionConfig;
  audioInterface?: AsyncAudioInterface; // Required for CALL mode
  textCallback?: (msg) => Promise<void>;
  transcriptCallback?: (msg) => Promise<void>;
  audioCallback?: (msg) => Promise<void>;
  audioLevelCallback?: (level) => void;
  eventCallback?: (event) => Promise<void>;
  stateCallback?: (newState, prevState) => void;
  telemetryCallback?: (event) => void;
  startCallback?: () => Promise<void>;
  endCallback?: () => Promise<void>;
  baseUrl?: string;
  platform?: 'browser' | 'react-native';
})
```
ConversationAgent Methods
| Method | Returns | Description |
|---|---|---|
| start() | Promise\<void> | Start the conversation |
| stop() | Promise\<void> | Stop and cleanup |
| waitForConnect(timeout?) | Promise\<boolean> | Wait for connection |
| waitForDisconnect() | Promise\<void> | Wait for disconnection |
| isConnected() | boolean | Check connection status |
| getInteractionId() | string \| undefined | Get interaction ID |
| getState() | AgentState | Get current agent state |
| getAgentType() | 'voice' \| 'text' | Get agent type |
| sendText(text) | Promise\<void> | Send text (CHAT mode only) |
| sendAudio(data) | Promise\<void> | Send audio (CALL mode only) |
| mute() | void | Mute microphone (CALL mode only) |
| unmute() | void | Unmute microphone (CALL mode only) |
| isMuted() | boolean | Check mute status |
| isVoiceAgent() | boolean | Type guard for voice agent |
| isTextAgent() | boolean | Type guard for text agent |
Error Handling
The SDK provides typed error classes for different failure scenarios:
```typescript
try {
  await agent.start();
} catch (err) {
  if (err instanceof RateLimitError) {
    console.log(`Rate limited. Retry after ${err.retryAfter} seconds`);
    await new Promise((r) => setTimeout(r, err.retryAfter * 1000));
    await agent.start();
  } else if (err instanceof AuthenticationError) {
    console.error("Invalid API key");
  } else if (err instanceof ForbiddenError) {
    console.error("Access denied - check your permissions");
  } else if (err instanceof NotFoundError) {
    console.error("App not found - check app_id");
  } else if (err instanceof ServerError) {
    console.error("Server error - try again later");
  } else if (err instanceof SDKError) {
    console.error(`SDK error [${err.code}]: ${err.message}`);
  } else {
    console.error("Unexpected error:", err);
  }
}
```
Error Types
| Error Class | Status Code | When It Occurs |
|---|---|---|
| RateLimitError | 429 | Too many requests. Has retryAfter property (seconds). |
| AuthenticationError | 401 | Invalid or missing API key |
| ForbiddenError | 403 | Valid API key but insufficient permissions |
| NotFoundError | 404 | App, workspace, or org not found |
| ServerError | 5xx | Server-side error |
| SDKError | — | Base class for all SDK errors |
All errors extend SDKError which has:
- `message` - Human-readable error description
- `code` - Programmatic error code (e.g., "RATE_LIMITED", "AUTHENTICATION_FAILED")
- `statusCode` - HTTP status code if applicable
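For logging, these three fields combine naturally into a single line. The formatter below is a convenience we're sketching against that documented shape, not an SDK export:

```typescript
// Structural shape of an SDK error for formatting purposes.
interface SdkErrorLike {
  message: string;
  code: string;
  statusCode?: number;
}

// Format an SDK error into one log-friendly line,
// e.g. "[RATE_LIMITED] (429) Too many requests".
function formatSdkError(err: SdkErrorLike): string {
  const status = err.statusCode !== undefined ? ` (${err.statusCode})` : "";
  return `[${err.code}]${status} ${err.message}`;
}
```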
Handling Connection Errors
Use the telemetryCallback to monitor connection issues:
```typescript
const agent = new ConversationAgent({
  // ... config
  telemetryCallback: (event) => {
    if (event.name === "ws_error") {
      console.error("WebSocket error:", event.properties.error);
    }
    if (event.name === "ws_disconnected") {
      console.log("Disconnected:", event.properties.reason);
    }
    if (event.name === "network_offline") {
      showOfflineWarning();
    }
    if (event.name === "network_online") {
      hideOfflineWarning();
    }
  },
});
```
Troubleshooting
Connection timeout
Ensure your app_id has a committed version. If no version is specified, the SDK uses the latest committed version; if none exists, the connection fails.