The Sarvam Client SDKs provide a lightweight, event-driven interface for building real-time voice and text conversations. They handle WebSocket connectivity, audio streaming, buffering, and state management out of the box, so you can focus on your application logic.

Configuration

The configuration and API reference on this page apply to the TypeScript SDK (Web and React Native). For Flutter/Dart equivalents, see the Flutter page.

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| org_id | string | Your organization ID |
| workspace_id | string | Your workspace ID |
| app_id | string | The target application ID |
| user_identifier | string | User identifier (email, phone, or custom ID) |
| user_identifier_type | string | One of: 'custom', 'email', 'phone_number', 'unknown' |
| interaction_type | InteractionType | InteractionType.CALL (voice) or InteractionType.CHAT (text) |
| input_sample_rate | number | Audio input rate: 8000 or 16000 Hz |
| output_sample_rate | number | Audio output rate: 16000 or 22050 Hz |
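
The constraints in the required-fields table can be checked before a connection is attempted. The following is a minimal sketch, not part of the SDK: `validateConfig`, `RequiredConfig`, and the string values of the local `InteractionType` enum are assumptions made so the example runs on its own, and the IDs are placeholders.

```typescript
// Stand-ins for SDK types so this sketch runs on its own. The enum's string
// values are assumptions; in a real app, import InteractionType from the SDK.
enum InteractionType {
  CALL = "call",
  CHAT = "chat",
}

type UserIdentifierType = "custom" | "email" | "phone_number" | "unknown";

interface RequiredConfig {
  org_id: string;
  workspace_id: string;
  app_id: string;
  user_identifier: string;
  user_identifier_type: UserIdentifierType;
  interaction_type: InteractionType;
  input_sample_rate: number;
  output_sample_rate: number;
}

const INPUT_RATES: number[] = [8000, 16000];
const OUTPUT_RATES: number[] = [16000, 22050];

// Hypothetical helper, not part of the SDK: returns one message per problem,
// or an empty array when the config satisfies the documented constraints.
function validateConfig(config: RequiredConfig): string[] {
  const problems: string[] = [];
  const ids = ["org_id", "workspace_id", "app_id", "user_identifier"] as const;
  for (const key of ids) {
    if (config[key].trim() === "") problems.push(`${key} must not be empty`);
  }
  if (INPUT_RATES.indexOf(config.input_sample_rate) === -1) {
    problems.push("input_sample_rate must be 8000 or 16000");
  }
  if (OUTPUT_RATES.indexOf(config.output_sample_rate) === -1) {
    problems.push("output_sample_rate must be 16000 or 22050");
  }
  return problems;
}

// Placeholder values only; substitute your own IDs.
const exampleConfig: RequiredConfig = {
  org_id: "your-org-id",
  workspace_id: "your-workspace-id",
  app_id: "your-app-id",
  user_identifier: "user@example.com",
  user_identifier_type: "email",
  interaction_type: InteractionType.CALL,
  input_sample_rate: 16000,
  output_sample_rate: 22050,
};
```

Calling `validateConfig(exampleConfig)` returns an empty array; a config with an unsupported sample rate returns a message naming the offending field.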

Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| version | number | App version. If omitted, uses latest committed version |
| agent_variables | Record<string, any> | Key-value pairs to seed agent context |
| initial_language_name | SarvamToolLanguageName | Starting language |
| initial_state_name | string | Starting state name |
| initial_bot_message | string | First message from the agent |

Using a Proxy Server

We strongly recommend using a proxy server for production deployments to keep your API key secure and enforce access control. See the Proxy Setup guide for server configuration, client examples for each platform, and security best practices.

Mute and Unmute

Control microphone input without disconnecting:
// Mute - sends silence to keep connection stable
agent.mute();

// Unmute - resumes sending real audio
agent.unmute();

// Check current state
if (agent.isMuted()) {
  showMutedIcon();
}
When muted, the SDK sends continuous silence chunks to keep VAD (Voice Activity Detection) stable, preventing connection issues.
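
For a single mic button, the three calls above can be wrapped in a toggle. This is a small sketch under the assumption that the agent exposes exactly the `mute()`, `unmute()`, and `isMuted()` methods shown; `MuteControls` and `toggleMute` are hypothetical names, not SDK exports.

```typescript
// Structural type covering only the mute methods documented above.
interface MuteControls {
  mute(): void;
  unmute(): void;
  isMuted(): boolean;
}

// Hypothetical helper: flips the mute state and returns the new state so the
// caller can update a mic icon in one step.
function toggleMute(agent: MuteControls): boolean {
  if (agent.isMuted()) {
    agent.unmute();
  } else {
    agent.mute();
  }
  return agent.isMuted();
}
```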

Custom Events via Transcript

Backend tools can send custom events to the frontend through the transcriptCallback. These messages appear as bot transcripts with special formats that your app can intercept and handle.
Backend Requirement: Tools must emit ServerToolEvent.NotifyUI events with message_to_user containing the message content. See your backend SDK documentation for implementation details.

Thinking State

The backend sends thinking_started and thinking_completed messages when tools are processing:
const agent = new ConversationAgent({
  // ... config
  transcriptCallback: async (msg) => {
    // Handle thinking state
    if (msg.content === "thinking_started") {
      showLoadingSpinner();
      return;
    }
    if (msg.content === "thinking_completed") {
      hideLoadingSpinner();
      return;
    }

    // Regular transcript
    addToChat(msg.role, msg.content);
  },
});

MCP Widgets

MCP (Model Context Protocol) Apps enable interactive widgets in your conversation UI. The backend sends widget URLs with data, and your frontend renders them in iframes.

Message Format

The backend sends messages in this format:
mcp::<widget_url>::<widget_data_json>
Example:
mcp::https://widgets.example.com/food-cart.html::{"items":[...],"total":450}

Parsing Widget Messages

interface CurrentWidget {
  url: string;
  data: Record<string, unknown>;
}

const [currentWidget, setCurrentWidget] = useState<CurrentWidget | null>(null);
const currentWidgetRef = useRef<CurrentWidget | null>(null);
const iframeRef = useRef<HTMLIFrameElement | null>(null);

const agent = new ConversationAgent({
  // ... config
  transcriptCallback: async (msg) => {
    // Handle MCP widget messages: mcp::<url>::<data>
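    if (msg.role === "bot" && msg.content.startsWith("mcp::")) {
      const parts = msg.content.slice(5).split("::");
      if (parts.length >= 2) {
        try {
          const widgetUrl = parts[0];
          // Re-join the tail in case the JSON payload itself contains "::"
          const widgetData = JSON.parse(parts.slice(1).join("::"));
          const widget = { url: widgetUrl, data: widgetData };
          currentWidgetRef.current = widget;
          setCurrentWidget(widget);
        } catch (err) {
          // JSON.parse throws on a malformed payload; skip the widget
          console.error("Malformed MCP widget payload:", err);
        }
        return;
      }
    }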

    // Regular transcript
    addToChat(msg.role, msg.content);
  },
});
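
The parsing step can also be pulled out of the callback into a pure function, which makes it easy to unit-test. The sketch below assumes the `mcp::<widget_url>::<widget_data_json>` format described above and that the payload is a JSON object; `parseMcpMessage` and `WidgetMessage` are hypothetical names. Like the callback above, it treats the first `::` after the prefix as the separator, so a URL that itself contains `::` would not parse correctly.

```typescript
interface WidgetMessage {
  url: string;
  data: Record<string, unknown>;
}

const MCP_PREFIX = "mcp::";

// Hypothetical helper: returns the parsed widget, or null when the content is
// not an MCP widget message or its payload is not a JSON object.
function parseMcpMessage(content: string): WidgetMessage | null {
  if (content.indexOf(MCP_PREFIX) !== 0) return null;

  const rest = content.slice(MCP_PREFIX.length);
  const separator = rest.indexOf("::");
  if (separator <= 0) return null; // missing URL or missing payload

  const url = rest.slice(0, separator);
  // Everything after the first separator is the payload, even if it has "::"
  const json = rest.slice(separator + 2);

  try {
    const data: unknown = JSON.parse(json);
    if (typeof data !== "object" || data === null || Array.isArray(data)) {
      return null;
    }
    return { url, data: data as Record<string, unknown> };
  } catch {
    return null; // malformed JSON
  }
}
```

Inside `transcriptCallback`, a non-null result would be passed to `setCurrentWidget`, and a null result would fall through to the regular transcript handling.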

Initializing the Widget

Widgets expect initialization data via postMessage when the iframe loads:
const handleIframeLoad = useCallback(() => {
  const widget = currentWidgetRef.current;
  if (!iframeRef.current?.contentWindow || !widget) return;

  iframeRef.current.contentWindow.postMessage(
    {
      jsonrpc: "2.0",
      method: "ui/initialize",
      params: {
        toolOutput: widget.data,
        data: widget.data,
        theme: "dark",
        hostContext: { theme: "dark" },
      },
    },
    "*",
  );
}, []);
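
The handler above posts with a target origin of `"*"`, which delivers the message to whatever document is loaded in the iframe. One possible tightening, sketched here as an assumption rather than SDK guidance, is to build the message in a helper and derive the target origin from the widget URL. `buildInitMessage` and `widgetTargetOrigin` are hypothetical names, and the `"light"` theme is assumed (the source only shows `"dark"`).

```typescript
type WidgetTheme = "dark" | "light"; // "light" is an assumption

interface WidgetInitMessage {
  jsonrpc: "2.0";
  method: "ui/initialize";
  params: {
    toolOutput: Record<string, unknown>;
    data: Record<string, unknown>;
    theme: WidgetTheme;
    hostContext: { theme: WidgetTheme };
  };
}

// Builds the same payload as the handler above, with the theme parameterized.
function buildInitMessage(
  data: Record<string, unknown>,
  theme: WidgetTheme = "dark",
): WidgetInitMessage {
  return {
    jsonrpc: "2.0",
    method: "ui/initialize",
    params: { toolOutput: data, data, theme, hostContext: { theme } },
  };
}

// Restricts delivery to the origin the widget was loaded from.
// Throws a TypeError if widgetUrl is not an absolute URL.
function widgetTargetOrigin(widgetUrl: string): string {
  return new URL(widgetUrl).origin;
}
```

With these helpers the call becomes `contentWindow.postMessage(buildInitMessage(widget.data), widgetTargetOrigin(widget.url))`. If a widget redirects to a different origin after loading, the browser drops the message, so this only suits widgets served from a stable origin.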

Rendering the Widget

{currentWidget && (
  <iframe
    ref={iframeRef}
    src={currentWidget.url}
    sandbox="allow-scripts allow-same-origin"
    onLoad={handleIframeLoad}
  />
)}

Agent State Tracking

Monitor the agent’s state to update your UI:
const agent = new ConversationAgent({
  // ... config
  stateCallback: (newState, previousState) => {
    switch (newState) {
      case AgentState.CONNECTING:
        showSpinner();
        break;
      case AgentState.LISTENING:
        showListeningIndicator();
        break;
      case AgentState.SPEAKING:
        showSpeakingAnimation();
        break;
      case AgentState.ERROR:
        showErrorMessage();
        break;
    }
  },
});

// Or check programmatically
const currentState = agent.getState();
| State | Description |
| --- | --- |
| IDLE | Not connected |
| CONNECTING | Establishing connection |
| CONNECTED | Connected, not yet ready |
| LISTENING | Listening for user input |
| SPEAKING | Agent is responding |
| ERROR | An error occurred |
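
Instead of a `switch` with one UI call per state, the state table can be expressed as a lookup that a single render function consumes. This is a sketch only: the local `AgentState` enum is a stand-in whose string values are assumptions, and `StatusView`, `STATUS_VIEWS`, and `statusFor` are hypothetical names.

```typescript
// Stand-in for the SDK's AgentState; the string values are assumptions.
enum AgentState {
  IDLE = "idle",
  CONNECTING = "connecting",
  CONNECTED = "connected",
  LISTENING = "listening",
  SPEAKING = "speaking",
  ERROR = "error",
}

interface StatusView {
  label: string; // text taken from the state table
  busy: boolean; // show a spinner
  inConversation: boolean; // audio is flowing in one direction or the other
}

// Record<AgentState, ...> makes the compiler flag any state left unmapped.
const STATUS_VIEWS: Record<AgentState, StatusView> = {
  [AgentState.IDLE]: { label: "Not connected", busy: false, inConversation: false },
  [AgentState.CONNECTING]: { label: "Establishing connection", busy: true, inConversation: false },
  [AgentState.CONNECTED]: { label: "Connected, not yet ready", busy: true, inConversation: false },
  [AgentState.LISTENING]: { label: "Listening for user input", busy: false, inConversation: true },
  [AgentState.SPEAKING]: { label: "Agent is responding", busy: false, inConversation: true },
  [AgentState.ERROR]: { label: "An error occurred", busy: false, inConversation: false },
};

function statusFor(state: AgentState): StatusView {
  return STATUS_VIEWS[state];
}
```

A `stateCallback` could then call `render(statusFor(newState))`, where `render` is whatever function updates your UI.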

Transcripts and Chat History

Use transcriptCallback for voice conversation transcripts:
const agent = new ConversationAgent({
  // ... config (InteractionType.CALL)
  transcriptCallback: async (msg) => {
    if (msg.role === Role.USER) {
      addToChat("You", msg.content);
    } else if (msg.role === Role.BOT) {
      addToChat("Agent", msg.content);
    }
  },
});
Use textCallback for text chat agent responses:
const agent = new ConversationAgent({
  // ... config (InteractionType.CHAT)
  textCallback: async (msg) => {
    addToChat("Agent", msg.text);
  },
});
| Callback | Use Case | Interaction Type |
| --- | --- | --- |
| transcriptCallback | Voice transcripts (user speech + agent speech) | CALL |
| textCallback | Text chat agent responses | CHAT |
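
An app that supports both interaction types can normalize the two callback payloads into one chat-entry shape. The sketch below is an assumption-laden illustration: it assumes transcript roles arrive as the strings `"user"` and `"bot"` (only `"bot"` appears verbatim on this page), and `ChatEntry`, `entryFromTranscript`, and `entryFromText` are hypothetical names. It also drops the control messages described under Custom Events via Transcript so they never reach the chat log.

```typescript
interface TranscriptMessage {
  role: string; // assumed to be "user" or "bot"
  content: string;
}

interface TextMessage {
  text: string;
}

interface ChatEntry {
  speaker: "You" | "Agent";
  text: string;
}

// Bot messages that carry UI events rather than speech.
function isControlMessage(content: string): boolean {
  return (
    content === "thinking_started" ||
    content === "thinking_completed" ||
    content.indexOf("mcp::") === 0
  );
}

// For transcriptCallback (CALL): returns null for messages to keep out of chat.
function entryFromTranscript(msg: TranscriptMessage): ChatEntry | null {
  if (msg.role === "user") return { speaker: "You", text: msg.content };
  if (msg.role === "bot" && !isControlMessage(msg.content)) {
    return { speaker: "Agent", text: msg.content };
  }
  return null;
}

// For textCallback (CHAT): every message is an agent response.
function entryFromText(msg: TextMessage): ChatEntry {
  return { speaker: "Agent", text: msg.text };
}
```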

Analytics and Telemetry

Track SDK events for monitoring and debugging. Here’s an example using New Relic:
import newrelic from "newrelic";

const SKIP_EVENTS = new Set(["user_interrupted", "state_changed"]);

const agent = new ConversationAgent({
  // ... config
  telemetryCallback: (event) => {
    if (SKIP_EVENTS.has(event.name)) return;

    newrelic.recordCustomEvent(`SarvamSDK_${event.name}`, {
      ...event.properties,
      sessionId: event.sessionId,
      interactionId: event.interactionId,
      timestamp: event.timestamp,
    });
  },
});
| Event | Description |
| --- | --- |
| ws_connecting | WebSocket connection starting |
| ws_connected | Connected (includes durationMs) |
| ws_disconnected | Disconnected (includes code, reason) |
| ws_error | WebSocket error |
| session_started | Conversation began |
| session_ended | Session ended (includes durationMs, initiatedBy) |
| interaction_connected | Interaction established (includes interactionId) |
| first_audio_packet_sent | First audio sent (includes latencyMs) |
| first_audio_packet_received | First audio received (includes latencyMs) |
| user_muted | User muted microphone |
| user_unmuted | User unmuted |
| user_interrupted | User interrupted agent |
| state_changed | State changed (includes from, to) |
| network_offline | Network lost |
| network_online | Network restored |
| error | Error occurred (includes type, message) |
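
Several of these events carry a timing value that is useful as a numeric metric. The sketch below extracts it in a backend-agnostic way. It assumes the "includes" fields listed in the table arrive under `event.properties` (as `event.properties.reason` does in the connection-error example on this page); `timingFromEvent`, `TimingSample`, and the metric naming scheme are hypothetical.

```typescript
interface TelemetryEvent {
  name: string;
  properties?: Record<string, unknown>;
}

interface TimingSample {
  metric: string;
  valueMs: number;
}

// Which property holds the timing value for each timed event in the table.
const TIMING_PROPERTY: Record<string, string> = {
  ws_connected: "durationMs",
  session_ended: "durationMs",
  first_audio_packet_sent: "latencyMs",
  first_audio_packet_received: "latencyMs",
};

// Returns a sample for timed events, or null for everything else.
function timingFromEvent(event: TelemetryEvent): TimingSample | null {
  // hasOwnProperty guards against names like "constructor" matching the prototype
  if (!Object.prototype.hasOwnProperty.call(TIMING_PROPERTY, event.name)) {
    return null;
  }
  const key = TIMING_PROPERTY[event.name];
  const value = event.properties ? event.properties[key] : undefined;
  if (typeof value !== "number" || !isFinite(value)) return null;
  return { metric: `${event.name}.${key}`, valueMs: value };
}
```

Inside `telemetryCallback`, a non-null sample can be forwarded to whichever metrics client you use.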

Connection Lifecycle

// Start the conversation
await agent.start();

// Wait for connection (timeout in seconds)
const connected = await agent.waitForConnect(10);
if (!connected) {
  console.error("Connection timeout");
  return;
}

// Check connection status
if (agent.isConnected()) {
  console.log("Connected");
}

// Get the interaction ID
const interactionId = agent.getInteractionId();

// Set a reference ID (e.g., telephony Call SID)
agent.reference_id = "CA1234567890abcdef";

// Wait for disconnection (blocks until conversation ends)
await agent.waitForDisconnect();

// Stop and cleanup
await agent.stop();
Reconnection is not supported. Each WebSocket URL is single-use and expires after the first connection. If the connection drops, you must call agent.stop() and create a new ConversationAgent instance to start a fresh session.
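
Because an agent instance cannot be reused after a failed or dropped connection, retry logic has to construct a new instance per attempt. The following is a minimal sketch of that pattern, assuming only the `start()`, `waitForConnect()`, and `stop()` methods documented on this page. `connectWithFreshAgent` and `AgentLike` are hypothetical names, and `createAgent` is a factory you supply (for example, `() => new ConversationAgent({ ... })`).

```typescript
// Structural type covering only the lifecycle methods used here.
interface AgentLike {
  start(): Promise<void>;
  waitForConnect(timeoutSeconds?: number): Promise<boolean>;
  stop(): Promise<void>;
}

// Tries up to maxAttempts times, creating a new agent for every attempt.
// Resolves to the connected agent, or null if every attempt timed out.
// Errors thrown by start() propagate to the caller after cleanup.
async function connectWithFreshAgent<T extends AgentLike>(
  createAgent: () => T,
  maxAttempts: number = 3,
  timeoutSeconds: number = 10,
): Promise<T | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const agent = createAgent();
    let connected = false;
    try {
      await agent.start();
      connected = await agent.waitForConnect(timeoutSeconds);
      if (connected) return agent;
    } finally {
      // Never reuse an instance that failed to connect
      if (!connected) await agent.stop();
    }
  }
  return null;
}
```

Typed errors such as `AuthenticationError` are deliberately not retried here, since a second attempt with the same credentials would fail the same way.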

Supported Languages

const config = {
  // ... other config
  initial_language_name: SarvamToolLanguageName.HINDI,
};
Available languages:
  • BENGALI, GUJARATI, KANNADA, MALAYALAM, TAMIL
  • TELUGU, PUNJABI, ODIA, MARATHI, HINDI, ENGLISH

API Reference

ConversationAgent Constructor

new ConversationAgent({
  apiKey: string;
  config: InteractionConfig;
  audioInterface?: AsyncAudioInterface;  // Required for CALL mode
  textCallback?: (msg) => Promise<void>;
  transcriptCallback?: (msg) => Promise<void>;
  audioCallback?: (msg) => Promise<void>;
  audioLevelCallback?: (level) => void;
  eventCallback?: (event) => Promise<void>;
  stateCallback?: (newState, prevState) => void;
  telemetryCallback?: (event) => void;
  startCallback?: () => Promise<void>;
  endCallback?: () => Promise<void>;
  baseUrl?: string;
  platform?: 'browser' | 'react-native';
})

ConversationAgent Methods

| Method | Returns | Description |
| --- | --- | --- |
| start() | Promise<void> | Start the conversation |
| stop() | Promise<void> | Stop and cleanup |
| waitForConnect(timeout?) | Promise<boolean> | Wait for connection |
| waitForDisconnect() | Promise<void> | Wait for disconnection |
| isConnected() | boolean | Check connection status |
| getInteractionId() | string \| undefined | Get interaction ID |
| getState() | AgentState | Get current agent state |
| getAgentType() | 'voice' \| 'text' | Get agent type |
| sendText(text) | Promise<void> | Send text (CHAT mode only) |
| sendAudio(data) | Promise<void> | Send audio (CALL mode only) |
| mute() | void | Mute microphone (CALL mode only) |
| unmute() | void | Unmute microphone (CALL mode only) |
| isMuted() | boolean | Check mute status |
| isVoiceAgent() | boolean | Type guard for voice agent |
| isTextAgent() | boolean | Type guard for text agent |

Error Handling

The SDK provides typed error classes for different failure scenarios:
try {
  await agent.start();
} catch (err) {
  if (err instanceof RateLimitError) {
    console.log(`Rate limited. Retry after ${err.retryAfter} seconds`);
    await new Promise((r) => setTimeout(r, err.retryAfter * 1000));
    await agent.start();
  } else if (err instanceof AuthenticationError) {
    console.error("Invalid API key");
  } else if (err instanceof ForbiddenError) {
    console.error("Access denied - check your permissions");
  } else if (err instanceof NotFoundError) {
    console.error("App not found - check app_id");
  } else if (err instanceof ServerError) {
    console.error("Server error - try again later");
  } else if (err instanceof SDKError) {
    console.error(`SDK error [${err.code}]: ${err.message}`);
  } else {
    console.error("Unexpected error:", err);
  }
}

Error Types

| Error Class | Status Code | When It Occurs |
| --- | --- | --- |
| RateLimitError | 429 | Too many requests. Has retryAfter property (seconds). |
| AuthenticationError | 401 | Invalid or missing API key |
| ForbiddenError | 403 | Valid API key but insufficient permissions |
| NotFoundError | 404 | App, workspace, or org not found |
| ServerError | 5xx | Server-side error |
| SDKError |  | Base class for all SDK errors |
All errors extend SDKError, which has:
  • message - Human-readable error description
  • code - Programmatic error code (e.g., "RATE_LIMITED", "AUTHENTICATION_FAILED")
  • statusCode - HTTP status code if applicable

Handling Connection Errors

Use the telemetryCallback to monitor connection issues:
const agent = new ConversationAgent({
  // ... config
  telemetryCallback: (event) => {
    if (event.name === "ws_error") {
      console.error("WebSocket error:", event.properties.error);
    }
    if (event.name === "ws_disconnected") {
      console.log("Disconnected:", event.properties.reason);
    }
    if (event.name === "network_offline") {
      showOfflineWarning();
    }
    if (event.name === "network_online") {
      hideOfflineWarning();
    }
  },
});

Troubleshooting

Connection timeout

Ensure your app_id has a committed version. If no version is specified, the SDK uses the latest committed version; if none exists, the connection fails.