Guide for integrating the Sarvam Client SDK into browser-based React applications. For shared concepts like configuration, proxy setup, error handling, and API reference, see the Overview.

Installation

npm install sarvam-conv-ai-sdk
Use the /browser entry point for an optimized bundle size:
import {
  ConversationAgent,
  BrowserAudioInterface,
} from "sarvam-conv-ai-sdk/browser";

Quick Start

Voice Conversation

import {
  ConversationAgent,
  BrowserAudioInterface,
  InteractionType,
} from "sarvam-conv-ai-sdk/browser";

const audioInterface = new BrowserAudioInterface();

const agent = new ConversationAgent({
  apiKey: "your_api_key",
  config: {
    org_id: "your_org_id",
    workspace_id: "your_workspace_id",
    app_id: "your_app_id",
    user_identifier: "user123",
    user_identifier_type: "custom",
    interaction_type: InteractionType.CALL,
    input_sample_rate: 16000,
    output_sample_rate: 16000,
  },
  audioInterface,
  transcriptCallback: async (msg) => {
    console.log(`${msg.role}: ${msg.content}`);
  },
});

await agent.start();
await agent.waitForConnect(10);

Text Conversation

import { ConversationAgent, InteractionType } from "sarvam-conv-ai-sdk/browser";

const agent = new ConversationAgent({
  apiKey: "your_api_key",
  config: {
    org_id: "your_org_id",
    workspace_id: "your_workspace_id",
    app_id: "your_app_id",
    user_identifier: "user@example.com",
    user_identifier_type: "email",
    interaction_type: InteractionType.CHAT,
    input_sample_rate: 16000,
    output_sample_rate: 16000,
  },
  textCallback: async (msg) => console.log("Agent:", msg.text),
});

await agent.start();
await agent.waitForConnect(10);
await agent.sendText("Hello!");

Audio Quality and Buffering

The BrowserAudioInterface uses adaptive buffering to handle network jitter. Adjust settings based on your network conditions:
const audioInterface = new BrowserAudioInterface(16000, {
  prebufferMs: 700, // Buffer before playback starts (default: 700ms)
  bufferSizeMs: 8000, // Total buffer capacity (default: 8 seconds)
});
| Network Condition | prebufferMs   | Notes                                    |
| ----------------- | ------------- | ---------------------------------------- |
| Good/stable       | 300-500       | Lower latency; may cause gaps on jitter  |
| Normal            | 700 (default) | Balanced latency and stability           |
| Unstable          | 1000-1500     | Higher latency, smoother playback        |
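One way to apply the table programmatically is a small selector keyed on measured jitter. This is a sketch: the thresholds and the `choosePrebufferMs` name are assumptions of this example, not SDK recommendations.

```typescript
// Hypothetical helper (not part of the SDK): pick a prebuffer size from a
// measured network jitter figure, mirroring the table above.
function choosePrebufferMs(jitterMs: number): number {
  if (jitterMs < 20) return 400;  // good/stable: lower latency, gaps possible
  if (jitterMs < 60) return 700;  // normal: the SDK default
  return 1200;                    // unstable: higher latency, smoother playback
}
```

The result can then be passed as `prebufferMs` when constructing `BrowserAudioInterface`.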

Audio Level Visualization

Use audioLevelCallback to get real-time audio levels (e.g., to show when the agent is speaking):
const agent = new ConversationAgent({
  // ... config
  audioLevelCallback: (level) => {
    updateAgentSpeakingIndicator(level);
  },
});
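For example, the level can drive a simple speaking indicator or meter. This sketch assumes the callback delivers a normalized linear value in 0..1 (check the payload shape for your SDK version); `levelToPercent` is an illustrative helper, not an SDK export.

```typescript
// Map a linear audio level (0..1) to a 0..100 percentage, e.g. for a CSS width.
// sqrt flattens the curve so quiet speech still registers visually.
function levelToPercent(level: number): number {
  const clamped = Math.min(Math.max(level, 0), 1);
  return Math.round(Math.sqrt(clamped) * 100);
}
```

In the callback above you might then write `meterEl.style.width = levelToPercent(level) + "%"`.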

BrowserAudioInterface

new BrowserAudioInterface(
  sampleRate?: number,  // Default: 16000
  options?: {
    prebufferMs?: number;      // Default: 700
    bufferSizeMs?: number;     // Default: 8000
    outputLevelCallback?: (level: { rms: number; peak: number; db: number }) => void;
  }
)
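The `outputLevelCallback` reports `rms`, `peak`, and `db` for the output audio. The `db` field conventionally relates to RMS as 20·log10(rms) (dBFS); here is a sketch assuming the SDK follows that convention (`rmsToDb` is illustrative, not an SDK export):

```typescript
// dBFS from a linear RMS value, floored so silence doesn't yield -Infinity.
function rmsToDb(rms: number): number {
  const MIN_DB = -100;
  if (rms <= 0) return MIN_DB;
  return Math.max(20 * Math.log10(rms), MIN_DB);
}
```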

Full Example

import React, { useRef, useState, useEffect } from 'react';
import {
  ConversationAgent,
  BrowserAudioInterface,
  InteractionType,
  AgentState,
  Role,
} from 'sarvam-conv-ai-sdk/browser';

interface Message {
  role: 'user' | 'bot';
  content: string;
}

function VoiceChat() {
  const [state, setState] = useState<AgentState>(AgentState.IDLE);
  const [messages, setMessages] = useState<Message[]>([]);
  const [isMuted, setIsMuted] = useState(false);
  const agentRef = useRef<ConversationAgent | null>(null);

  useEffect(() => {
    return () => {
      agentRef.current?.stop().catch(console.error);
    };
  }, []);

  const startConversation = async () => {
    const audioInterface = new BrowserAudioInterface(16000, {
      outputLevelCallback: (level) => {
        // Update volume visualization
      },
    });

    const agent = new ConversationAgent({
      apiKey: 'your_api_key',
      config: {
        org_id: 'your_org_id',
        workspace_id: 'your_workspace_id',
        app_id: 'your_app_id',
        user_identifier: 'user123',
        user_identifier_type: 'custom',
        interaction_type: InteractionType.CALL,
        input_sample_rate: 16000,
        output_sample_rate: 16000,
      },
      audioInterface,
      stateCallback: (newState) => setState(newState),
      transcriptCallback: async (msg) => {
        setMessages((prev) => [
          ...prev,
          { role: msg.role === Role.USER ? 'user' : 'bot', content: msg.content },
        ]);
      },
      endCallback: async () => {
        agentRef.current = null;
        setState(AgentState.IDLE);
      },
    });

    agentRef.current = agent;
    await agent.start();
  };

  const toggleMute = () => {
    if (!agentRef.current) return;
    if (agentRef.current.isMuted()) {
      agentRef.current.unmute();
      setIsMuted(false);
    } else {
      agentRef.current.mute();
      setIsMuted(true);
    }
  };

  return (
    <div>
      <p>State: {state}</p>
      <button onClick={startConversation} disabled={state !== AgentState.IDLE}>
        Start
      </button>
      <button onClick={() => agentRef.current?.stop()} disabled={state === AgentState.IDLE}>
        Stop
      </button>
      <button onClick={toggleMute} disabled={state === AgentState.IDLE}>
        {isMuted ? 'Unmute' : 'Mute'}
      </button>
      <div>
        {messages.map((msg, i) => (
          <p key={i}>
            <strong>{msg.role === 'user' ? 'You' : 'Agent'}:</strong> {msg.content}
          </p>
        ))}
      </div>
    </div>
  );
}

export default VoiceChat;

Troubleshooting

"audioInterface is required for CALL interactions"

Provide a BrowserAudioInterface for voice conversations:
import { BrowserAudioInterface } from "sarvam-conv-ai-sdk/browser";

const agent = new ConversationAgent({
  audioInterface: new BrowserAudioInterface(),
  // ...
});

Audio doesn’t play in Firefox

Firefox requires user interaction before playing audio. Call start() from a click handler:
button.onclick = async () => {
  await agent.start();
};

HTTPS required for microphone

Microphone access requires HTTPS in production. Use localhost for development.
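You can preflight the environment before constructing a `BrowserAudioInterface`. This sketch takes the relevant browser globals as arguments so it stays testable; `micAvailable` is a hypothetical helper, not part of the SDK:

```typescript
// Microphone capture needs a secure context (HTTPS, or localhost in dev)
// plus getUserMedia support. In the browser, call it as:
//   micAvailable(window.isSecureContext, !!navigator.mediaDevices?.getUserMedia)
function micAvailable(isSecureContext: boolean, hasGetUserMedia: boolean): boolean {
  return isSecureContext && hasGetUserMedia;
}
```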

Handling Microphone Permission Errors

Browser microphone access can fail for several reasons:
try {
  const audioInterface = new BrowserAudioInterface();
  const agent = new ConversationAgent({
    audioInterface,
    // ...
  });
  await agent.start();
} catch (err) {
  if (err.name === "NotAllowedError") {
    // User denied permission
    console.error("Microphone permission denied");
  } else if (err.name === "NotFoundError") {
    // No microphone available
    console.error("No microphone found");
  } else if (err.name === "NotReadableError") {
    // Microphone in use by another app
    console.error("Microphone is in use");
  } else {
    // Anything else (network, SDK errors, etc.)
    console.error("Failed to start conversation:", err);
  }
}
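To turn these errors into user-facing messages, one option is a small lookup keyed on the standard `DOMException` names. The wording and the `describeMicError` name are suggestions of this example, not part of the SDK:

```typescript
// Map getUserMedia DOMException names to user-facing messages.
function describeMicError(name: string): string {
  switch (name) {
    case "NotAllowedError":
      return "Microphone permission was denied. Enable it in your browser settings.";
    case "NotFoundError":
      return "No microphone was found on this device.";
    case "NotReadableError":
      return "The microphone is already in use by another application.";
    default:
      return "Could not access the microphone.";
  }
}
```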