The Sarvam Conversational AI SDK is a Python package that helps developers build and extend conversational agents. It provides core components to manage conversation flow, language preferences, and messaging, making it easier to develop interactive and context-aware AI experiences.

Overview

The Sarvam Conv AI SDK enables developers to create tools that can:
  • Perform agentic actions, such as calling external APIs, in the middle of a conversation
  • Manage agent-specific variables
  • Control and modify the language used during conversations
  • Send dynamic messages to both the user and the underlying language model (LLM)

Installation

Basic Installation

Install the SDK via pip:
pip install sarvam-conv-ai-sdk

Audio Support (Optional)

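If you want to use audio streaming features (microphone input and speaker output), install PyAudio, which depends on the system-level PortAudio library.

Option 1: Install with audio support

pip install "sarvam-conv-ai-sdk[all]"
Note: Install PortAudio first:
  • macOS: brew install portaudio
  • Ubuntu/Debian: sudo apt-get install portaudio19-dev
  • Windows: download from http://www.portaudio.com/download.html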

Option 2: Use without PyAudio

Without PyAudio, the SDK still works, but its built-in audio capture and playback are unavailable. You can still:
  • Use the WebSocket client for real-time voice conversations (provide your own audio I/O)
  • Build backend proxies for frontend applications

AsyncSamvaadAgent

Build real-time voice agents from a small set of inputs:
  • You provide InteractionConfig: who the user is, which app to talk to, interaction type, and audio sample rate; optionally include overrides like agent_variables and initial language/state.
  • You create AsyncSamvaadAgent with your API key, config, and optional audio interface plus callbacks for text/audio/events.
  • Start the agent: it fetches a signed WebSocket URL, sends interaction_start, and streams audio/text both ways.

Key features

  • Real-time voice interaction — natural speak and listen
  • Automatic audio management — built-in microphone input and speaker output
  • Async/await support — non-blocking operations
  • Callback handling — process text/audio/events asynchronously
  • Connection management — robust WebSocket handling
Minimal example:
import asyncio
from pydantic import SecretStr
from sarvam_conv_ai_sdk import AsyncSamvaadAgent, AsyncDefaultAudioInterface, InteractionConfig, InteractionType, ServerTextChunkMsg, SarvamToolLanguageName
from sarvam_conv_ai_sdk.messages.types import UserIdentifierType

async def handle_text(msg: ServerTextChunkMsg):
    print("Agent:", msg.text)

async def main(app_id: str, api_key: str):
    config = InteractionConfig(
        user_identifier_type=UserIdentifierType.CUSTOM,
        user_identifier="demo_user",
        org_id="org_ai",
        workspace_id="workspace_id",
        app_id=app_id,
        interaction_type=InteractionType.CALL,
        agent_variables={"agent_variable_1": "value"},
        initial_language_name=SarvamToolLanguageName.HINDI,
        sample_rate=16000,
    )

    agent = AsyncSamvaadAgent(
        api_key=SecretStr(api_key),
        config=config,
        audio_interface=AsyncDefaultAudioInterface(input_sample_rate=16000),
        text_callback=handle_text,
    )

    await agent.start()
    try:
        # Wait until the WebSocket disconnects or the agent is stopped
        await agent.wait_for_disconnect()
    finally:
        await agent.stop()

if __name__ == "__main__":
    asyncio.run(main(app_id="your_app_id", api_key="your_api_key"))

AsyncSamvaadAgent parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | SecretStr | Yes | API key used to fetch a signed WebSocket URL |
| config | InteractionConfig | Yes | Interaction start configuration (user ID, app ID, sample rate, overrides) |
| audio_interface | AsyncAudioInterface or None | No | Automatic mic capture and speaker playback. Omit for headless usage (use send_audio) |
| text_callback | Callable[[ServerTextChunkMsg], Awaitable[None]] or None | No | Receives streaming text chunks from the agent |
| audio_callback | Callable[[ServerAudioChunkMsg], Awaitable[None]] or None | No | Receives audio chunks when audio_interface is not used for playback |
| event_callback | Callable[[ServerEventBase], Awaitable[None]] or None | No | Receives events such as interaction_connected, user_interrupt, interaction_end |
| base_url | str | No | Override base URL. Default: https://apps.sarvam.ai/api/app-runtime/ |
Methods:
  • await agent.start() — start and connect
  • await agent.stop() — stop and cleanup
  • await agent.wait_for_connect(timeout: float | None = 5.0) — wait until connected
  • await agent.wait_for_disconnect() — wait until disconnected or stopped
  • agent.is_connected() — connection status
  • await agent.send_audio(audio_bytes: bytes) — send raw 16‑bit PCM audio
  • agent.get_interaction_id() — current interaction id or None
Audio interface (optional): AsyncDefaultAudioInterface(input_sample_rate: int = 16000)
  • Methods: start(input_callback), output(audio: bytes[, sample_rate: int]), interrupt(), stop()
  • Audio: LINEAR16 (16‑bit PCM mono). Supported sample rates: 8000, 16000
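Because the agent expects LINEAR16 (16-bit PCM mono) at 8000 or 16000 Hz, it helps to compute chunk sizes and convert samples explicitly when you handle audio yourself. The sketch below is not part of the SDK: chunk_size_bytes and floats_to_linear16 are hypothetical helpers, and the little-endian byte order is an assumption (it is the usual convention for LINEAR16).

```python
import struct
from typing import Iterable

SUPPORTED_RATES = (8000, 16000)  # sample rates listed in this section
BYTES_PER_SAMPLE = 2             # 16-bit PCM, mono


def chunk_size_bytes(sample_rate: int, chunk_ms: int) -> int:
    """Number of LINEAR16 mono bytes covering chunk_ms milliseconds of audio."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE


def floats_to_linear16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit PCM bytes.

    Assumption: little-endian byte order. Out-of-range values are clipped.
    """
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```

For example, a 20 ms chunk at 16000 Hz is 320 samples, or 640 bytes.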

What you must provide: InteractionConfig

Required fields:
  • user_identifier_type: One of CUSTOM, EMAIL, PHONE_NUMBER, UNKNOWN
  • user_identifier: The identifier value (a string: phone number, email, or custom ID). You can use this ID to look up the interaction's logs in the log analyser.
  • org_id: Your organization, e.g., “sarvamai”
  • workspace_id: Your workspace, e.g., “default”
  • app_id: The target application id
  • interaction_type: InteractionType.CALL (voice)
  • sample_rate: 8000 or 16000 (16-bit PCM mono)
Optional field:
  • version: int, the app version to connect to
Important
If version is not provided, the SDK uses the latest committed version of the app.
The connection will fail if the provided app_id has no committed version.
Optional overrides (applied server-side at start):
  • agent_variables: dict of key/value to seed the agent context
  • initial_language_name: e.g., “English”, “Hindi” (must be allowed by app)
  • initial_state_name: starting state name, if your app uses states
  • initial_bot_message: first message from the agent
Example config:
from sarvam_conv_ai_sdk import InteractionConfig, InteractionType, SarvamToolLanguageName
from sarvam_conv_ai_sdk.messages.types import UserIdentifierType

config = InteractionConfig(
    user_identifier_type=UserIdentifierType.CUSTOM,
    user_identifier="demo_user_async",
    org_id="sarvamai",
    workspace_id="default",
    app_id="your_app_id",
    interaction_type=InteractionType.CALL,
    agent_variables={"user_language": "Hindi"},
    initial_language_name=SarvamToolLanguageName.HINDI,
    initial_state_name="greeting",
    sample_rate=16000,
)

Quick start: local voice test

  1. Install dependencies
brew install portaudio               # macOS
pip install "sarvam-conv-ai-sdk[all]"
  2. Set credentials (or pass them directly in code)
export SARVAM_APP_ID="your_app_id"
export SARVAM_API_KEY="your_api_key"
  3. Run the example
python -m sarvam_conv_ai_sdk.examples.async_audio_example
The example uses AsyncDefaultAudioInterface to capture microphone audio at 16 kHz and play the agent's responses. If you use a different environment, override base_url in AsyncSamvaadAgent.
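Step 2 stores credentials as environment variables. To use them from your own script, such as the minimal example earlier, you can read them with a small helper that fails fast when one is missing. load_sarvam_credentials is a hypothetical name, not an SDK function; it relies only on the standard library.

```python
import os
from typing import Tuple


def load_sarvam_credentials() -> Tuple[str, str]:
    """Return (app_id, api_key) from SARVAM_APP_ID and SARVAM_API_KEY.

    Raises RuntimeError naming every variable that is unset or empty.
    """
    names = ("SARVAM_APP_ID", "SARVAM_API_KEY")
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variable(s): {', '.join(missing)}")
    return os.environ["SARVAM_APP_ID"], os.environ["SARVAM_API_KEY"]
```

With the minimal example's main(), you would call app_id, api_key = load_sarvam_credentials() and then asyncio.run(main(app_id=app_id, api_key=api_key)).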

Headless mode (no PyAudio)

Use your own audio I/O. Create the agent without audio_interface and push raw 16‑bit PCM mono chunks that match config.sample_rate.
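from pydantic import SecretStr
from sarvam_conv_ai_sdk import AsyncSamvaadAgent

# config and handle_text are defined as in the examples above
async def run_headless(config, raw_pcm_bytes: bytes):
    # No audio_interface: your code captures and plays audio itself
    agent = AsyncSamvaadAgent(api_key=SecretStr("your_api_key"), config=config, text_callback=handle_text)
    await agent.start()
    try:
        await agent.wait_for_connect()
        # LINEAR16 mono at config.sample_rate (16 kHz or 8 kHz)
        await agent.send_audio(raw_pcm_bytes)
    finally:
        await agent.stop()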
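In headless mode, a convenient audio source for testing is a pre-recorded file. The sketch below streams a 16-bit mono WAV file through send_audio in fixed-size chunks using only the standard wave module. stream_wav is a hypothetical helper that assumes nothing about the agent beyond the documented await agent.send_audio(audio_bytes) method. Pacing chunks at real-time speed (realtime=True) is also an assumption, mirroring how a live microphone delivers audio.

```python
import asyncio
import wave
from typing import Any, BinaryIO, Union


async def stream_wav(
    agent: Any,
    source: Union[str, BinaryIO],
    expected_rate: int,
    chunk_ms: int = 20,
    realtime: bool = True,
) -> int:
    """Send a 16-bit mono WAV file to agent.send_audio() in fixed-size chunks.

    Returns the number of PCM bytes sent (the WAV header is never sent).
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM (LINEAR16)")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"WAV is {wav.getframerate()} Hz but config expects {expected_rate} Hz"
            )
        frames_per_chunk = expected_rate * chunk_ms // 1000
        sent = 0
        while True:
            chunk = wav.readframes(frames_per_chunk)  # raw PCM bytes only
            if not chunk:
                break
            await agent.send_audio(chunk)
            sent += len(chunk)
            if realtime:
                # Assumption: pace chunks like a live microphone would
                await asyncio.sleep(chunk_ms / 1000)
    return sent
```

After await agent.start(), you would call await stream_wav(agent, "question.wav", expected_rate=config.sample_rate).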

Connect your frontend (backend proxy pattern)

See the section above for AsyncSamvaadAgent usage. For a full backend bridge, apply the same pattern in your server. The frontend sends three message types:
  • Frontend → backend (init):
{
  "type": "init",
  "app_id": "your_app_id",
  "context": {"language": "English", "user_name": "Priya"}
}
  • Frontend → backend (text):
{ "type": "text", "data": { "text": "Hello" } }
  • Frontend → backend (audio):
{ "type": "audio", "data": "<base64-raw-pcm>" }
Bridge essentials on the backend:
  • Build InteractionConfig from init context; create AsyncSamvaadAgent with callbacks.
  • Decode base64 and forward audio via await agent.send_audio(audio_bytes).
  • In the text/audio/event callbacks, send each message back to the frontend (for example with websocket.send_json).
Minimal sketch:
session.agent = AsyncSamvaadAgent(
    api_key=SecretStr(api_key),
    config=config,
    text_callback=session._handle_text,
    audio_callback=session._handle_audio,
    event_callback=session._handle_event,
)
await session.agent.start()
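The first bridge step is decoding the three frontend message shapes listed above. One standard-library way to do that is sketched below. FrontendMessage and parse_frontend_message are hypothetical names, and the function only validates the shapes shown in this section; audio payloads come back as bytes ready for await agent.send_audio(...).

```python
import base64
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class FrontendMessage:
    kind: str     # "init", "text" or "audio"
    payload: Any  # dict for init, str for text, bytes for audio


def parse_frontend_message(raw: str) -> FrontendMessage:
    """Validate one JSON message from the frontend and normalise its payload."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "init":
        # app_id and context are what the backend needs to build InteractionConfig
        return FrontendMessage(
            "init", {"app_id": msg["app_id"], "context": msg.get("context", {})}
        )
    if kind == "text":
        return FrontendMessage("text", msg["data"]["text"])
    if kind == "audio":
        # base64 string -> raw 16-bit PCM bytes for agent.send_audio()
        return FrontendMessage("audio", base64.b64decode(msg["data"], validate=True))
    raise ValueError(f"unknown frontend message type: {kind!r}")
```

Text messages are returned as plain strings; how you forward them depends on your app, since this section documents only send_audio for sending data to the agent.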

Requirements for Async Audio

  1. PyAudio installation:
    pip install "sarvam-conv-ai-sdk[all]"
    
  2. System dependencies:
    • macOS: brew install portaudio
    • Ubuntu/Debian: sudo apt-get install portaudio19-dev
    • Windows: download from http://www.portaudio.com/download.html
  3. Environment variables (optional convenience):
    export SARVAM_APP_ID="your_app_id"
    export SARVAM_API_KEY="your_api_key"
    

Complete Example

See sarvam_conv_ai_sdk/examples/async_audio_example.py for a full, runnable script with mic capture, callbacks, and clean shutdown.

Custom Tools

Example Usage

import httpx
from pydantic import Field

from sarvam_conv_ai_sdk import (
    SarvamInteractionTurnRole,
    SarvamOnEndTool,
    SarvamOnEndToolContext,
    SarvamOnStartTool,
    SarvamOnStartToolContext,
    SarvamTool,
    SarvamToolContext,
    SarvamToolLanguageName,
    SarvamToolOutput,
)

class OnStart(SarvamOnStartTool):  # The class must be named OnStart
    async def run(self, context: SarvamOnStartToolContext):
        user_id = context.get_user_identifier()
        async with httpx.AsyncClient() as client:
            response = await client.get(f"https://sarvam-flights.com/users/{user_id}")
            response.raise_for_status()
            user_data = response.json()

        source_destination = user_data.get("home_city")
        context.set_agent_variable("source_destination", source_destination)
        context.set_agent_variable("passenger_name", user_data.get("name"))
        
        # Store telephony call SID if available (for telephony channels)
        if context.provider_ref_id:
            context.set_agent_variable("call_sid", context.provider_ref_id)
        
        context.set_initial_language_name(SarvamToolLanguageName.ENGLISH)
        context.set_initial_bot_message(
            f"Hello! Would you like to book a flight from {source_destination}? Where would you like to go?",
        )
        return context


class BookFlight(SarvamTool):
    """Book a flight based on the user's travel preferences."""

    destination: str = Field(description="City of destination")
    travel_date: str = Field(description="Date of travel (YYYY-MM-DD)")

    async def run(self, context: SarvamToolContext) -> SarvamToolOutput:
        source_destination = context.get_agent_variable("source_destination")
        booking_data = {
            "source": source_destination,
            "destination": self.destination,
            "travel_date": self.travel_date,
            "passenger_name": context.get_agent_variable("passenger_name"),
        }

        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://sarvam-flights.com/book", json=booking_data
            )
            response.raise_for_status()
            booking_result = response.json()

        if booking_result.get("status") == "confirmed":
            context.set_agent_variable("booking_id", booking_result.get("booking_id"))
            context.set_end_conversation()
            return SarvamToolOutput(
                message_to_user=f"Flight booked successfully to {self.destination}!",
                context=context,
            )
        else:
            context.change_state("recommend_destinations")
            return SarvamToolOutput(
                message_to_llm="Booking failed. Please suggest similar destinations.",
                context=context,
            )


class OnEnd(SarvamOnEndTool):  # The class must be named OnEnd
    async def run(self, context: SarvamOnEndToolContext):
        negative_words = ["bad", "poor", "disappointed", "unhappy", "problem"]

        # Check the stored feedback (if any) and every user turn in the transcript
        texts = [context.get_agent_variable("feedback") or ""]
        interaction_transcript = context.get_interaction_transcript()
        for turn in interaction_transcript.interaction_transcript or []:
            if turn.role == SarvamInteractionTurnRole.USER:
                texts.append(turn.en_text or "")

        # Defined on every path, so the analytics call below can always use it
        is_negative = any(word in text.lower() for text in texts for word in negative_words)
        context.set_agent_variable("feedback_sentiment", is_negative)

        # Log call details if telephony SID is available
        if context.provider_ref_id:
            async with httpx.AsyncClient() as client:
                await client.post(
                    "https://sarvam-flights.com/analytics/call-logs",
                    json={
                        "call_sid": context.provider_ref_id,
                        "user_id": context.get_user_identifier(),
                        "sentiment": is_negative,
                        "duration": (
                            interaction_transcript.interaction_end_time
                            - interaction_transcript.interaction_start_time
                        ).total_seconds(),
                    },
                )

        return context


Base Classes

The SDK exposes three base classes for tool development:

1. SarvamTool

Primary base class for all operational tools invoked during conversation flow. Example:
from pydantic import Field

from sarvam_conv_ai_sdk import SarvamTool, SarvamToolContext, SarvamToolOutput


class MyCustomTool(SarvamTool):
    """Brief description of the tool's purpose."""

    # Replace str with the parameter's actual type
    tool_variable: str = Field(description="Description of this input parameter")

    async def run(self, context: SarvamToolContext) -> SarvamToolOutput:
        # Custom tool logic goes here
        return SarvamToolOutput(
            message_to_user="Response to user",
            message_to_llm="Context for LLM",
            context=context,
        )

2. SarvamOnStartTool

Executed at the beginning of a conversation, typically for initialization. The class must be named OnStart.

3. SarvamOnEndTool

Executed at the end of a conversation, typically for cleanup or post-processing. The class must be named OnEnd.

Context Classes and Methods

SarvamToolContext

The context object passed to SarvamTool.run() methods.

Variable Management

  • get_agent_variable(variable_name: str) -> Any Retrieve the value of a variable.
  • set_agent_variable(variable_name: str, value: Any) -> None Update a variable’s value.

Language Control

  • get_current_language() -> SarvamToolLanguageName Returns the current language of the agent.
  • change_language(language: SarvamToolLanguageName) -> None Update the language preference.

Conversation Flow

  • set_end_conversation() -> None Explicitly end the conversation.

State Management

  • get_current_state() -> str Returns the current state of the conversation.
  • change_state(state: str) -> None Transition to a new state. Note: The new state must be one of the next valid states defined in the agent configuration.

Engagement Metadata

  • get_engagement_metadata() -> EngagementMetadata Retrieve the engagement metadata containing information about the current interaction.

SarvamOnStartToolContext

The context object passed to SarvamOnStartTool.run() methods.

Variable Management

  • get_agent_variable(variable_name: str) -> Any Retrieve the value of a variable.
  • set_agent_variable(variable_name: str, value: Any) -> None Update a variable’s value.

User Information

  • get_user_identifier() -> str Get the user identifier.

Telephony Information

  • provider_ref_id: Optional[str] The reference ID from the channel provider. For telephony providers, this is the Call SID, which uniquely identifies a specific phone call; other channel providers supply their own reference IDs. It is None for channels that don't provide a reference ID.

Initialization Methods

  • set_initial_bot_message(message: str) -> None Set the first message sent by the agent when the conversation starts.
  • set_initial_state_name(state_name: str) -> None Set the initial state from which the agent should start.
  • set_initial_language_name(language: SarvamToolLanguageName) -> None Define the initial language preference for the user.

Engagement Metadata

  • get_engagement_metadata() -> EngagementMetadata Retrieve the engagement metadata containing information about the current interaction.

SarvamOnEndToolContext

The context object passed to SarvamOnEndTool.run() methods.

Variable Management

  • get_agent_variable(variable_name: str) -> Any Retrieve the value of a variable.
  • set_agent_variable(variable_name: str, value: Any) -> None Update a variable’s value.

User Information

  • get_user_identifier() -> str Get the user identifier.

Telephony Information

  • provider_ref_id: Optional[str] The reference ID from the channel provider. For telephony providers, this is the Call SID, which uniquely identifies a specific phone call; other channel providers supply their own reference IDs. It is None for channels that don't provide a reference ID.

Engagement Metadata

  • get_engagement_metadata() -> EngagementMetadata Retrieve the engagement metadata containing information about the current interaction.

Interaction Reattempt

  • set_retry_interaction Marks the interaction for a reattempt, so the same agent contacts the user again. Useful when a business goal has not been met.

Interaction Transcript

  • get_interaction_transcript() -> SarvamInteractionTranscript Retrieve the conversation history: the user and agent messages (in English) in interaction_transcript, plus the timestamps at which the conversation began and ended in interaction_start_time and interaction_end_time (format: yyyy-mm-dd hh:mm:ss).
Example value of interaction_transcript:
[
    SarvamInteractionTurn(role=<SarvamInteractionTurnRole.AGENT: 'agent'>, en_text='Hello! How can I help you today?'),
    SarvamInteractionTurn(role=<SarvamInteractionTurnRole.USER: 'user'>, en_text='I need to book a flight'),
    SarvamInteractionTurn(role=<SarvamInteractionTurnRole.AGENT: 'agent'>, en_text='I can help you with that. Where would you like to go?'),
    SarvamInteractionTurn(role=<SarvamInteractionTurnRole.USER: 'user'>, en_text='I want to go to Mumbai'),
    SarvamInteractionTurn(role=<SarvamInteractionTurnRole.AGENT: 'agent'>, en_text='Great! When would you like to travel?')
]

Return Types

SarvamToolOutput

The return type for SarvamTool.run() methods. Contains:
  • message_to_user: Optional[str] - Message that is sent directly to the user
  • message_to_llm: Optional[str] - Message that is sent to the LLM, which then responds
  • context: SarvamToolContext - The updated context object
Note: At least one of message_to_llm or message_to_user must be set.
Important: When both are set, only message_to_user is sent to the user, while message_to_llm replaces message_to_user in the chat thread that forms the LLM's context.
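The routing rule above can be modelled as a small pure function, which helps when reasoning about what the user hears versus what the LLM remembers. This is an illustrative model, not the SDK's implementation: ToolOutputRouting and route_tool_output are hypothetical, and two parts are assumptions inferred from the field descriptions: that message_to_user is recorded in the thread when message_to_llm is unset, and that the LLM writes the user-facing reply only when message_to_user is unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolOutputRouting:
    sent_to_user: Optional[str]  # text delivered to the user as-is
    llm_thread_entry: str        # text recorded in the LLM's chat thread
    llm_generates_reply: bool    # whether the LLM must produce the user-facing reply


def route_tool_output(
    message_to_user: Optional[str], message_to_llm: Optional[str]
) -> ToolOutputRouting:
    """Illustrative model of the documented SarvamToolOutput rules (not SDK code)."""
    if message_to_user is None and message_to_llm is None:
        raise ValueError("set at least one of message_to_user or message_to_llm")
    return ToolOutputRouting(
        sent_to_user=message_to_user,
        # Documented: message_to_llm overrides message_to_user in the LLM's context.
        # Assumption: otherwise message_to_user itself is recorded.
        llm_thread_entry=message_to_llm if message_to_llm is not None else message_to_user,
        # Assumption: the LLM responds only when there is no direct user message
        llm_generates_reply=message_to_user is None,
    )
```

In the BookFlight example, the success branch corresponds to route_tool_output("Flight booked successfully to Delhi!", None) and the failure branch to route_tool_output(None, "Booking failed. Please suggest similar destinations.").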

EngagementMetadata

The engagement metadata object that can be retrieved from context objects using get_engagement_metadata(). Contains:
  • interaction_id: str - Unique identifier for each conversation between user & agent.
  • attempt_id: Optional[str] - Unique identifier for each attempt created on the platform
  • campaign_id: Optional[str] - Campaign ID for the interaction
  • interaction_language: SarvamToolLanguageName - The language used for the interaction (defaults to English)
  • app_id: str - Application identifier of the agent for the interaction
  • app_version: int - Version number of the agent
  • agent_phone_number: Optional[str] - Phone number associated with the conversational agent application

Supported Languages

The SDK supports multilingual conversations using the SarvamToolLanguageName enum. Available languages include:
  • Bengali
  • Gujarati
  • Kannada
  • Malayalam
  • Tamil
  • Telugu
  • Punjabi
  • Odia
  • Marathi
  • Hindi
  • English
Note: An app can use only the subset of these languages selected in its agent configuration.

Best Practices

  1. Always implement run(): The run() method is the entry point for tool execution logic.
  2. Use Field() for parameters: This ensures type safety and adds the descriptive metadata the LLM needs in its prompt.
  3. Gracefully handle errors: Avoid accessing unset variables or using invalid types.
  4. Return the appropriate type: SarvamTool.run() must return SarvamToolOutput, while SarvamOnStartTool.run() and SarvamOnEndTool.run() return their respective context objects.
  5. Write meaningful docstrings: Clearly describe what each tool is intended to do as this directly impacts the performance of tool calling capabilities of the agent.
  6. Use async operations for I/O: For the best performance, use async/await for external API calls to avoid blocking.
  7. Use context methods: Use the provided context methods for variable management, language control, and messaging instead of directly accessing context attributes.
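Practices 3 and 6 combine naturally: make external calls asynchronously, and turn failures into a message for the LLM instead of letting the tool raise. The helper below is a hypothetical, standard-library-only sketch; in the BookFlight example, fetch would be a coroutine function that performs the httpx request and calls raise_for_status().

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Tuple


async def call_with_fallback(
    fetch: Callable[[], Awaitable[Any]],
    timeout_s: float = 5.0,
) -> Tuple[Optional[Any], Optional[str]]:
    """Await fetch() with a timeout; return (result, None) or (None, error_text)."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_s), None
    except asyncio.TimeoutError:
        return None, "The external service timed out."
    except Exception as exc:  # e.g. an HTTP error raised by raise_for_status()
        return None, f"The external service failed: {exc}"
```

Inside a tool's run(), check the returned error and, if it is set, return SarvamToolOutput(message_to_llm=error, context=context) so the agent can apologise or offer to retry instead of failing.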

Testing Your Tools

After creating a tool, you can test it locally to ensure it works as expected. Here’s how to test your tools:

Testing Steps

  1. Create the ToolContext: Initialize the appropriate context object with test data
  2. Instantiate the tool class: Use tool.model_validate(tool_args) to create a tool instance
  3. Run the tool: Call the tool’s run() method with the context
  4. Observe the returned object: Check if the necessary changes have been made to the context

Example Test: SarvamTool

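import asyncio

# Assumption: these classes are exported from the package root, like the other Sarvam* classes
from sarvam_conv_ai_sdk import EngagementMetadata, SarvamToolContext, SarvamToolLanguageName

# Test the BookFlight tool (assumes BookFlight from the example above is in scope)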
async def test_book_flight():
    # 1. Create the ToolContext
    context = SarvamToolContext(
        language=SarvamToolLanguageName.ENGLISH,
        allowed_languages=[SarvamToolLanguageName.ENGLISH],
        state="booking",
        next_valid_states=["recommend_destinations", "end"],
        agent_variables={
            "source_destination": "Mumbai",
            "passenger_name": "John Doe",
            "booking_id": "123"
        },
        engagement_metadata=EngagementMetadata(
            interaction_id="123",
            attempt_id="456",
            campaign_id="789",
            interaction_language=SarvamToolLanguageName.ENGLISH,
            app_id="101",
            app_version=1,
            agent_phone_number="+1234567890",
        ),
    )
    
    # 2. Instantiate the tool class
    tool_args = {
        "destination": "Delhi",
        "travel_date": "2024-03-15"
    }
    tool_instance = BookFlight.model_validate(tool_args)
    
    # 3. Run the tool
    result = await tool_instance.run(context)
    
    # 4. Observe the returned object
    print(f"Message to user: {result.message_to_user}")
    print(f"Message to LLM: {result.message_to_llm}")
    print(f"End conversation: {result.context.end_conversation}")
    print(f"Current state: {result.context.get_current_state()}")
    print(f"Agent variables: {result.context.agent_variables}")
    print(f"Current Language: {result.context.get_current_language()}")

# Run the test
asyncio.run(test_book_flight())

Example Test: OnStart Tool

For SarvamOnStartTool, the testing approach is similar but it returns the context object directly:
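import asyncio

# Assumption: EngagementMetadata is exported from the package root, like the other Sarvam* classes
from sarvam_conv_ai_sdk import EngagementMetadata, SarvamOnStartToolContext, SarvamToolLanguageName

# Testing OnStart tool (assumes OnStart from the example above is in scope)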
async def test_on_start():
    context = SarvamOnStartToolContext(
        user_identifier="user123",
        agent_variables={"source_destination": "Mumbai", "passenger_name": "John Doe"},
        engagement_metadata=EngagementMetadata(
            interaction_id="123",
            attempt_id="456",
            campaign_id="789",
            interaction_language=SarvamToolLanguageName.ENGLISH,
            app_id="101",
            app_version=1,
            agent_phone_number="+1234567890",
        ),
        initial_bot_message=None,
        initial_state_name="start",
        initial_language_name=SarvamToolLanguageName.ENGLISH,
        provider_ref_id="CA1234567890abcdef1234567890abcdef",  # Optional: for telephony channels
    )
    
    tool_instance = OnStart()
    result = await tool_instance.run(context)
    
    print(f"Initial bot message: {result.initial_bot_message}")
    print(f"Initial state: {result.initial_state_name}")
    print(f"Initial Language Name: {result.initial_language_name}")
    print(f"Agent variables: {result.agent_variables}")
    print(f"Telephony Call SID: {result.provider_ref_id}")

# Run the test
asyncio.run(test_on_start())

Example Test: OnEnd Tool

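import asyncio
from datetime import datetime, timedelta

# Assumption: these classes are exported from the package root, like the other Sarvam* classes
from sarvam_conv_ai_sdk import (
    EngagementMetadata,
    SarvamInteractionTranscript,
    SarvamInteractionTurn,
    SarvamInteractionTurnRole,
    SarvamOnEndToolContext,
    SarvamToolLanguageName,
)

# Testing OnEnd tool (assumes OnEnd from the example above is in scope)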
async def test_on_end():
    context = SarvamOnEndToolContext(
        user_identifier="user123",
        agent_variables={"feedback": "I had a bad experience", "feedback_sentiment": False},
        engagement_metadata=EngagementMetadata(
            interaction_id="123",
            attempt_id="456",
            campaign_id="789",
            interaction_language=SarvamToolLanguageName.ENGLISH,
            app_id="101",
            app_version=1,
            agent_phone_number="+1234567890",
        ),
        interaction_transcript=SarvamInteractionTranscript(
            interaction_transcript=[
                SarvamInteractionTurn(role=SarvamInteractionTurnRole.AGENT, en_text='Hello! How can I help you today?'),
                SarvamInteractionTurn(role=SarvamInteractionTurnRole.USER, en_text='I need to book a flight'),
                SarvamInteractionTurn(role=SarvamInteractionTurnRole.AGENT, en_text='I can help you with that. Where would you like to go?'),
                SarvamInteractionTurn(role=SarvamInteractionTurnRole.USER, en_text='I want to go to Mumbai'),
                SarvamInteractionTurn(role=SarvamInteractionTurnRole.AGENT, en_text='Great! When would you like to travel?')
            ],
            interaction_start_time=datetime.now() - timedelta(minutes=2),
            interaction_end_time=datetime.now(),
        ),
        retry_interaction=False,
        provider_ref_id="CA1234567890abcdef1234567890abcdef",  # Optional: for telephony channels
    )
    
    tool_instance = OnEnd()
    result = await tool_instance.run(context)
    
    print(f"Agent variables: {result.agent_variables}")
    print(f"Interaction Retry: {result.retry_interaction}")
    print(f"Telephony Call SID: {result.provider_ref_id}")

# Run the test
asyncio.run(test_on_end())

Best Practices for Async Audio

  1. Use proper event loop setup for PyAudio compatibility:
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    
  2. Handle connection states gracefully, for example by polling is_connected() (or simply await agent.wait_for_disconnect()):
    while agent.is_connected():
        await asyncio.sleep(1)
    
  3. Implement proper cleanup in finally blocks:
    finally:
        await agent.stop()
    
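  4. Use a sample rate that matches config.sample_rate (8000 or 16000 Hz; 16000 Hz is typical for microphone input)
  5. Handle Ctrl+C by catching KeyboardInterrupt around the asyncio.run(...) call:
    try:
        asyncio.run(main(app_id="your_app_id", api_key="your_api_key"))
    except KeyboardInterrupt:
        print("Stopping conversation...")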
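Practices 2 and 3 can be folded into one small lifecycle helper built only on the documented start(), wait_for_connect(), is_connected(), wait_for_disconnect() and stop() methods. run_session is hypothetical. Because this section does not say whether wait_for_connect() raises or returns on timeout, the sketch tolerates a timeout exception and then decides with is_connected() either way.

```python
import asyncio
from typing import Any


async def run_session(agent: Any, connect_timeout: float = 5.0) -> bool:
    """Start the agent, wait for a connection, block until it disconnects, always stop it.

    Returns True if a connection was established, False otherwise.
    """
    await agent.start()
    try:
        try:
            await agent.wait_for_connect(timeout=connect_timeout)
        except (asyncio.TimeoutError, TimeoutError):
            pass  # assumption: a timeout may surface as an exception
        if not agent.is_connected():
            return False
        await agent.wait_for_disconnect()
        return True
    finally:
        await agent.stop()  # cleanup runs on success, failure and cancellation
```

Create the agent inside your main() coroutine and finish with return await run_session(agent).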
    
