Overview
The Sarvam Conv AI SDK enables developers to create tools that can:
- Facilitate agentic capabilities, such as calling APIs in the middle of a conversation
- Manage agent-specific variables
- Control and modify the language used during conversations
- Send dynamic messages to both the user and the underlying language model (LLM)
Installation
Basic Installation
Install the SDK via pip.
Audio Support (Optional)
If you want to use audio streaming features (microphone input and speaker output), you need to install PyAudio, which requires system-level dependencies.
Option 1: Install with audio support
- macOS: `brew install portaudio`
- Ubuntu/Debian: `sudo apt-get install portaudio19-dev`
- Windows: Download from http://www.portaudio.com/download.html
Option 2: Use without PyAudio
The SDK works without PyAudio in environments that do not need local audio; the built-in audio capture and playback features will not be available. You can still:
- Use the WebSocket client for real-time voice conversations (provide your own audio I/O)
- Build backend proxies for frontend applications
AsyncSamvaadAgent
Build real-time voice interactions from a small set of inputs:
- You provide InteractionConfig: who the user is, which app to talk to, interaction type, and audio sample rate; optionally include overrides like agent_variables and initial language/state.
- You create AsyncSamvaadAgent with your API key, config, and optional audio interface plus callbacks for text/audio/events.
- Start the agent: it fetches a signed WebSocket URL, sends interaction_start, and streams audio/text both ways.
Key features
- Real-time voice interaction: natural speaking and listening
- Automatic audio management: built-in microphone input and speaker output
- Async/await support: non-blocking operations
- Callback handling: process text, audio, and events asynchronously
- Connection management: robust WebSocket handling
AsyncSamvaadAgent parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | SecretStr | Yes | API key used to fetch a signed WebSocket URL |
| config | InteractionConfig | Yes | Interaction start configuration (user id, app id, sample rate, overrides) |
| audio_interface | AsyncAudioInterface or None | No | Automatic mic capture and speaker playback. Omit for headless usage (use send_audio) |
| text_callback | Callable[[ServerTextChunkMsg], Awaitable[None]] or None | No | Receives streaming text chunks from the agent |
| audio_callback | Callable[[ServerAudioChunkMsg], Awaitable[None]] or None | No | Receives audio chunks if not using audio_interface for playback |
| event_callback | Callable[[ServerEventBase], Awaitable[None]] or None | No | Receives events like interaction_connected, user_interrupt, interaction_end |
| base_url | str | No | Override base URL. Default: https://apps.sarvam.ai/api/app-runtime/ |
AsyncSamvaadAgent methods
- `await agent.start()`: start and connect
- `await agent.stop()`: stop and clean up
- `await agent.wait_for_connect(timeout: float | None = 5.0)`: wait until connected
- `await agent.wait_for_disconnect()`: wait until disconnected or stopped
- `agent.is_connected()`: connection status
- `await agent.send_audio(audio_bytes: bytes)`: send raw 16-bit PCM audio
- `agent.get_interaction_id()`: current interaction id or `None`
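The methods above suggest a simple session lifecycle: start, wait for the connection, wait for the session to end, and always stop. The sketch below illustrates that ordering only. `FakeAgent` is a hypothetical stand-in that mimics the documented method names so the example runs without the SDK, and `run_session` is an illustrative helper, not part of the SDK.

```python
import asyncio


class FakeAgent:
    """Hypothetical stand-in exposing the documented method names; NOT the SDK class."""

    def __init__(self):
        self._connected = False
        self.calls = []

    async def start(self):
        self.calls.append("start")
        self._connected = True

    async def wait_for_connect(self, timeout=5.0):
        self.calls.append("wait_for_connect")

    def is_connected(self):
        return self._connected

    async def wait_for_disconnect(self):
        self.calls.append("wait_for_disconnect")
        self._connected = False

    async def stop(self):
        self.calls.append("stop")
        self._connected = False


async def run_session(agent, connect_timeout=5.0):
    """Start the agent, wait for the session to finish, and always stop it.

    Returns True if the agent reported a live connection after start-up.
    """
    connected = False
    try:
        await agent.start()
        await agent.wait_for_connect(timeout=connect_timeout)
        connected = agent.is_connected()
        if connected:
            await agent.wait_for_disconnect()
    finally:
        # Runs on success, error, or cancellation so resources are released.
        await agent.stop()
    return connected


agent = FakeAgent()
result = asyncio.run(run_session(agent))
print(result, agent.calls)
```

With a real agent, the same helper would presumably be called with an AsyncSamvaadAgent instance in place of `FakeAgent`; how `wait_for_connect` signals a timeout is not stated here, so the sketch checks `is_connected()` rather than relying on a return value.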
AsyncDefaultAudioInterface(input_sample_rate: int = 16000)
- Methods: `start(input_callback)`, `output(audio: bytes, sample_rate?: int)`, `interrupt()`, `stop()`
- Audio: LINEAR16 (16-bit PCM mono). Supported sample rates: 8000, 16000
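Because the audio format is fixed (16-bit mono PCM at 8000 or 16000 Hz), chunk sizes follow directly from the sample rate. The sketch below computes the byte size of a chunk and splits a buffer accordingly. The 20 ms chunk duration and both helper names are assumptions chosen for illustration, not SDK requirements.

```python
def pcm_chunk_size(sample_rate, chunk_ms=20):
    """Bytes in one chunk of 16-bit mono PCM lasting chunk_ms milliseconds."""
    if sample_rate not in (8000, 16000):
        raise ValueError("sample_rate must be 8000 or 16000")
    bytes_per_sample = 2  # LINEAR16: 16 bits per sample, one channel
    return sample_rate * bytes_per_sample * chunk_ms // 1000


def split_pcm(audio, sample_rate, chunk_ms=20):
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = pcm_chunk_size(sample_rate, chunk_ms)
    for offset in range(0, len(audio), size):
        yield audio[offset:offset + size]


one_second_of_silence = bytes(16000 * 2)  # 16000 samples, 2 bytes each
chunks = list(split_pcm(one_second_of_silence, sample_rate=16000))
print(len(chunks), len(chunks[0]))  # 50 640
```

In headless usage, each chunk could then be passed to `await agent.send_audio(chunk)`.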
What you must provide: InteractionConfig
Required fields:
- user_identifier_type: One of CUSTOM, EMAIL, PHONE_NUMBER, UNKNOWN
- user_identifier: The identifier value (string; phone/email/custom id). This id can be used to find the interaction's logs in the log analyser
- org_id: Your organization, e.g., “sarvamai”
- workspace_id: Your workspace, e.g., “default”
- app_id: The target application id
- interaction_type: InteractionType.CALL (voice)
- sample_rate: 8000 or 16000 (16-bit PCM mono)
- version: int (Optional)
Important: If `version` is not provided, the SDK uses the latest committed version of the app. The connection will fail if the provided `app_id` has no committed version.
Optional overrides (applied server-side at start):
- agent_variables: dict of key/value to seed the agent context
- initial_language_name: e.g., “English”, “Hindi” (must be allowed by app)
- initial_state_name: starting state name, if your app uses states
- initial_bot_message: first message from the agent
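Since a bad sample rate or a missing field only surfaces when the connection is attempted, it can help to check values before building the config. The following is a minimal sketch of such a pre-flight check over a plain dict. `check_interaction_settings` is a hypothetical helper, not part of the SDK, and it validates only the constraints listed above.

```python
REQUIRED_FIELDS = (
    "user_identifier_type", "user_identifier", "org_id", "workspace_id",
    "app_id", "interaction_type", "sample_rate",
)
IDENTIFIER_TYPES = {"CUSTOM", "EMAIL", "PHONE_NUMBER", "UNKNOWN"}
SAMPLE_RATES = {8000, 16000}


def check_interaction_settings(settings):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in settings]
    id_type = settings.get("user_identifier_type")
    if id_type is not None and id_type not in IDENTIFIER_TYPES:
        problems.append(f"unknown user_identifier_type: {id_type}")
    rate = settings.get("sample_rate")
    if rate is not None and rate not in SAMPLE_RATES:
        problems.append(f"unsupported sample_rate: {rate}")
    return problems


good = {
    "user_identifier_type": "CUSTOM",
    "user_identifier": "user-123",   # illustrative value
    "org_id": "sarvamai",
    "workspace_id": "default",
    "app_id": "my-app",              # illustrative value
    "interaction_type": "CALL",
    "sample_rate": 16000,
}
bad = {k: v for k, v in good.items() if k != "app_id"}
bad["sample_rate"] = 44100

print(check_interaction_settings(good))
print(check_interaction_settings(bad))
```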
Quick start: local voice test
- Install dependencies
- Set credentials (or pass directly in code)
- Run the example
Headless mode (no PyAudio)
Use your own audio I/O. Create the agent without audio_interface and push raw 16-bit PCM mono chunks that match config.sample_rate.
Connect your frontend (backend proxy pattern)
See the section above for AsyncSamvaadAgent usage. For a full backend bridge, follow the same pattern in your server. Message shapes:
- Frontend → backend (init):
- Frontend → backend (text):
- Frontend → backend (audio):
- Build InteractionConfig from init context; create AsyncSamvaadAgent with callbacks.
- Decode base64 and forward audio via await agent.send_audio(audio_bytes).
- In text/audio/event callbacks, websocket.send_json back to the frontend.
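The audio-forwarding step can be sketched as follows. The message shape `{"type": "audio", "data": "<base64>"}` is an assumption made for this example and may differ from the shapes your frontend uses. `RecordingAgent` is a hypothetical stand-in exposing only the documented `send_audio` method, so the example runs without the SDK.

```python
import asyncio
import base64
import json


class RecordingAgent:
    """Hypothetical stand-in with the documented send_audio method; NOT the SDK class."""

    def __init__(self):
        self.received = []

    async def send_audio(self, audio_bytes):
        self.received.append(audio_bytes)


async def handle_frontend_message(raw, agent):
    """Route one JSON message from the frontend; returns the handled type."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "audio":
        # Assumed shape: {"type": "audio", "data": "<base64-encoded PCM>"}
        audio_bytes = base64.b64decode(message["data"])
        await agent.send_audio(audio_bytes)
    return kind or "unknown"


pcm = bytes(640)  # 20 ms of silence at 16000 Hz, 16-bit mono
frame = json.dumps({"type": "audio", "data": base64.b64encode(pcm).decode("ascii")})
agent = RecordingAgent()
handled = asyncio.run(handle_frontend_message(frame, agent))
print(handled, len(agent.received[0]))  # audio 640
```

In a real bridge, the same handler would run inside the server's WebSocket receive loop, with the agent's callbacks sending results back to the frontend.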
Requirements for Async Audio
- PyAudio: required for microphone capture and speaker playback
- System dependencies:
  - macOS: `brew install portaudio`
  - Ubuntu/Debian: `sudo apt-get install portaudio19-dev`
  - Windows: download from http://www.portaudio.com/download.html
- Environment variables (optional convenience)
Complete Example
See `sarvam_conv_ai_sdk/examples/async_audio_example.py` for a full, runnable script with mic capture, callbacks, and clean shutdown.
Custom Tools
Example Usage
Base Classes
The SDK exposes three base classes for tool development:
1. SarvamTool
Primary base class for all operational tools invoked during conversation flow.
Example:
2. SarvamOnStartTool
Executed at the beginning of a conversation, typically for initialization. The class must be named OnStart.
3. SarvamOnEndTool
Executed at the end of a conversation, typically for cleanup or post-processing. The class must be named OnEnd.
Context Classes and Methods
SarvamToolContext
The context object passed to SarvamTool.run() methods.
Variable Management
- `get_agent_variable(variable_name: str) -> Any`: Retrieve the value of a variable.
- `set_agent_variable(variable_name: str, value: Any) -> None`: Update a variable’s value.
Language Control
- `get_current_language() -> SarvamToolLanguageName`: Returns the current language of the agent.
- `change_language(language: SarvamToolLanguageName) -> None`: Update the language preference.
Conversation Flow
- `set_end_conversation() -> None`: Explicitly end the conversation.
State Management
- `get_current_state() -> str`: Returns the current state of the conversation.
- `change_state(state: str) -> None`: Transition to a new state. Note: The new state must be one of the next valid states defined in the agent configuration.
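Because `change_state` only accepts states that are valid successors of the current one, a tool can guard the call itself. The sketch below is one possible way to do that. `FakeToolContext` is a hypothetical stand-in with the two documented state methods, and `NEXT_STATES` is an illustrative copy of transitions that would really live in the agent configuration.

```python
class FakeToolContext:
    """Hypothetical stand-in with the documented state methods; NOT the SDK class."""

    def __init__(self, state):
        self._state = state

    def get_current_state(self):
        return self._state

    def change_state(self, state):
        self._state = state


# Illustrative transitions; the real ones are defined in the agent configuration.
NEXT_STATES = {
    "greeting": {"collect_details", "goodbye"},
    "collect_details": {"confirm", "goodbye"},
    "confirm": {"goodbye"},
}


def try_change_state(context, target):
    """Change state only if target is a valid successor; report whether it happened."""
    current = context.get_current_state()
    if target not in NEXT_STATES.get(current, set()):
        return False
    context.change_state(target)
    return True


ctx = FakeToolContext("greeting")
skipped = try_change_state(ctx, "confirm")         # not reachable from "greeting"
moved = try_change_state(ctx, "collect_details")   # valid successor
print(skipped, moved, ctx.get_current_state())
```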
Engagement Metadata
- `get_engagement_metadata() -> EngagementMetadata`: Retrieve the engagement metadata containing information about the current interaction.
SarvamOnStartToolContext
The context object passed to SarvamOnStartTool.run() methods.
Variable Management
- `get_agent_variable(variable_name: str) -> Any`: Retrieve the value of a variable.
- `set_agent_variable(variable_name: str, value: Any) -> None`: Update a variable’s value.
User Information
- `get_user_identifier() -> str`: Get the user identifier.
Telephony Information
- `provider_ref_id: Optional[str]` - The reference ID from the channel provider. For telephony providers, this contains the Call SID (Session ID), which uniquely identifies a specific phone call. For other channel providers, it contains their respective reference IDs. Defaults to `None` for channels that don’t provide reference IDs.
Initialization Methods
- `set_initial_bot_message(message: str) -> None`: Set the first message sent by the agent when the conversation starts.
- `set_initial_state_name(state_name: str) -> None`: Set the initial state from which the agent should start.
- `set_initial_language_name(language: SarvamToolLanguageName) -> None`: Define the initial language preference for the user.
Engagement Metadata
- `get_engagement_metadata() -> EngagementMetadata`: Retrieve the engagement metadata containing information about the current interaction.
SarvamOnEndToolContext
The context object passed to SarvamOnEndTool.run() methods.
Variable Management
- `get_agent_variable(variable_name: str) -> Any`: Retrieve the value of a variable.
- `set_agent_variable(variable_name: str, value: Any) -> None`: Update a variable’s value.
User Information
- `get_user_identifier() -> str`: Get the user identifier.
Telephony Information
- `provider_ref_id: Optional[str]` - The reference ID from the channel provider. For telephony providers, this contains the Call SID (Session ID), which uniquely identifies a specific phone call. For other channel providers, it contains their respective reference IDs. Defaults to `None` for channels that don’t provide reference IDs.
Engagement Metadata
- `get_engagement_metadata() -> EngagementMetadata`: Retrieve the engagement metadata containing information about the current interaction.
Interaction Reattempt
- `set_retry_interaction`: Request another attempt to reach the user with the same agent. Useful when a business goal has not been met.
Interaction Transcript
- `get_interaction_transcript() -> SarvamInteractionTranscript`: Retrieve the conversation history, which contains the user and agent messages in English and the timestamps at which the conversation began and ended. Timestamp format: yyyy-mm-dd hh:mm:ss
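The timestamp format above maps to the `strptime` pattern `%Y-%m-%d %H:%M:%S`. As a small sketch, the helper below parses two such strings and returns the conversation length in seconds. It works on plain strings, because the attribute names on the transcript object are not listed here; `conversation_seconds` is an illustrative name, not an SDK function.

```python
from datetime import datetime

TRANSCRIPT_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss


def conversation_seconds(started_at, ended_at):
    """Duration in seconds between two transcript timestamp strings."""
    start = datetime.strptime(started_at, TRANSCRIPT_TIME_FORMAT)
    end = datetime.strptime(ended_at, TRANSCRIPT_TIME_FORMAT)
    return (end - start).total_seconds()


print(conversation_seconds("2024-05-01 10:15:00", "2024-05-01 10:17:30"))  # 150.0
```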
Return Types
SarvamToolOutput
The return type for SarvamTool.run() methods. Contains:
- `message_to_user: Optional[str]` - Message that is sent directly to the user
- `message_to_llm: Optional[str]` - Message that is sent to the LLM, which then responds
- `context: SarvamToolContext` - The updated context object
At least one of message_to_llm or message_to_user must be set.
Important: When both message_to_user and message_to_llm are set, only message_to_user is sent to the user, while message_to_llm replaces it in the chat thread that forms the LLM’s context.
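The routing rule can be illustrated with a small model of where each message ends up. This is a sketch of the rule as described here, not the SDK's implementation; in particular, the behavior when only message_to_user is set (it is also recorded in the LLM's thread) is an inference from the wording above.

```python
from typing import Optional, Tuple


def route_tool_messages(
    message_to_user: Optional[str], message_to_llm: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text sent to the user, text added to the LLM chat thread)."""
    if message_to_user is None and message_to_llm is None:
        raise ValueError("set message_to_user, message_to_llm, or both")
    sent_to_user = message_to_user
    # When both are set, message_to_llm takes the place of message_to_user in the thread.
    added_to_thread = message_to_llm if message_to_llm is not None else message_to_user
    return sent_to_user, added_to_thread


both = route_tool_messages("Your order has shipped.", "Order 42 status: shipped")
llm_only = route_tool_messages(None, "Order 42 status: shipped")
print(both)
print(llm_only)
```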
EngagementMetadata
The engagement metadata object that can be retrieved from context objects using get_engagement_metadata(). Contains:
- `interaction_id: str` - Unique identifier for each conversation between user and agent
- `attempt_id: Optional[str]` - Unique identifier for each attempt created on the platform
- `campaign_id: Optional[str]` - Campaign ID for the interaction
- `interaction_language: SarvamToolLanguageName` - The language used for the interaction (defaults to English)
- `app_id: str` - Application identifier of the agent for the interaction
- `app_version: int` - Version number of the agent
- `agent_phone_number: Optional[str]` - Phone number associated with the conversational agent application
Supported Languages
The SDK supports multilingual conversations using the `SarvamToolLanguageName` enum. Available languages include:
- Bengali
- Gujarati
- Kannada
- Malayalam
- Tamil
- Telugu
- Punjabi
- Odia
- Marathi
- Hindi
- English
Best Practices
- Always implement `run()`: The `run()` method is the entry point for tool execution logic.
- Use `Field()` for parameters: Ensures type safety and adds the descriptive metadata that the LLM needs in the prompt.
- Gracefully handle errors: Avoid accessing unset variables or using invalid types.
- Return the appropriate type: `SarvamTool.run()` must return `SarvamToolOutput`, while `SarvamOnStartTool.run()` and `SarvamOnEndTool.run()` return their respective context objects.
- Write meaningful docstrings: Clearly describe what each tool is intended to do, as this directly affects the agent's tool-calling performance.
- Use async operations for I/O: For the best performance, use `async`/`await` for external API calls to avoid blocking.
- Use context methods: Use the provided context methods for variable management, language control, and messaging instead of directly accessing context attributes.
Testing Your Tools
After creating a tool, you can test it locally to ensure it works as expected.
Testing Steps
- Create the ToolContext: Initialize the appropriate context object with test data
- Instantiate the tool class: Use `tool.model_validate(tool_args)` to create a tool instance
- Run the tool: Call the tool’s `run()` method with the context
- Observe the returned object: Check if the necessary changes have been made to the context
Example Test: SarvamTool
Example Test: OnStart Tool
For `SarvamOnStartTool`, the testing approach is similar, but it returns the context object directly:
Example Test: OnEnd Tool
Best Practices for Async Audio
- Use proper event loop setup for PyAudio compatibility
- Handle connection states gracefully
- Implement proper cleanup in finally blocks
- Use appropriate sample rates (typically 16000 Hz for input)
- Handle interruptions by catching KeyboardInterrupt