Voice Agents
The same defineAgent you use for text works for voice. Kuralle retains authority over tools, flows, and handoffs regardless of transport; audio handling lives in a separate layer.
Two paths
| Path | Package | How it works |
|---|---|---|
| Provider-native realtime | @kuralle-agents/realtime-audio | Raw audio goes directly to the provider model (Gemini Live, OpenAI Realtime, xAI Grok Realtime) and audio comes back in a single connection. Lower latency; no transcript round-trip. |
| Cascaded STT→LLM→TTS | @kuralle-agents/livekit-plugin | Speech-to-text transcribes audio, a standard Kuralle turn runs, then text-to-speech synthesizes the response. Works with any LLM. |
Both paths run the same Kuralle runtime — flows, tools, routing, and handoffs work identically.
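The cascaded row of the table can be pictured as three awaited steps. The sketch below is a minimal illustration under stated assumptions: `SpeechToText`, `TextToSpeech`, `TextTurn`, and `cascadedTurn` are hypothetical names, not Kuralle exports, and the fake components stand in for real STT/TTS providers by treating "audio" as ASCII bytes.

```typescript
// Hypothetical interfaces for illustration only; these are not Kuralle exports.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}
// A text turn: the same call a text chat channel would make.
type TextTurn = (userText: string) => Promise<string>;

// One cascaded turn: audio -> transcript -> text reply -> audio.
async function cascadedTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  runTurn: TextTurn,
  tts: TextToSpeech,
): Promise<{ transcript: string; reply: string; audio: Uint8Array }> {
  const transcript = await stt.transcribe(audio); // 1. speech-to-text
  const reply = await runTurn(transcript); // 2. ordinary text turn
  const speech = await tts.synthesize(reply); // 3. text-to-speech
  return { transcript, reply, audio: speech };
}

// Fakes so the sketch runs offline: "audio" here is just ASCII bytes.
const toBytes = (s: string): Uint8Array => Uint8Array.from(s, (c) => c.charCodeAt(0));
const fromBytes = (b: Uint8Array): string =>
  Array.from(b, (x) => String.fromCharCode(x)).join("");

const fakeStt: SpeechToText = { transcribe: async (audio) => fromBytes(audio) };
const fakeTts: TextToSpeech = { synthesize: async (text) => toBytes(text) };
const echoTurn: TextTurn = async (text) => `You said: ${text}`;
```

Because the middle step is a plain text turn, it is the same seam where flows, tools, and handoffs run in text mode. A provider-native session has no such seam: audio goes to the model and comes back over one connection, which removes the transcript round-trip.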
Provider-native realtime
@kuralle-agents/realtime-audio sends raw audio directly to the model and receives audio back in a single connection.
```sh
npm install @kuralle-agents/realtime-audio
```
VoiceEngine is the call acceptor. It creates per-call VoiceCallSession workers that bridge a transport to the chosen provider:
```ts
import { VoiceEngine, createGeminiClientFactory } from '@kuralle-agents/realtime-audio';
import { defineTool, buildToolSet } from '@kuralle-agents/core';
import { z } from 'zod';

const lookupOrder = defineTool({
  name: 'lookup_order',
  description: 'Look up an order by ID',
  input: z.object({ orderId: z.string() }),
  execute: async ({ orderId }) => ({ status: 'shipped', orderId }),
});

const engine = new VoiceEngine({
  agents: [
    {
      id: 'support',
      name: 'Voice Support Agent',
      instructions: 'You are a helpful voice support agent.',
      voice: 'Charon',
      tools: buildToolSet({ lookup_order: lookupOrder }),
    },
  ],
  defaultAgentId: 'support',
  modelClientFactory: createGeminiClientFactory({
    apiKey: process.env.GOOGLE_API_KEY!,
    model: 'gemini-2.5-flash-preview-native-audio',
  }),
});

// When a WebSocket or LiveKit connection arrives, accept it as a call:
//
// const session = await engine.acceptCall({
//   callId: crypto.randomUUID(),
//   transport: yourTransportSession, // implements TransportSession
// });
// await session.start();
```
When a connection arrives from your transport layer, call engine.acceptCall({ callId, transport }), then call start() on the returned session.
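The acceptor-plus-worker split described above is a common pattern, and a minimal self-contained sketch of it may help. `CallAcceptor`, `CallSession`, and `Transport` are hypothetical stand-ins written for illustration, not the internals of VoiceEngine or VoiceCallSession, and rejecting duplicate call IDs is a policy chosen for this sketch only.

```typescript
// Hypothetical sketch of the "call acceptor + per-call worker" pattern.
// These are illustrative names, not VoiceEngine or VoiceCallSession internals.

// Minimal stand-in for a transport connection (WebSocket, LiveKit, ...).
interface Transport {
  readonly kind: string;
}

class CallSession {
  readonly callId: string;
  readonly transport: Transport;
  private readonly onEnd: (callId: string) => void;
  private started = false;

  constructor(callId: string, transport: Transport, onEnd: (callId: string) => void) {
    this.callId = callId;
    this.transport = transport;
    this.onEnd = onEnd;
  }

  async start(): Promise<void> {
    if (this.started) throw new Error(`call ${this.callId} already started`);
    this.started = true;
    // A real worker would begin relaying audio between transport and model here.
  }

  end(): void {
    this.onEnd(this.callId); // tell the acceptor this call is finished
  }

  get isStarted(): boolean {
    return this.started;
  }
}

class CallAcceptor {
  private readonly sessions = new Map<string, CallSession>();

  async acceptCall(opts: { callId: string; transport: Transport }): Promise<CallSession> {
    // Policy chosen for this sketch: at most one live session per call ID.
    if (this.sessions.has(opts.callId)) {
      throw new Error(`duplicate callId ${opts.callId}`);
    }
    const session = new CallSession(opts.callId, opts.transport, (id) => {
      this.sessions.delete(id);
    });
    this.sessions.set(opts.callId, session);
    return session;
  }

  get activeCalls(): number {
    return this.sessions.size;
  }
}
```

Keeping one session object per call means per-call state (the transport, whether audio is flowing) never leaks between concurrent callers; the acceptor only tracks which calls are live.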
Cascaded voice (LiveKit)
@kuralle-agents/livekit-plugin bridges Kuralle to a LiveKit voice pipeline via the cascaded path: STT → KuralleRuntimeLLMAdapter → TTS.
```sh
npm install @kuralle-agents/livekit-plugin @kuralle-agents/livekit-plugin-transport-ws
```
```ts
import { KuralleVoiceSession } from '@kuralle-agents/livekit-plugin';
import { GeminiLiveSTT, GeminiLiveTTS } from '@kuralle-agents/livekit-plugin/gemini';
import { WebSocketAgentServer } from '@kuralle-agents/livekit-plugin-transport-ws';
import { createRuntime, defineAgent } from '@kuralle-agents/core';
import { openai } from '@ai-sdk/openai';

// A standard text agent: nothing voice-specific is configured here.
const agent = defineAgent({
  id: 'support',
  instructions: 'You are a helpful support agent.',
  model: openai('gpt-4o-mini'),
});

const runtime = createRuntime({
  agents: [agent],
  defaultAgentId: 'support',
});

// Accepts voice clients over WebSocket on port 8080.
const server = new WebSocketAgentServer({ port: 8080 });

// Each connection gets its own cascaded session: STT -> Kuralle runtime -> TTS.
server.onConnection(async (transport) => {
  const session = new KuralleVoiceSession({
    runtime,
    stt: new GeminiLiveSTT(),
    tts: new GeminiLiveTTS(),
    greeting: 'Hello, how can I help?',
  });
  await server.startSession(transport, session);
});

// Top-level await: run this file as an ES module.
await server.listen();
```
KuralleRuntimeLLMAdapter wraps any Kuralle Runtime as the LLM step, so your agents, flows, and tools run exactly as they do in text mode.
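The adapter idea can be sketched in isolation. Assume, hypothetically, that the voice pipeline asks its LLM step for a stream of text chunks given the chat history, while the agent runtime exposes a single text-in/text-out call. `ChatMessage`, `LlmStep`, `TextRuntime`, and `RuntimeAsLlmStep` are illustrative names, not real Kuralle or LiveKit types, and this is not how KuralleRuntimeLLMAdapter is implemented. Splitting the reply at sentence boundaries is a common cascaded-voice technique: TTS can synthesize and play a short first chunk instead of waiting on the whole reply.

```typescript
// Hypothetical shapes for illustration; not the real LiveKit or Kuralle types.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Assumed contract of the pipeline's LLM step: history in, text chunks out.
interface LlmStep {
  chat(history: ChatMessage[]): AsyncIterable<string>;
}

// Assumed contract of the agent runtime: one text turn per call.
interface TextRuntime {
  runTurn(userText: string): Promise<string>;
}

// Naive sentence splitter (it would also split decimals such as "3.5").
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return parts.map((p) => p.trim()).filter((p) => p.length > 0);
}

class RuntimeAsLlmStep implements LlmStep {
  private readonly runtime: TextRuntime;

  constructor(runtime: TextRuntime) {
    this.runtime = runtime;
  }

  async *chat(history: ChatMessage[]): AsyncIterable<string> {
    // Assumption: the runtime tracks its own conversation state, so only
    // the newest user message is forwarded.
    const lastUser = [...history].reverse().find((m) => m.role === "user");
    if (!lastUser) return;
    const reply = await this.runtime.runTurn(lastUser.content);
    // Yield one sentence at a time so TTS can synthesize the first sentence
    // without waiting to synthesize the whole reply.
    for (const sentence of splitSentences(reply)) {
      yield sentence;
    }
  }
}
```

A runtime that streams tokens would let the first sentence reach TTS even sooner; this sketch awaits the full reply to keep the adapter boundary easy to see.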
Which path to choose
- Use provider-native realtime when latency is the priority or when you want the provider’s native voice model capabilities (Gemini Live, OpenAI Realtime).
- Use cascaded when you need to pair voice with a non-realtime LLM, want fine-grained control over STT and TTS, or already run LiveKit-based infrastructure.
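The guidance above can be read as a small decision procedure. The helper below is a hypothetical encoding of it, not a Kuralle API or configuration option. The ordering (hard constraints before preferences) and the fallback to provider-native realtime when no criterion applies are assumptions made for this sketch.

```typescript
// Hypothetical helper encoding the guidance above; not a Kuralle API.
type VoicePath = "provider-native realtime" | "cascaded";

interface VoiceRequirements {
  latencyIsPriority: boolean;
  wantsNativeVoiceModel: boolean;
  needsNonRealtimeLlm: boolean;
  needsFineGrainedSttTtsControl: boolean;
  alreadyOnLiveKit: boolean;
}

function chooseVoicePath(req: VoiceRequirements): VoicePath {
  // Hard constraints first: a non-realtime LLM and separate STT/TTS stages
  // only exist in the cascaded pipeline.
  if (req.needsNonRealtimeLlm || req.needsFineGrainedSttTtsControl) {
    return "cascaded";
  }
  // Preferences next.
  if (req.latencyIsPriority || req.wantsNativeVoiceModel) {
    return "provider-native realtime";
  }
  if (req.alreadyOnLiveKit) {
    return "cascaded";
  }
  // Assumed default for this sketch when nothing tips the balance.
  return "provider-native realtime";
}
```

Treating the LLM and STT/TTS requirements as constraints reflects that a provider-native session sends audio straight to a realtime model, so there is no separate text LLM or STT/TTS stage to swap out.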