
Voice Agents

The same defineAgent you use for text works for voice. Kuralle keeps control of tools, flows, and handoffs regardless of transport. Audio handling lives in a separate layer.

| Path | Package | How it works |
|---|---|---|
| Provider-native realtime | @kuralle-agents/realtime-audio | Raw audio goes directly to the provider model (Gemini Live, OpenAI Realtime, xAI Grok Realtime) and audio comes back in a single connection. Lower latency; no transcript round-trip. |
| Cascaded STT→LLM→TTS | @kuralle-agents/livekit-plugin | Speech-to-text transcribes audio, a standard Kuralle turn runs, then text-to-speech synthesizes the response. Works with any LLM. |

Both paths run the same Kuralle runtime — flows, tools, routing, and handoffs work identically.

@kuralle-agents/realtime-audio sends raw audio directly to the model and receives audio back in a single connection.
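To make "a single connection" concrete, here is a minimal, hypothetical sketch of the bridging loop. RealtimeConnection, sendAudio, receiveAudio, and bridgeCall are illustrative names, not the @kuralle-agents/realtime-audio API. Caller audio streams up while model audio streams down, concurrently, over one connection, with no transcript step in between.

```typescript
// Hypothetical model connection: one bidirectional audio stream.
// Not the @kuralle-agents/realtime-audio API.
interface RealtimeConnection {
  sendAudio(chunk: Uint8Array): Promise<void>;
  // Audio chunks produced by the model; ends when the connection closes.
  receiveAudio(): AsyncIterable<Uint8Array>;
  close(): Promise<void>;
}

// Pump caller audio up and model audio down at the same time over one connection.
async function bridgeCall(
  conn: RealtimeConnection,
  callerAudio: AsyncIterable<Uint8Array>,
  play: (chunk: Uint8Array) => void,
): Promise<void> {
  // Upstream: forward every caller chunk, then close when the caller stops.
  const upstream = (async () => {
    for await (const chunk of callerAudio) {
      await conn.sendAudio(chunk);
    }
    await conn.close();
  })();

  // Downstream: play model audio as soon as each chunk arrives.
  const downstream = (async () => {
    for await (const chunk of conn.receiveAudio()) {
      play(chunk);
    }
  })();

  await Promise.all([upstream, downstream]);
}
```

Error handling, interruptions, and audio format negotiation are deliberately left out; the point is only that both directions run concurrently rather than turn by turn.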

```sh
npm install @kuralle-agents/realtime-audio
```

VoiceEngine is the call acceptor. It creates per-call VoiceCallSession workers that bridge a transport to the chosen provider:

realtime-audio.ts

```ts
import { VoiceEngine, createGeminiClientFactory } from '@kuralle-agents/realtime-audio';
import { defineTool, buildToolSet } from '@kuralle-agents/core';
import { z } from 'zod';

const lookupOrder = defineTool({
  name: 'lookup_order',
  description: 'Look up an order by ID',
  input: z.object({ orderId: z.string() }),
  // Stub result; replace with a real order lookup.
  execute: async ({ orderId }) => ({ status: 'shipped', orderId }),
});

const engine = new VoiceEngine({
  agents: [
    {
      id: 'support',
      name: 'Voice Support Agent',
      instructions: 'You are a helpful voice support agent.',
      voice: 'Charon',
      tools: buildToolSet({ lookup_order: lookupOrder }),
    },
  ],
  defaultAgentId: 'support',
  modelClientFactory: createGeminiClientFactory({
    apiKey: process.env.GOOGLE_API_KEY!,
    model: 'gemini-2.5-flash-preview-native-audio',
  }),
});

// When a WebSocket or LiveKit connection arrives, accept it as a call:
//
// const session = await engine.acceptCall({
//   callId: crypto.randomUUID(),
//   transport: yourTransportSession, // implements TransportSession
// });
// await session.start();
```

When a connection arrives from your transport layer, pass it to engine.acceptCall({ callId, transport }), then call start() on the session it returns.
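A minimal sketch of a connection handler built around that two-step lifecycle follows. The only thing it assumes about VoiceEngine is the shape in the commented example: acceptCall({ callId, transport }) resolves to a session with an awaitable start(). CallAcceptor, StartableSession, CallRouter, onConnection, and onDisconnect are hypothetical names, and session teardown is engine-specific, so it is not shown.

```typescript
// Assumed shape, taken from the commented acceptCall example:
// acceptCall resolves to a session exposing start(). Nothing else is assumed.
interface StartableSession {
  start(): Promise<void>;
}
interface CallAcceptor<T> {
  acceptCall(opts: { callId: string; transport: T }): Promise<StartableSession>;
}

// Hypothetical connection handler: one session per call, tracked by callId.
class CallRouter<T> {
  private readonly engine: CallAcceptor<T>;
  private readonly makeId: () => string;
  private readonly sessions = new Map<string, StartableSession>();

  constructor(engine: CallAcceptor<T>, makeId: () => string) {
    this.engine = engine;
    this.makeId = makeId;
  }

  // Accept the connection as a new call, then start it.
  async onConnection(transport: T): Promise<string> {
    const callId = this.makeId();
    const session = await this.engine.acceptCall({ callId, transport });
    this.sessions.set(callId, session);
    try {
      await session.start();
    } catch (err) {
      // A call that failed to start is not tracked as active.
      this.sessions.delete(callId);
      throw err;
    }
    return callId;
  }

  // Forget a call once its transport closes.
  onDisconnect(callId: string): void {
    this.sessions.delete(callId);
  }

  get activeCalls(): number {
    return this.sessions.size;
  }
}
```

In a real server you would pass your VoiceEngine instance as engine (assuming its types line up with this shape) and () => crypto.randomUUID() as makeId.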

@kuralle-agents/livekit-plugin bridges Kuralle to a LiveKit voice pipeline via the cascaded path: STT → KuralleRuntimeLLMAdapter → TTS.
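Stripped of streaming, one cascaded turn is three awaited stages run in order. The sketch below assumes hypothetical SpeechToText, TextTurn, and TextToSpeech interfaces (the real GeminiLiveSTT and GeminiLiveTTS classes have their own APIs); the middle stage is the slot KuralleRuntimeLLMAdapter fills in the plugin.

```typescript
// Hypothetical stage interfaces; not the livekit-plugin API.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface TextTurn {
  // Runs one text turn through the agent and returns the reply text.
  respond(userText: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

// One cascaded turn: audio in -> transcript -> agent reply -> audio out.
async function cascadedTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  llm: TextTurn,
  tts: TextToSpeech,
): Promise<{ transcript: string; reply: string; audio: Uint8Array }> {
  const transcript = await stt.transcribe(audio);
  const reply = await llm.respond(transcript);
  const out = await tts.synthesize(reply);
  return { transcript, reply, audio: out };
}
```

A production pipeline typically streams partial transcripts and audio and handles interruptions; this only shows the order of stages and why any text LLM can sit in the middle.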

```sh
npm install @kuralle-agents/livekit-plugin @kuralle-agents/livekit-plugin-transport-ws
```
cascaded-voice.ts

```ts
import { KuralleVoiceSession } from '@kuralle-agents/livekit-plugin';
import { GeminiLiveSTT, GeminiLiveTTS } from '@kuralle-agents/livekit-plugin/gemini';
import { WebSocketAgentServer } from '@kuralle-agents/livekit-plugin-transport-ws';
import { createRuntime, defineAgent } from '@kuralle-agents/core';
import { openai } from '@ai-sdk/openai';

// The same agent definition you would use for text.
const agent = defineAgent({
  id: 'support',
  instructions: 'You are a helpful support agent.',
  model: openai('gpt-4o-mini'),
});

const runtime = createRuntime({
  agents: [agent],
  defaultAgentId: 'support',
});

const server = new WebSocketAgentServer({ port: 8080 });

// One voice session per incoming connection: STT -> runtime -> TTS.
server.onConnection(async (transport) => {
  const session = new KuralleVoiceSession({
    runtime,
    stt: new GeminiLiveSTT(),
    tts: new GeminiLiveTTS(),
    greeting: 'Hello, how can I help?',
  });
  await server.startSession(transport, session);
});

// Top-level await requires running this file as an ES module.
await server.listen();
```

KuralleRuntimeLLMAdapter wraps any Kuralle Runtime as the LLM step, so your agents, flows, and tools run exactly as they do in text mode.
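KuralleRuntimeLLMAdapter's actual interface isn't shown on this page. As a minimal sketch of the adapter idea, assume a runtime that takes a session ID plus one user message and returns reply text; TextRuntime, ChatMessage, LLMStep, and RuntimeLLMAdapter are all hypothetical names. The voice pipeline gets something that looks like a chat LLM, while every turn is delegated to the runtime.

```typescript
// Hypothetical shapes; the real Kuralle Runtime and LiveKit LLM interfaces differ.
interface TextRuntime {
  run(req: { sessionId: string; input: string }): Promise<{ text: string }>;
}
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}
interface LLMStep {
  chat(messages: ChatMessage[]): Promise<string>;
}

// Adapter: looks like an LLM to the voice pipeline, but each turn goes to the
// runtime, which keeps its own state (agents, flows, tools, handoffs).
class RuntimeLLMAdapter implements LLMStep {
  private readonly runtime: TextRuntime;
  private readonly sessionId: string;

  constructor(runtime: TextRuntime, sessionId: string) {
    this.runtime = runtime;
    this.sessionId = sessionId;
  }

  async chat(messages: ChatMessage[]): Promise<string> {
    // Forward only the newest user message; history lives in the runtime.
    const lastUser = [...messages].reverse().find((m) => m.role === 'user');
    if (!lastUser) throw new Error('no user message to respond to');
    const result = await this.runtime.run({
      sessionId: this.sessionId,
      input: lastUser.content,
    });
    return result.text;
  }
}
```

Forwarding only the newest user message is a design choice made for this sketch: the runtime owns the conversation state, so the pipeline's own chat history is not replayed into it a second time.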

  • Use provider-native realtime when latency is the priority or when you want the provider’s native voice model capabilities (Gemini Live, OpenAI Realtime).
  • Use cascaded when you need to mix voice with a non-realtime LLM, need fine-grained STT/TTS control, or already run on LiveKit-based infrastructure.
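If you want to encode these guidelines in, say, a deployment script, a hypothetical helper might look like the sketch below. The field names are illustrative, and the precedence is an assumption rather than documented policy: a non-realtime LLM or custom STT/TTS forces the cascaded path, because per the table above the realtime path has no separate STT, LLM, or TTS stage to swap.

```typescript
// Hypothetical helper encoding the two guidelines; precedence is assumed.
interface VoiceRequirements {
  latencyIsPriority: boolean;
  wantsNativeVoiceModel: boolean;
  usesNonRealtimeLLM: boolean;
  needsSttTtsControl: boolean;
  onLiveKit: boolean;
}
type VoicePath = 'realtime' | 'cascaded' | 'either';

function chooseVoicePath(r: VoiceRequirements): VoicePath {
  // Hard requirements: only the cascaded path has swappable STT/LLM/TTS stages.
  if (r.usesNonRealtimeLLM || r.needsSttTtsControl) return 'cascaded';
  // Preferences that favor provider-native realtime.
  if (r.latencyIsPriority || r.wantsNativeVoiceModel) return 'realtime';
  // Existing LiveKit infrastructure tips the balance toward cascaded.
  if (r.onLiveKit) return 'cascaded';
  return 'either';
}
```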