Voice Agents

The same defineAgent you use for text works for voice. Kuralle retains authority over tools, flows, and handoffs regardless of transport; audio handling lives in a separate layer.

Two paths

| Path | Package | How it works |
|---|---|---|
| Provider-native realtime | @kuralle-agents/realtime-audio | Raw audio goes directly to the provider model (Gemini Live, OpenAI Realtime; xAI Grok Realtime via the Cloudflare Workers client) and audio comes back in a single connection. Lower latency; no transcript round-trip. |
| Cascaded STT→LLM→TTS | @kuralle-agents/livekit-plugin | Speech-to-text transcribes audio, a standard Kuralle turn runs, then text-to-speech synthesizes the response. Works with any LLM. |

Both paths run the same Kuralle runtime, so flows, tools, routing, and handoffs work identically.

Provider-native realtime

@kuralle-agents/realtime-audio sends raw audio directly to the model and receives audio back in a single connection.

npm install @kuralle-agents/realtime-audio

VoiceEngine is the call acceptor. It creates a per-call worker (RealtimeCallWorker) that bridges a transport to the chosen provider:

import { VoiceEngine, createGeminiClientFactory } from '@kuralle-agents/realtime-audio';
import { defineTool, buildToolSet } from '@kuralle-agents/core';
import { z } from 'zod';

const lookupOrder = defineTool({
  name: 'lookup_order',
  description: 'Look up an order by ID',
  input: z.object({ orderId: z.string() }),
  execute: async ({ orderId }) => ({ status: 'shipped', orderId }),
});

const engine = new VoiceEngine({
  agents: [
    {
      id: 'support',
      name: 'Voice Support Agent',
      instructions: 'You are a helpful voice support agent.',
      voice: 'Charon',
      tools: buildToolSet({ lookup_order: lookupOrder }),
    },
  ],
  defaultAgentId: 'support',
  modelClientFactory: createGeminiClientFactory({
    apiKey: process.env.GOOGLE_API_KEY!,
    model: 'gemini-2.5-flash-preview-native-audio',
  }),
});

// When a WebSocket or LiveKit connection arrives, accept it as a call:
//
//   const session = await engine.acceptCall({
//     callId: crypto.randomUUID(),
//     transport: yourTransportSession,  // implements TransportSession
//   });
//   await session.start();

When a connection arrives from your transport layer, call engine.acceptCall({ callId, transport }) and then session.start().
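
To make the acceptor and per-call worker roles concrete, here is a minimal, self-contained sketch of the pattern. It does not use the package: SketchEngine, SketchCallWorker, SketchTransport, and SketchModelClient are hypothetical names invented for illustration, and the real VoiceEngine, RealtimeCallWorker, and TransportSession types may look quite different.

```typescript
// Hypothetical shapes for illustration only; these are NOT the
// @kuralle-agents/realtime-audio types.
type AudioFrame = Uint8Array;

interface SketchTransport {
  onAudio(handler: (frame: AudioFrame) => void): void; // caller -> server
  sendAudio(frame: AudioFrame): void;                  // server -> caller
}

interface SketchModelClient {
  sendAudio(frame: AudioFrame): void;                  // server -> model
  onAudio(handler: (frame: AudioFrame) => void): void; // model -> server
}

// One worker per call: forwards caller audio to the model and model audio back.
class SketchCallWorker {
  readonly callId: string;
  private readonly transport: SketchTransport;
  private readonly model: SketchModelClient;
  private started = false;

  constructor(callId: string, transport: SketchTransport, model: SketchModelClient) {
    this.callId = callId;
    this.transport = transport;
    this.model = model;
  }

  start(): void {
    if (this.started) return; // a second start would register handlers twice
    this.started = true;
    this.transport.onAudio((frame) => this.model.sendAudio(frame));
    this.model.onAudio((frame) => this.transport.sendAudio(frame));
  }
}

// The acceptor tracks one worker per call ID and builds a fresh model client for each.
class SketchEngine {
  private readonly workers = new Map<string, SketchCallWorker>();
  private readonly createModelClient: () => SketchModelClient;

  constructor(createModelClient: () => SketchModelClient) {
    this.createModelClient = createModelClient;
  }

  acceptCall(opts: { callId: string; transport: SketchTransport }): SketchCallWorker {
    if (this.workers.has(opts.callId)) {
      throw new Error(`call ${opts.callId} is already active`);
    }
    const worker = new SketchCallWorker(opts.callId, opts.transport, this.createModelClient());
    this.workers.set(opts.callId, worker);
    return worker;
  }

  get activeCalls(): number {
    return this.workers.size;
  }
}
```

Giving each call its own worker and model client keeps connection state isolated, so one failed call does not affect the others.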

Cascaded voice (LiveKit)

@kuralle-agents/livekit-plugin bridges Kuralle to a LiveKit voice pipeline via the cascaded path: STT → KuralleRuntimeLLMAdapter → TTS.

npm install @kuralle-agents/livekit-plugin @kuralle-agents/livekit-plugin-transport-ws

import { KuralleVoiceSession } from '@kuralle-agents/livekit-plugin';
import { GeminiLiveSTT, GeminiLiveTTS } from '@kuralle-agents/livekit-plugin/gemini';
import { WebSocketAgentServer } from '@kuralle-agents/livekit-plugin-transport-ws';
import { createRuntime, defineAgent } from '@kuralle-agents/core';
import { openai } from '@ai-sdk/openai';

const agent = defineAgent({
  id: 'support',
  instructions: 'You are a helpful support agent.',
  model: openai('gpt-4o-mini'),
});

const runtime = createRuntime({
  agents: [agent],
  defaultAgentId: 'support',
});

const server = new WebSocketAgentServer({ port: 8080 });

server.onConnection(async (transport) => {
  const session = new KuralleVoiceSession({
    runtime,
    stt: new GeminiLiveSTT(),
    tts: new GeminiLiveTTS(),
    greeting: 'Hello, how can I help?',
  });
  await server.startSession(transport, session);
});

await server.listen();

KuralleRuntimeLLMAdapter wraps any Kuralle Runtime as the LLM step, so your agents, flows, and tools run exactly as they do in text mode.
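
The shape of a single cascaded turn can be sketched without the plugin. The interfaces below (SpeechToText, TextTurnRunner, TextToSpeech) and the runCascadedTurn function are hypothetical stand-ins for illustration; they are not the plugin's actual STT, adapter, or TTS types, and a real pipeline would typically stream audio rather than process one buffer at a time.

```typescript
// Hypothetical stage interfaces; the plugin's real types may differ.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface TextTurnRunner {
  // Stands in for the LLM step, i.e. a normal text turn against the runtime.
  runTurn(userText: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

interface CascadedTurnResult {
  transcript: string;
  replyText: string;
  replyAudio: Uint8Array;
}

// Runs one STT -> LLM -> TTS turn over a complete audio buffer.
async function runCascadedTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  runner: TextTurnRunner,
  tts: TextToSpeech,
): Promise<CascadedTurnResult> {
  const transcript = (await stt.transcribe(audio)).trim();
  if (transcript.length === 0) {
    // Nothing intelligible was heard, so skip the LLM and TTS stages.
    return { transcript, replyText: '', replyAudio: new Uint8Array(0) };
  }
  const replyText = await runner.runTurn(transcript);
  const replyAudio = await tts.synthesize(replyText);
  return { transcript, replyText, replyAudio };
}
```

Because the middle stage only sees text, anything that can answer a text turn can fill it, which is why the cascaded path is not tied to a realtime-capable model.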

Which path to choose

Use provider-native realtime when latency is the priority or when you want the provider’s native voice model capabilities (Gemini Live, OpenAI Realtime).
Use cascaded when you need to pair voice with a non-realtime LLM, need fine-grained STT/TTS control, or already run on LiveKit-based infrastructure.
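
The guidance above can be written down as a small decision helper. This is only a sketch: the VoiceRequirements fields and the chooseVoicePath function are hypothetical, and the precedence between criteria (hard constraints first, existing LiveKit infrastructure as a tie-breaker, realtime as the fallback) is an assumption rather than something Kuralle prescribes.

```typescript
type VoicePath = 'provider-native-realtime' | 'cascaded';

// Hypothetical requirement flags mirroring the criteria listed above.
interface VoiceRequirements {
  latencyIsPriority: boolean;
  needsNativeVoiceModel: boolean; // e.g. Gemini Live or OpenAI Realtime capabilities
  usesNonRealtimeLlm: boolean;
  needsSttTtsControl: boolean;
  alreadyOnLiveKit: boolean;
}

function chooseVoicePath(req: VoiceRequirements): VoicePath {
  // Assumed precedence: needs that only the cascaded path covers win outright.
  if (req.usesNonRealtimeLlm || req.needsSttTtsControl) return 'cascaded';
  // Next, the reasons given for going provider-native.
  if (req.latencyIsPriority || req.needsNativeVoiceModel) return 'provider-native-realtime';
  // Otherwise existing LiveKit infrastructure tips the choice; the fallback is an assumption.
  return req.alreadyOnLiveKit ? 'cascaded' : 'provider-native-realtime';
}
```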