
createTokenChunker

function createTokenChunker(config?): Chunker;

Creates a token-aware chunker that splits text by token count.

Splits on sentence boundaries (".", "!", or "?" followed by whitespace) when possible, which produces more natural chunk boundaries. If a sentence exceeds the token limit, it falls back to word boundaries, and then to character boundaries.
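The sentence, word, character fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `splitByTokens`, `splitSentences`, `splitOversized`, and `CountTokens` are hypothetical names. The sketch also assumes greedy packing of whole sentences into each chunk and no overlap between chunks, because this page does not say how sentences are grouped.

```typescript
// Hypothetical counter type; this page does not document TokenChunkerConfig's fields.
type CountTokens = (text: string) => number;

// Character-based estimate (~4 characters per token), like the documented fallback.
const approxTokens: CountTokens = (text) => Math.ceil(text.length / 4);

// Sentence boundary: ".", "!" or "?" immediately followed by whitespace.
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length - 1; i++) {
    if (/[.!?]/.test(text[i]) && /\s/.test(text[i + 1])) {
      sentences.push(text.slice(start, i + 1).trim());
      start = i + 1;
    }
  }
  sentences.push(text.slice(start).trim());
  return sentences.filter((s) => s.length > 0);
}

// Greedily packs sentences into chunks of at most maxTokens (assumes maxTokens >= 1).
// Pieces are re-joined with a single space, so original whitespace is normalized.
function splitByTokens(
  text: string,
  maxTokens: number,
  countTokens: CountTokens = approxTokens,
): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (countTokens(candidate) <= maxTokens) {
      current = candidate;
      continue;
    }
    if (current) chunks.push(current);
    current = "";
    if (countTokens(sentence) <= maxTokens) {
      current = sentence;
    } else {
      // The sentence alone is too long: fall back to word, then character boundaries.
      chunks.push(...splitOversized(sentence, maxTokens, countTokens));
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Splits one oversized sentence on whitespace; cuts single long words by character.
function splitOversized(sentence: string, maxTokens: number, countTokens: CountTokens): string[] {
  const pieces: string[] = [];
  let current = "";
  for (const word of sentence.split(/\s+/)) {
    const candidate = current ? `${current} ${word}` : word;
    if (countTokens(candidate) <= maxTokens) {
      current = candidate;
      continue;
    }
    if (current) pieces.push(current);
    current = "";
    if (countTokens(word) <= maxTokens) {
      current = word;
      continue;
    }
    // Character fallback: grow a piece until adding one more character would exceed the limit.
    let part = "";
    for (let i = 0; i < word.length; i++) {
      if (part && countTokens(part + word[i]) > maxTokens) {
        pieces.push(part);
        part = "";
      }
      part += word[i];
    }
    current = part;
  }
  if (current) pieces.push(current);
  return pieces;
}
```

For example, with a counter that counts whitespace-separated words and a limit of 2, `splitByTokens("a b c d e.", 2, wordCount)` cannot keep the single sentence whole, so it falls back to word boundaries and returns `["a b", "c d", "e."]`.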

Accepts a pluggable token counter function. Use js-tiktoken for accurate GPT-family counting or gpt-tokenizer as a lighter alternative, or omit the counter to fall back to a character-based estimate (about 4 characters per token).
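A pluggable counter is simply a function from a string to a token count. The sketch below assumes that shape; `TokenCounter`, `charEstimate`, and `resolveCounter` are hypothetical names, and this page does not name the TokenChunkerConfig field that accepts the counter. The js-tiktoken and gpt-tokenizer variants are left as comments so the block runs without either package installed.

```typescript
// Hypothetical counter shape: text in, token count out.
type TokenCounter = (text: string) => number;

// Fallback estimate: roughly 4 characters per token, rounded up.
const charEstimate: TokenCounter = (text) => Math.ceil(text.length / 4);

// Uses the supplied counter when present, otherwise the character-based fallback.
function resolveCounter(counter?: TokenCounter): TokenCounter {
  return counter ?? charEstimate;
}

// With js-tiktoken installed, a GPT-family counter could look like:
//   import { getEncoding } from "js-tiktoken";
//   const enc = getEncoding("cl100k_base");
//   const tiktokenCounter: TokenCounter = (text) => enc.encode(text).length;
//
// With gpt-tokenizer installed:
//   import { encode } from "gpt-tokenizer";
//   const gptTokenizerCounter: TokenCounter = (text) => encode(text).length;
```

The character estimate needs no dependencies but can drift noticeably from a real tokenizer on code, non-English text, or unusual punctuation, which is why an exact counter is worth plugging in when chunk sizes must respect a model's context limit.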

Parameter            Type

config (optional)    TokenChunkerConfig

Returns

Chunker