2026-02-10 ★ FEATURED [PUBLISHED]

LLM-TOKENIZER-VIZ

Interactive browser-based visualizer for BPE tokenization across different LLM vocabularies.

ML/AI · REACT · INTERACTIVE · NLP
────────────────────────────────────────────────────────────

OVERVIEW

A zero-dependency browser tool for visualizing how different language models tokenize text. Paste any string and see token boundaries, IDs, and fertility scores side-by-side across GPT-4, Claude, and Llama tokenizers.

MOTIVATION

When debugging prompt-engineering issues, token counts and boundaries matter enormously. Existing tools either require API calls or support only a single tokenizer; this one runs entirely client-side via WASM-compiled Rust tokenizers.

TECHNICAL APPROACH

Input text
                 │
                 ▼
┌──────────────────────────────────┐
│  Tokenizer WASM modules (Rust)   │
│  • tiktoken (GPT-4 / cl100k)     │
│  • sentencepiece (Llama)         │
│  • huggingface tokenizers        │
└──────────────────────────────────┘
                 │
                 ▼
Color-coded token spans + stats
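
The three backends expose different APIs, so the pipeline above implies a thin adapter layer that normalizes them into one shape the UI can render. A minimal sketch, assuming hypothetical names (`Token`, `TokenizerAdapter`, `fertility` are illustrative, not the project's actual API):

```typescript
// One token span as the UI would render it. Offsets let the viewer
// draw color-coded boundaries over the original input string.
interface Token {
  id: number;      // vocabulary ID
  text: string;    // decoded surface form of this token
  start: number;   // offset of the span start in the input
  end: number;     // offset of the span end (exclusive)
}

// Common interface each WASM backend is wrapped to satisfy.
interface TokenizerAdapter {
  name: string;
  encode(text: string): Token[];
}

// Fertility ratio: tokens per whitespace-delimited word.
function fertility(tokens: Token[], text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return words === 0 ? 0 : tokens.length / words;
}
```

With every backend behind the same interface, rendering a side-by-side comparison reduces to mapping over an array of adapters.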

Each tokenizer is compiled to WebAssembly and loaded lazily — first load is ~400KB, subsequent tokenizers are cached in IndexedDB.
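
The lazy-loading described above can be sketched as a memoized async loader. This shows only the in-memory layer; the real app additionally persists modules in IndexedDB, and `loadWasm` here is a stand-in for the actual WASM fetch-and-instantiate step:

```typescript
type Loader<T> = (name: string) => Promise<T>;

// Wrap a loader so each tokenizer is fetched at most once.
function lazyLoader<T>(loadWasm: Loader<T>): Loader<T> {
  const cache = new Map<string, Promise<T>>();
  return (name: string) => {
    // Cache the promise itself so concurrent calls share one in-flight fetch.
    let pending = cache.get(name);
    if (!pending) {
      pending = loadWasm(name);
      cache.set(name, pending);
    }
    return pending;
  };
}
```

Caching the promise (rather than the resolved value) means two components requesting the same tokenizer at the same time trigger a single download.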

FEATURES

  • Side-by-side comparison of up to four tokenizers
  • Token fertility ratio (tokens per word)
  • Special token highlighting
  • Copy token IDs as JSON array
  • Shareable URL state
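
Shareable URL state can be implemented by serializing the visualizer state into the URL hash. A sketch under assumed names (`VizState` and its fields are illustrative):

```typescript
interface VizState {
  text: string;         // the input being tokenized
  tokenizers: string[]; // enabled tokenizers, in display order
}

function encodeState(state: VizState): string {
  // encodeURIComponent keeps the hash copy/paste-safe, including Unicode input.
  return "#s=" + encodeURIComponent(JSON.stringify(state));
}

function decodeState(hash: string): VizState | null {
  const m = hash.match(/^#s=(.*)$/);
  if (!m) return null;
  try {
    return JSON.parse(decodeURIComponent(m[1])) as VizState;
  } catch {
    return null; // malformed or hand-edited hash
  }
}
```

Because the state lives entirely in the fragment, opening a shared link restores the exact comparison without any server round-trip.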