OVERVIEW
A zero-dependency browser tool for visualizing how different language models tokenize text. Paste any string and see token boundaries, IDs, and fertility scores side-by-side across GPT-4, Claude, and Llama tokenizers.
MOTIVATION
When debugging prompt-engineering issues, token counts and boundaries matter enormously. Existing tools either require API calls or support only a single tokenizer; this one runs entirely client-side via WASM-compiled Rust tokenizers.
TECHNICAL APPROACH
Input text
│
▼
┌──────────────────────────────────┐
│ Tokenizer WASM modules (Rust) │
│ • tiktoken (GPT-4 / cl100k) │
│ • sentencepiece (Llama) │
│ • huggingface tokenizers │
└──────────────────────────────────┘
│
▼
Color-coded token spans + stats
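A minimal sketch of the last step above. The output shape ({ id, start, end } tokens with [start, end) character offsets) and the function name are assumptions for illustration, not the project's actual API:

```javascript
// Map tokenizer output (id + character offsets) to renderable spans.
// The { id, start, end } token shape is an assumed interface.
function toSpans(text, tokens, palette = ["#fde68a", "#bfdbfe", "#bbf7d0"]) {
  return tokens.map((tok, i) => ({
    id: tok.id,
    text: text.slice(tok.start, tok.end), // the substring this token covers
    color: palette[i % palette.length],   // cycle colors so adjacent spans differ
  }));
}
```

Rendering then reduces to emitting one highlighted element per span, so boundaries are visible even for tokens that begin with whitespace.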
Each tokenizer is compiled to WebAssembly and loaded lazily: the first load fetches ~400 KB over the network, and subsequently loaded tokenizers are served from an IndexedDB cache.
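The load-then-cache flow can be sketched as follows. The function name and the pluggable cache interface are hypothetical (in the browser the cache would be backed by IndexedDB; any async get/put store works, which also keeps the logic testable):

```javascript
// Fetch WASM bytes on first use, then serve repeat loads from the cache.
// `cache` is any object with async get(key)/put(key, bytes); in the real
// app this would wrap IndexedDB. `fetchBytes` is the network fetch.
async function loadTokenizerBytes(name, url, cache, fetchBytes) {
  const hit = await cache.get(name);
  if (hit) return hit;                  // cache hit: no network round trip
  const bytes = await fetchBytes(url);  // first load: ~400 KB download
  await cache.put(name, bytes);         // persist for subsequent sessions
  return bytes;
}
```

The returned bytes would then be passed to WebAssembly instantiation; keeping fetch and instantiation separate is what makes the IndexedDB caching straightforward.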
FEATURES
- Side-by-side comparison of up to 4 tokenizers simultaneously
- Token fertility ratio (tokens per word)
- Special token highlighting
- Copy token IDs as JSON array
- Shareable URL state
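The fertility ratio listed above can be computed directly from any tokenizer's output; a minimal sketch, where whitespace word-splitting is a simplifying assumption:

```javascript
// Fertility = tokens per word. Higher values mean the tokenizer fragments
// this text more (common for non-English or code-heavy input).
// Splitting words on whitespace is a simplification for illustration.
function fertility(text, tokenCount) {
  const words = text.split(/\s+/).filter(Boolean).length;
  return words === 0 ? 0 : tokenCount / words;
}
```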