llm
A small language model running entirely in your browser.
No server, no API key. Your prompt never leaves your machine.
SmolLM2-1.7B-Instruct, single-turn chat.
Mobile mode: a smaller model runs on the CPU via WebAssembly, so expect slower responses. For a much better experience with WebGPU acceleration, try the desktop version.
SmolLM2-1.7B-Instruct, single-turn chat via WebGPU with WebLLM.
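The desktop path above can be sketched with WebLLM's OpenAI-style API. This is a minimal, hedged sketch: the model ID string and the progress-callback shape are assumptions based on WebLLM's prebuilt model list, so check the library's docs before relying on them.

```typescript
// Sketch: single-turn chat with SmolLM2-1.7B-Instruct via WebLLM (WebGPU).
// Assumption: "SmolLM2-1.7B-Instruct-q4f16_1-MLC" is the prebuilt model ID.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function chatOnce(prompt: string): Promise<string> {
  // Downloads weights into the browser cache and compiles WebGPU kernels;
  // the first load is slow, later loads hit the cache.
  const engine = await CreateMLCEngine("SmolLM2-1.7B-Instruct-q4f16_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // Single-turn: one user message, no conversation history carried over.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content ?? "";
}
```

Everything runs client-side: the prompt goes into the in-browser engine, never over the network.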
SmolLM2-360M-Instruct Q4_K_M (~271 MB) via llama.cpp compiled to WebAssembly with wllama.
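The mobile fallback can be sketched with wllama, which wraps llama.cpp compiled to WebAssembly. The WASM paths, the GGUF URL, and the option names here are assumptions for illustration; the real paths come from the wllama package and the model's Hugging Face repo.

```typescript
// Sketch: CPU inference of a Q4_K_M GGUF model via wllama (llama.cpp in WASM).
// Assumptions: local WASM asset paths and the GGUF download URL are placeholders.
import { Wllama } from "@wllama/wllama";

const wllama = new Wllama({
  // Map wllama's expected asset names to where this site serves them.
  "single-thread/wllama.wasm": "/wasm/single-thread/wllama.wasm",
  "multi-thread/wllama.wasm": "/wasm/multi-thread/wllama.wasm",
});

async function run(prompt: string): Promise<string> {
  // Fetches the ~271 MB quantized model file and loads it into llama.cpp.
  await wllama.loadModelFromUrl(
    "https://example.com/models/smollm2-360m-instruct-q4_k_m.gguf" // placeholder URL
  );
  // Plain completion, capped at 256 generated tokens.
  return wllama.createCompletion(prompt, { nPredict: 256 });
}
```

Q4_K_M quantization is what brings the 360M model down to roughly 271 MB, small enough to download and run on a phone's CPU.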