Pre-Trained vs Instruction-Tuned Model

Send one prompt to both models

This page loads two small language models with Transformers.js and runs the same prompt through each. They are a matched pair: SmolLM2-135M is a base model — trained only to predict the next token — and SmolLM2-135M-Instruct is the instruction fine-tuned version, trained further from that exact base model on instruction/response pairs. Same architecture, same size, same pretraining; the only difference is the fine-tuning. Both run on-device via WebAssembly.

Prompt

New tokens: 60

Temperature: 0.8

Preparing to load models…

Comparison

Base

SmolLM2-135M

onnx-community/SmolLM2-135M-ONNX — ~135M params

Base-model continuation will appear here.

Receives the raw prompt and simply continues the text. It was never taught that a prompt might be a request.

Instruction-tuned

SmolLM2-135M-Instruct

HuggingFaceTB/SmolLM2-135M-Instruct — ~135M params

Instruction-tuned response will appear here.

Receives the prompt wrapped in a chat template and replies as an assistant, because it was fine-tuned on instruction/response pairs.

About this page

Both models are small enough to download and run comfortably in a browser tab. Weights download once on first use and are then cached, so subsequent runs work offline. Crucially, these are not two unrelated models: SmolLM2-135M-Instruct was fine-tuned directly from SmolLM2-135M. They share the same architecture, tokenizer, size, and pretraining data, so any difference you see between the panels comes from instruction tuning alone — not from one model being newer or larger.

There is one deliberate difference in how the prompt is fed to each model. The base model gets the prompt as plain text: generator(prompt, …). The instruction-tuned model gets a list of chat messages: generator([{ role: 'user', content: prompt }], …), which Transformers.js wraps in the model's chat template before generation. This is not a trick to favour one side: a base model has never seen those chat tokens during training, so each model is being used the only way it actually can be. Output from models this small is often rough or repetitive — that is expected. To swap in other models, change the model IDs in the pipeline(…) calls.