# What Are Tokens — And Why Should You Care?

## ✦ Watch output length

If you need a detailed response, say so. If you need a brief one, say that too — it saves tokens and gets to the point faster.

As a rough rule of thumb: 100 tokens is about 75 words, or a short paragraph. A typical novel runs around 100,000 words — that's roughly 133,000 tokens. Claude's extended context window can hold the equivalent of several books at once.

## The context window : Claude's working memory

Every conversation with Claude happens inside a *context window* — a fixed-size buffer that holds everything Claude can "see" at once. This includes your entire conversation history, any documents you paste in, system instructions, and Claude's own responses.

Once the window fills up, older content scrolls out of view. Claude doesn't forget it in the human sense — it simply can't read past what fits. This is why very long conversations can occasionally feel like Claude loses track of something said much earlier.

> Tokens flow in both directions. The messages you send are *input tokens*; the words Claude writes back are *output tokens*. Both count toward usage — which is why a long document you paste in, plus a detailed response, can add up quickly.

## Why does this matter in practice?

For most everyday use — asking questions, drafting emails, writing code — you'll never hit a limit. But for power users working with large documents, long research threads, or complex multi-step tasks, token awareness becomes a practical skill.

<table style="min-width: 75px;"><colgroup><col style="min-width: 25px;"><col style="min-width: 25px;"><col style="min-width: 25px;"></colgroup><tbody><tr><td colspan="1" rowspan="1"><p>✦ Be concise in prompts</p></td><td colspan="1" rowspan="1"><p>Shorter, well-targeted prompts leave more room for Claude's response and keep costs lower on API usage.</p></td><th colspan="1" rowspan="1"><p></p></th></tr><tr><td colspan="1" rowspan="1"><p>✦ Paste selectively</p></td><td colspan="1" rowspan="1"><p>When working with documents, paste only the relevant sections rather than the entire file.</p></td><td colspan="1" rowspan="1"><p></p></td></tr><tr><td colspan="1" rowspan="1"><p>✦ Summarise long threads</p></td><td colspan="1" rowspan="1"><p>If a conversation is getting long, ask Claude to summarise key points — then start fresh with that summary as context.</p></td><td colspan="1" rowspan="1"><p></p></td></tr><tr><td colspan="1" rowspan="1"><p>✦ Watch output length</p></td><td colspan="1" rowspan="1"><p>If you need a detailed response, say so. If you need a brief one, say that too — it saves tokens and gets to the point faster.</p></td><td colspan="1" rowspan="1"><p></p></td></tr></tbody></table>

## Tokens and pricing on the API

If you're building with Claude via the API, tokens are the billing unit. You pay for what you consume — input and output separately. This makes token efficiency a real engineering consideration: a well-crafted system prompt that's 200 tokens instead of 800 can meaningfully reduce costs at scale.

Different Claude models have different context limits and pricing tiers. Claude Sonnet 4.6, for instance, balances capability and efficiency for everyday tasks, while Opus 4.6 offers more reasoning depth for complex work — at a higher token cost.

## The bigger picture

Tokens aren't just an implementation detail — they're a window into how language models perceive text. Claude doesn't read your words the way you do. It processes sequences of these sub-word units, learning statistical patterns across billions of them during training. The quality of that training, across an enormous token vocabulary, is what gives Claude its fluency.

Next time you're mid-conversation and something feels slightly off — a long document Claude seems to have "forgotten," or a response that got truncated — there's a good chance tokens are the explanation. And now you'll know exactly why.
