Chapter 2 of 15·fundamentals·8 min read

AI Fundamentals


Understanding tokens, context windows, and how LLMs process your code


What Are Tokens?

Tokens are the smallest units that large language models (LLMs) process. Instead of reading character by character or word by word, LLMs split text into tokens - which can be words, parts of words, or even individual characters depending on the language and context.

Understanding tokens is crucial because they directly affect API costs, context limits, and how you structure your prompts.

Token Count Examples
"Hello world" - 2 tokens
function add(a,b){return a+b} - 12 tokens
~100 lines of code - ~800 tokens

How Tokenization Works

When you send text to an LLM, it gets split into tokens first. For example, the word "understanding" might be split into "under" + "stand" + "ing". Code typically generates more tokens because of symbols and syntax.

Tokenization Example
// This simple function...
function greet(name) {
  return "Hello, " + name;
}

// ...becomes approximately 15-20 tokens:
// "function" "greet" "(" "name" ")" "{" "return"
// """ "Hello" "," """ "+" "name" ";" "}"
Info
Rule of thumb: 1 token is roughly 4 English characters, or about 3/4 of a word. Code typically has higher token density due to punctuation and syntax.
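The rule of thumb above can be turned into a quick estimator. This is a heuristic only - real tokenizers use learned byte-pair encodings and give exact counts - so treat the results as ballpark figures, not billing-accurate numbers.

```python
# Rough token estimator based on the ~4-characters-per-token rule of thumb.
# A real BPE tokenizer gives exact counts; this is only an approximation.

def estimate_tokens(text: str) -> int:
    """Estimate token count as ceil(len(text) / 4), at least 1."""
    return max(1, -(-len(text) // 4))  # ceiling division via negation

print(estimate_tokens("Hello world"))                    # 11 chars -> 3
print(estimate_tokens("function add(a,b){return a+b}"))  # 29 chars -> 8
```

Note how the heuristic underestimates the code snippet (8 vs. the actual 12 tokens shown above) - punctuation-heavy code tokenizes more densely than the 4-characters rule assumes.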

Context Window Explained

The context window is the LLM's "working memory" - the total number of tokens it can process in a single interaction. This includes everything: the system prompt, conversation history, file contents you share, and its response.

Context Window Composition
  • Tool Results / Available Space - Grep, Bash output plus room for the response
  • File Contents - code Claude has read
  • Conversation History - previous messages exchanged
  • CLAUDE.md - project-specific guidelines
  • System Prompt - fixed instructions for the AI
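The composition above can be sketched as a simple token budget. The per-layer numbers here are illustrative assumptions, not measurements; only the 200K total reflects the limit mentioned in this chapter.

```python
# Hypothetical token budget for a 200K-token context window, mirroring
# the layer breakdown above. All per-layer numbers are assumed examples.

CONTEXT_LIMIT = 200_000

layers = {
    "system_prompt": 3_000,          # fixed instructions for the AI
    "claude_md": 1_500,              # project-specific guidelines
    "conversation_history": 20_000,  # previous messages exchanged
    "file_contents": 50_000,         # code Claude has read
}

used = sum(layers.values())
available = CONTEXT_LIMIT - used  # room for tool results + the response
print(f"used={used}, available={available}")
```

Everything - including the response still to be generated - must fit inside the same limit, which is why reading fewer, more relevant files leaves more room to work.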

Context Limits and Implications

Different models have different context limits. Claude has a context window of up to 200K tokens, allowing it to process large codebases. However, efficient context management is still crucial.

  • Truncation: when the context window fills up, older information gets truncated or summarized
  • Cost: More tokens means higher API costs
  • Performance: Very large contexts can slow down response times
  • Accuracy: Information in the middle of long contexts may get "lost in the middle"

How LLMs Generate Code

LLMs don't "understand" code the way humans do. Instead, they predict the most likely next token based on patterns learned from millions of lines of code in their training data.

LLM Code Generation Pipeline
1. Input - your prompt
2. Tokenize - split into tokens
3. Process - run the tokens through the neural network
4. Generate - predict the next token, one at a time
5. Output - complete response
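The pipeline's final two steps - predict a token, append it, predict again - can be made concrete with a toy model. A real LLM scores every token in a large vocabulary with a neural network; here a hypothetical bigram table stands in for the model so the loop's shape is visible.

```python
# Toy next-token prediction loop. The bigram table below is an assumed
# stand-in for a neural network's output probabilities, for illustration.

bigram_probs = {
    "function": {"greet": 0.6, "add": 0.4},
    "greet": {"(": 0.9, "=": 0.1},
    "(": {"name": 0.7, ")": 0.3},
    "name": {")": 1.0},
}

def generate(token: str, steps: int) -> list[str]:
    out = [token]
    for _ in range(steps):
        candidates = bigram_probs.get(out[-1])
        if not candidates:
            break  # no continuation known for this token
        # Greedy decoding: always pick the highest-probability next token.
        out.append(max(candidates, key=candidates.get))
    return out

print(generate("function", 4))  # ['function', 'greet', '(', 'name', ')']
```

Greedy decoding always takes the top token; real systems usually sample instead, which is where temperature (next section) comes in.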

Temperature and Sampling

When an LLM generates text, it calculates probabilities for each possible next token. "Temperature" controls how random the selection is:

  • Low temperature (0.0-0.3): More deterministic and focused output, good for code
  • High temperature (0.7-1.0): More varied and creative output, good for brainstorming
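Mechanically, temperature divides the model's raw scores (logits) before they are turned into probabilities via softmax. The logit values below are assumed for illustration; the scaling behavior is the point.

```python
import math

# Temperature scaling sketch: divide logits by T before softmax.
# Low T sharpens the distribution (near-deterministic picks);
# high T flattens it (more varied picks).

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # assumed scores for three candidate tokens

low = softmax_with_temperature(logits, 0.2)   # mass piles onto the top token
high = softmax_with_temperature(logits, 1.0)  # mass spreads across candidates
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At temperature 0.2 the top token gets over 99% of the probability mass, which is why low temperatures suit code generation: the output is nearly deterministic.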

Why Responses Vary

Even with the same prompt, an LLM may generate slightly different responses due to random sampling. This is why:

  • The same question might yield different code
  • Rerunning a prompt might improve results
  • More specific prompts lead to more consistent outputs

RAG Basics

RAG (Retrieval-Augmented Generation) is a technique that helps LLMs access information beyond their trained knowledge. Instead of relying only on what it learned, the LLM can search for and use relevant information from external sources.

Query → Search Codebase → Retrieve Context → Generate Response

Why Claude Code Uses RAG

Claude Code uses RAG to work effectively with large codebases. Instead of trying to fit the entire codebase into the context window, it:

  • Searches for relevant files when needed (using Grep, Glob)
  • Only reads the portions of code necessary for the task
  • Combines retrieved information with trained knowledge
  • Generates accurate, up-to-date responses about your project
Info
This is why Claude Code can work with projects containing millions of lines of code - it doesn't need to "remember" everything at once.
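The retrieve-then-generate flow can be sketched over an in-memory "codebase". Real tools like Grep and Glob search actual files on disk; the corpus, file names, and prompt format here are assumptions for illustration only.

```python
# Minimal retrieval-augmented sketch: keyword search over a toy corpus,
# then assemble only the matching files into the prompt. File contents
# and names below are made up for the example.

codebase = {
    "auth.py": "def login(user, password): ...",
    "billing.py": "def charge(card, amount): ...",
    "utils.py": "def slugify(text): ...",
}

def retrieve(query: str, corpus: dict[str, str]) -> dict[str, str]:
    """Keep files whose path or contents mention any query word."""
    words = query.lower().split()
    return {path: src for path, src in corpus.items()
            if any(w in src.lower() or w in path.lower() for w in words)}

def build_prompt(query: str, retrieved: dict[str, str]) -> str:
    context = "\n".join(f"# {path}\n{src}" for path, src in retrieved.items())
    return f"Relevant files:\n{context}\n\nQuestion: {query}"

hits = retrieve("how does login work", codebase)
print(build_prompt("how does login work", hits))
```

Only auth.py matches the query, so only its contents enter the context window - the other files cost zero tokens, which is the whole point of retrieval.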

Practical Implications

Token Efficiency Tips

Understanding how tokens work helps you write more efficient prompts:

  • Be concise: Remove filler words from your prompts
  • Be specific: Point to exact files or functions instead of asking to search
  • Break up tasks: Large tasks can be split into smaller conversations
  • Use CLAUDE.md: Put repeated instructions in this file instead of repeating each time

Context Management Preview

In the next chapter, you'll learn how to manage the context window effectively, including:

  • How Claude Code automatically compresses context
  • Techniques to keep context focused and relevant
  • When to start a new conversation

Key Takeaways


  • Tokens are the fundamental units LLMs process - typically 1 token equals about 4 characters
  • Context window is limited working memory - everything must fit, including your prompt and the response
  • LLMs generate code by predicting the most likely next token based on learned patterns
  • RAG allows Claude Code to work with large codebases by retrieving relevant context on demand

Practice

Test your understanding of this chapter

Quiz

What is a token in the context of LLMs?


True or False

A larger context window always means better AI responses.


Quiz

What does RAG stand for?


Code Challenge

Complete the context window layers


System  → CLAUDE.md → Conversation 