AI Fundamentals - Use AI

What Are Tokens?

Tokens are the smallest units that large language models (LLMs) process. Instead of reading character by character or word by word, LLMs split text into tokens - which can be words, parts of words, or even individual characters depending on the language and context.

Understanding tokens is crucial because they directly affect API costs, context limits, and how you structure your prompts.

Token Count Examples

Text                             Tokens
"Hello world"                    2
function add(a,b){return a+b}    12
~100 lines of code               800

How Tokenization Works

When you send text to an LLM, it is first split into tokens. For example, the word "understanding" might be split into "under" + "stand" + "ing". Code typically produces more tokens than prose of the same length, because symbols and syntax are often tokenized separately.

Tokenization Example

// This simple function...
function greet(name) {
  return "Hello, " + name;
}

// ...becomes approximately 15-20 tokens:
// "function" "greet" "(" "name" ")" "{" "return"
// '"' "Hello" "," '"' "+" "name" ";" "}"
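To make the splitting step concrete, here is a minimal sketch of a naive tokenizer. It is not how production LLM tokenizers work (they rely on learned subword vocabularies), and `naiveTokenize` is a hypothetical helper: it only separates identifiers, numbers, and single symbols, and it keeps a quoted string as one piece instead of splitting it further.

```javascript
// Hypothetical helper: splits source text into word-like and symbol pieces.
// Real tokenizers use learned subword vocabularies, so actual counts differ.
function naiveTokenize(text) {
  // Match identifiers, numbers, double-quoted strings, or any single symbol.
  const pattern = /[A-Za-z_]\w*|\d+|"[^"]*"|[^\s\w]/g;
  return text.match(pattern) || [];
}

const source = 'function greet(name) {\n  return "Hello, " + name;\n}';
const pieces = naiveTokenize(source);
console.log(pieces.length, pieces);
```

This sketch splits the function into 12 pieces. A real tokenizer would usually report a different number, because it may break words and strings into smaller subword units.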

Info

Rule of thumb: 1 token is roughly 4 English characters, or about 3/4 of a word. Code typically has higher token density due to punctuation and syntax.
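The rule of thumb can be turned into a quick estimator. This is only a sketch: `estimateTokenCount` and `CHARS_PER_TOKEN` are hypothetical names, and the result is a heuristic for English text, not the count a real tokenizer would return.

```javascript
// Rough estimate based on the "about 4 characters per token" rule of thumb.
// This is a heuristic for English text, not a real tokenizer.
const CHARS_PER_TOKEN = 4; // assumption taken from the rule of thumb

function estimateTokenCount(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

console.log(estimateTokenCount("Hello world")); // 11 characters, so the estimate is 3
```

The estimate of 3 differs from the 2 tokens listed for "Hello world" earlier, which shows how coarse the heuristic is for short strings. It is more useful for sizing long documents than for counting individual phrases.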

Context Window Explained

The context window is the LLM's "working memory" - the total number of tokens it can process in a single interaction. This includes everything: the system prompt, conversation history, file contents you share, and its response.

Context Window Composition

Tool Results / Available Space: Grep, Bash output + room for response
File Contents: Code Claude has read
Conversation History: Previous messages exchanged
CLAUDE.md: Project-specific guidelines
System Prompt: Fixed instructions for the AI
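Because every layer shares the same window, the space left for the response is whatever remains after the other layers are counted. The sketch below does that arithmetic. All per-layer numbers are invented for illustration, `remainingContextTokens` is a hypothetical helper, and the 200,000-token window size is an assumption.

```javascript
// All numbers below are made-up illustrative values, not measurements.
const CONTEXT_WINDOW_TOKENS = 200000; // assumed window size

const contextUsage = {
  systemPrompt: 3000,
  claudeMd: 1500,
  conversationHistory: 40000,
  fileContents: 60000,
  toolResults: 15000,
};

// Sum every layer and subtract the total from the window size.
function remainingContextTokens(usage, windowSize) {
  const used = Object.values(usage).reduce((sum, n) => sum + n, 0);
  return windowSize - used;
}

console.log(remainingContextTokens(contextUsage, CONTEXT_WINDOW_TOKENS)); // 80500
```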

Context Limits and Implications

Different models have different context limits. Claude has a context window of up to 200K tokens, allowing it to process large codebases. However, efficient context management is still crucial.

Full context: When the context window fills up, older information gets truncated or summarized
Cost: More tokens means higher API costs
Performance: Very large contexts can slow down response times
Accuracy: Information in the middle of long contexts may get "lost in the middle"
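The first point, dropping older information once the window fills up, can be sketched as a simple truncation strategy. This is an illustrative assumption, not how any particular tool manages its context: `roughTokens` and `trimHistory` are hypothetical helpers, token counts use the 4-characters-per-token heuristic, and summarization is not covered.

```javascript
// Rough token count using the 4-characters-per-token heuristic (not a real tokenizer).
function roughTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical sketch: keep the most recent messages that fit a token budget.
function trimHistory(messages, maxTokens) {
  const kept = [];
  let total = 0;
  // Walk from newest to oldest so recent context survives.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = roughTokens(messages[i]);
    if (total + cost > maxTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}

const history = [
  "Please review the login module.",   // oldest
  "The login module looks fine.",
  "Now refactor the payment service.", // newest
];
console.log(trimHistory(history, 18)); // keeps the two newest messages
```

Walking from the newest message backwards means the oldest messages are the first to be dropped, which matches the behavior described in the list.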

How LLMs Generate Code

LLMs don't "understand" code the way humans do. Instead, they predict the most likely next token based on patterns learned from millions of lines of code in their training data.

LLM Code Generation Pipeline

Input: Your prompt
Tokenize: Split into tokens
Process: Neural network
Generate: Predict next token
Output: Complete response
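The pipeline can be imitated with a toy loop. In this sketch a hand-written lookup table stands in for the neural network, and splitting on spaces stands in for tokenization. Both are deliberate simplifications, and `NEXT_TOKEN_TABLE` and `toyGenerate` are hypothetical names.

```javascript
// Toy stand-in for the neural network: a hand-written table of
// "most likely next token" entries. A real LLM computes these probabilities.
const NEXT_TOKEN_TABLE = new Map([
  ["function", "add"],
  ["add", "("],
  ["(", "a"],
  ["a", ","],
  [",", "b"],
  ["b", ")"],
]);

function toyGenerate(prompt, maxNewTokens) {
  const tokens = prompt.split(" ");            // Tokenize (naively, on spaces)
  for (let i = 0; i < maxNewTokens; i++) {
    const last = tokens[tokens.length - 1];    // Process: look at the context
    const next = NEXT_TOKEN_TABLE.get(last);   // Generate: predict the next token
    if (next === undefined) break;             // No prediction available: stop
    tokens.push(next);
  }
  return tokens.join(" ");                     // Output: the complete response
}

console.log(toyGenerate("function", 10)); // "function add ( a , b )"
```

The important part is the loop: each new token is appended to the context and then used to predict the following one, which is why generation proceeds one token at a time.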

Temperature and Sampling

When an LLM generates text, it calculates probabilities for each possible next token. "Temperature" controls how random the selection is:

Low temperature (0.0-0.3): More deterministic and focused output, good for code
High temperature (0.7-1.0): More varied and creative output, good for brainstorming
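One common way to apply temperature is to divide the model's raw scores by it before converting them to probabilities with a softmax. The sketch below shows that calculation. The scores are invented, `softmaxWithTemperature` is a hypothetical helper, and it assumes the temperature is greater than 0 (a temperature of exactly 0 is usually handled as "always pick the top token" rather than by this formula).

```javascript
// Convert raw scores into probabilities, scaled by temperature.
// Assumes temperature > 0; the scores used below are invented for illustration.
function softmaxWithTemperature(scores, temperature) {
  const scaled = scores.map((x) => x / temperature);
  const max = Math.max(...scaled);                     // subtract max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const candidateScores = [2.0, 1.0, 0.5]; // hypothetical scores for three candidate tokens
console.log(softmaxWithTemperature(candidateScores, 0.2));
console.log(softmaxWithTemperature(candidateScores, 1.0));
```

With these scores, the top candidate gets a probability of about 0.99 at temperature 0.2 and about 0.63 at temperature 1.0, so the lower temperature makes the most likely token far more dominant.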

Why Responses Vary

Even with the same prompt, an LLM may generate slightly different responses because the next token is chosen by random sampling. In practice, this means:

The same question might yield different code
Rerunning a prompt might improve results
More specific prompts lead to more consistent outputs
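The variation comes from the sampling step itself. The sketch below contrasts sampling with always taking the most probable token. The candidate tokens and probabilities are invented, and `sampleToken` and `greedyToken` are hypothetical helpers, with the random source passed in as a parameter so the behavior is easy to inspect.

```javascript
// Pick one token according to its probability.
// `random` is injectable for inspection; by default it is Math.random.
function sampleToken(tokens, probabilities, random = Math.random) {
  const r = random();
  let cumulative = 0;
  for (let i = 0; i < tokens.length; i++) {
    cumulative += probabilities[i];
    if (r < cumulative) return tokens[i];
  }
  return tokens[tokens.length - 1]; // guard against floating-point rounding
}

// Greedy selection always picks the most probable token, so it never varies.
function greedyToken(tokens, probabilities) {
  const best = probabilities.indexOf(Math.max(...probabilities));
  return tokens[best];
}

const nextTokens = ["for", "while", "forEach"]; // hypothetical candidates
const nextProbs = [0.6, 0.3, 0.1];              // hypothetical probabilities

console.log(greedyToken(nextTokens, nextProbs)); // always "for"
console.log(sampleToken(nextTokens, nextProbs)); // usually "for", sometimes another token
```

A more specific prompt tends to concentrate probability on fewer candidates, which is one way to read the observation that specific prompts produce more consistent output.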

RAG Basics

RAG (Retrieval-Augmented Generation) is a technique that helps LLMs access information beyond their training data. Instead of relying only on what it learned during training, the LLM can search for and use relevant information from external sources.

Query → Search Codebase → Retrieve Context → Generate Response

Why Claude Code Uses RAG

Claude Code uses RAG to work effectively with large codebases. Instead of trying to fit the entire codebase into the context window, it:

Searches for relevant files when needed (using Grep, Glob)
Only reads the portions of code necessary for the task
Combines retrieved information with knowledge from its training
Generates accurate, up-to-date responses about your project
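The search-then-read flow in the list above can be sketched with a toy in-memory codebase. This is not Claude Code's actual implementation: the file names and contents are invented, a plain keyword filter stands in for tools like Grep, and `searchFiles` and `buildPrompt` are hypothetical helpers.

```javascript
// Hypothetical in-memory "codebase"; file names and contents are invented.
const exampleFiles = {
  "auth/login.js": "function login(user, password) { /* ... */ }",
  "auth/logout.js": "function logout(session) { /* ... */ }",
  "billing/invoice.js": "function createInvoice(order) { /* ... */ }",
};

// Search step: keep files whose path or contents mention the keyword.
function searchFiles(files, keyword) {
  const needle = keyword.toLowerCase();
  return Object.entries(files)
    .filter(([path, text]) =>
      path.toLowerCase().includes(needle) || text.toLowerCase().includes(needle))
    .map(([path, text]) => ({ path, text }));
}

// Retrieve step: put only the matching snippets into the prompt.
function buildPrompt(question, matches) {
  const context = matches.map((m) => `// ${m.path}\n${m.text}`).join("\n\n");
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

const loginMatches = searchFiles(exampleFiles, "login");
console.log(buildPrompt("How does login work?", loginMatches));
```

Only `auth/login.js` ends up in the prompt here, so the other files cost no tokens. The final step, generating a response from that prompt, is left out because it requires a model.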

Info

This is why Claude Code can work with projects containing millions of lines of code - it doesn't need to "remember" everything at once.

Practical Implications

Token Efficiency Tips

Understanding how tokens work helps you write more efficient prompts:

Be concise: Remove filler words from your prompts
Be specific: Point to exact files or functions instead of asking to search
Break up tasks: Large tasks can be split into smaller conversations
Use CLAUDE.md: Put repeated instructions in this file instead of repeating each time

Context Management Preview

In the next chapter, you'll learn how to manage the context window effectively, including:

How Claude Code automatically compresses context
Techniques to keep context focused and relevant
When to start a new conversation

Key Takeaways

Tokens are the fundamental units LLMs process - as a rule of thumb, 1 token is roughly 4 English characters
Context window is limited working memory - everything must fit, including your prompt and the response
LLMs generate code by predicting the most likely next token based on learned patterns
RAG allows Claude Code to work with large codebases by retrieving relevant context on demand

Practice

Test your understanding of this chapter

Quiz

What is a token in the context of LLMs?

True or False

A larger context window always means better AI responses.

Quiz

What does RAG stand for?

Code Challenge

Complete the context window layers

System  → CLAUDE.md → Conversation