Search that finds the right document even when it shares no keywords with your query, and AI that retrieves the relevant part of your notes to answer a question: both run on embeddings. An embedding turns data into numbers that capture meaning, so a computer can measure how similar two things are. This guide explains what an embedding is, how it works, what it's used for, and why it underpins modern search and AI.
What an embedding is
An embedding represents data (a word, a sentence, an image) as a vector: a list of numbers, often hundreds or thousands of them, that encodes its meaning. The defining property is that items with similar meaning get vectors that are close together in this numeric space, while unrelated items are far apart.
So "dog" and "puppy" land near each other, far from "spreadsheet." Embeddings let computers measure semantic similarity mathematically, which is the foundation of modern search, recommendations and retrieval-augmented AI.
How it works
An embedding model (usually a neural network) is trained so it maps each input to a point in a high-dimensional space where meaning is encoded by position. Things used in similar contexts end up near each other.
Feed it text (or an image) and it outputs a fixed-length vector. To compare two items, you measure the distance or angle between their vectors, commonly with cosine similarity. Closer means more similar in meaning. The model doesn't "understand" in a human sense; it captures statistical patterns of similarity.
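The comparison step can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the three-dimensional vectors below are hand-written stand-ins (real models output hundreds or thousands of dimensions), and the name toy_vectors is made up for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# ASSUMPTION: hand-written toy vectors, NOT the output of any real embedding model.
toy_vectors = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "spreadsheet": [0.1, 0.05, 0.95],
}

dog_puppy = cosine_similarity(toy_vectors["dog"], toy_vectors["puppy"])
dog_sheet = cosine_similarity(toy_vectors["dog"], toy_vectors["spreadsheet"])

print(f"dog vs puppy:       {dog_puppy:.3f}")
print(f"dog vs spreadsheet: {dog_sheet:.3f}")
```

With these toy numbers, "dog" scores much higher against "puppy" than against "spreadsheet", which is the behaviour a trained model is meant to produce on real inputs.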
What embeddings are used for
- Semantic search: find documents about a topic even without shared keywords.
- Retrieval-augmented generation (RAG): embed your documents and a question, retrieve the chunks closest to the question, and feed them to an LLM as context.
- Recommendations: suggest items whose embeddings are near things you liked.
- Clustering & classification: group or label data by similarity.
- Deduplication & anomaly detection.
Anywhere you need "how similar in meaning are these two things?", embeddings are the tool.
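The retrieval step shared by semantic search and RAG might look roughly like the sketch below. It assumes the documents and the query have already been embedded; the index contents, the query_vector and the top_k helper are all hypothetical, and the vectors are toy values rather than real model output. A real system would typically use a vector database or an approximate nearest-neighbour library instead of a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, index, k=2):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# ASSUMPTION: a pre-computed index of (chunk text, toy vector) pairs.
index = [
    ("How to house-train a puppy", [0.9, 0.1, 0.1]),
    ("Feeding schedule for adult dogs", [0.8, 0.3, 0.1]),
    ("Pivot tables in a spreadsheet", [0.1, 0.1, 0.9]),
    ("Quarterly budget formulas", [0.1, 0.2, 0.8]),
]

# ASSUMPTION: toy vector standing in for the embedded query "caring for a young dog".
query_vector = [0.85, 0.2, 0.1]

results = top_k(query_vector, index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Note that the query shares no keywords with the top results in this toy setup; the match comes entirely from vector proximity. In a RAG pipeline, the returned chunk texts would then be placed into the LLM prompt.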
Embedding vs token
The two are related but distinct steps. A token is a small unit of text (a word or word-piece) that a model reads. An embedding is the numeric vector that represents meaning, and inside a model each token is converted into an embedding before processing. Tokens are how text is chopped up; embeddings are how those pieces become meaningful numbers. In search/RAG, "an embedding" usually means one vector for a whole chunk of text.
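To make the distinction concrete, here is a toy sketch of both steps: splitting text into tokens, looking up a vector per token, and then collapsing those into one vector for the chunk. Everything in it is an assumption for illustration: the whitespace tokenizer, the token_embeddings table and its values are made up, and averaging (mean pooling) is just one simple way to get a single chunk vector, not necessarily what any particular model does.

```python
# ASSUMPTION: a tiny hand-written lookup table, token -> toy 3-dimensional vector.
token_embeddings = {
    "the": [0.1, 0.1, 0.1],
    "puppy": [0.9, 0.8, 0.1],
    "sleeps": [0.3, 0.6, 0.2],
}

def tokenize(text):
    """Naive whitespace tokenizer; real models usually split into word-pieces."""
    return text.lower().split()

def embed_chunk(text):
    """Average the known token vectors into one fixed-length vector (mean pooling)."""
    vectors = [token_embeddings[tok] for tok in tokenize(text) if tok in token_embeddings]
    if not vectors:
        raise ValueError("no known tokens in text")
    dims = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dims)]

tokens = tokenize("The puppy sleeps")          # step 1: text -> tokens
chunk_vector = embed_chunk("The puppy sleeps")  # step 2: tokens -> one vector

print(tokens)
print(chunk_vector)
```

The output vector has the same length regardless of how many tokens the input contains, which is what makes chunks of different sizes comparable.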
The honest limit
Embeddings are powerful but approximate. They capture statistical patterns from training data, so quality depends on the model and domain: a model trained on general web text may misjudge specialised jargon, and biases carry into the vectors. Different models produce incompatible embeddings, so you can't mix vectors across models. They're a remarkably useful proxy for meaning, not a true understanding of language.
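One defensive habit that follows from the incompatibility point is to store the model name alongside every vector and refuse to compare across models. The sketch below is one possible way to do that; TaggedEmbedding, safe_cosine and the model names "model-a" and "model-b" are hypothetical names invented for this example.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedEmbedding:
    model: str      # name of the model that produced the vector (hypothetical label)
    vector: tuple   # the embedding values

def safe_cosine(a: TaggedEmbedding, b: TaggedEmbedding) -> float:
    """Cosine similarity that refuses to compare vectors from different models."""
    if a.model != b.model:
        raise ValueError(f"cannot compare embeddings from {a.model!r} and {b.model!r}")
    if len(a.vector) != len(b.vector):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    norm_a = math.sqrt(sum(x * x for x in a.vector))
    norm_b = math.sqrt(sum(y * y for y in b.vector))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# ASSUMPTION: toy vectors and made-up model names.
doc = TaggedEmbedding("model-a", (0.9, 0.8, 0.1))
query_same_model = TaggedEmbedding("model-a", (0.85, 0.9, 0.15))
query_other_model = TaggedEmbedding("model-b", (0.85, 0.9, 0.15))

print(safe_cosine(doc, query_same_model))
try:
    safe_cosine(doc, query_other_model)
except ValueError as err:
    print(f"rejected: {err}")
```

Two vectors from different models can have the same length and still be meaningless to compare, so a dimension check alone would not catch this; the explicit model tag does.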
The bottom line
An embedding turns data into a vector of numbers that captures meaning, placing similar things close together so similarity becomes a measurable distance. It's the quiet engine behind semantic search, recommendations and RAG. Just remember it's an approximation shaped by the model and its training data: extraordinarily useful, but a proxy for meaning rather than comprehension.