Glossary
Vector Embeddings
Vector embeddings are numerical representations of complex data (like text, images, and audio) that capture semantic meaning by plotting the data as points in a high-dimensional space. This enables "semantic similarity," where the distance between two points, or vectors, directly measures how closely related they are.
This capability is fundamental to modern GenAI, enabling applications to generate custom images, provide nuanced answers from complex documents, or blend audio inputs—all without having to re-process the data's original format. While central to today's AI, embeddings have a history dating back to the 1950s and are still being actively innovated. Understanding them is key to understanding the past, present, and future of artificial intelligence.
Imagine organizing a kitchen. You group similar items together: fruits on one shelf, spices on another. Vector embeddings do the same with data. Related words like "apple" and "orange" are placed close together, just like fruits on the same shelf. Here, "apple" would be close to "pear," farther from "banana," and even further from "flour." This measured distance is how models find semantically related words or recommend similar products.
Because embeddings are so foundational, understanding them is essential to understanding AI as it once was, AI today, and AI tomorrow.
What are vector embeddings?
Vector embeddings are numerical representations of data that capture the essence of the data’s semantic meaning, no matter its format, within a high-dimensional vector space. These representations enable semantic similarity, allowing models to quantify the distance between different data points.
Once that essence is captured, models can reason about semantic similarity, where the “distance” between vectors quantitatively reflects how similar or related the data points are to each other.
Imagine your kitchen, where you’ve arranged ingredients on shelves: fruits together on one shelf, spices on another, and snacks on another. This setup makes it easy to find what you’re looking for because similar items are grouped together.
Vector embeddings work similarly, but with data instead of kitchen ingredients. Think of each type of data as different ingredients placed on their specific shelves in the kitchen. Words that are related, such as “apple” and “orange,” are like fruits kept on the same shelf because they share similarities.
The key difference is that, while a kitchen shelf is a discrete space, vector embeddings represent relationships in a continuous, high-dimensional space. In the kitchen analogy, there would have to be room to represent the fact that "apple" sits on the "fruit shelf" while also being closer to "pear" than to "banana," very far from "flour," and even further from "blender."
In vector embeddings, we measure this “distance” using metrics that show how closely related two pieces of data are. These measurements enable models to find words that mean the same thing or recommend products that are alike.
Common use cases for vector embeddings
Vector embeddings are foundational to GenAI, powering a broad range of applications by enabling LLM systems to reason about the semantic similarities between disparate data points. Most importantly, embeddings enable LLMs to perform this comparison regardless of data format; once documents, images, users, and more are converted into embeddings, they can all be compared.
To see how vector embeddings are used beyond the abstract, it’s helpful to look at use cases in action. The following use cases differ significantly, but the diversity points to just how useful embeddings can be:
- De-duplication at scale: Vector embeddings can identify near-identical items, such as duplicate support tickets, product listings, or documents. By measuring similarity in a vector space, systems can efficiently detect and remove redundant content.
- Reverse image search: By embedding images into vectors, systems can retrieve visually similar images, even if they differ in size, angle, or lighting. Brand logo detection, content moderation, and visual product search features all depend on this ability.
- Anomaly detection in real-time systems: Embeddings can reveal when a new data point doesn’t fit existing patterns, which is useful in fraud detection, cybersecurity, and system monitoring. Outliers in vector space often signal potential anomalies.
- Conversational AI: Chatbots and other conversational AI applications use vector embeddings to match user queries to semantically similar knowledge base entries. Combined with Retrieval-Augmented Generation (RAG) and Redis, which makes RAG interactions function at real-time speeds, embeddings can support the retrieval of accurate, context-aware responses.
- Personalization and recommendations: With embeddings, systems can surface personalized and recommended content by comparing users and items (e.g., songs, videos, and products). Ecommerce companies like Ulta Beauty, for example, use Redis vector search to personalize product results in real time.
- Geographic or spatial search: Embeddings can represent location coordinates and contextual patterns (e.g., foot traffic and delivery times), which allows systems to use embeddings to find locations or offers that are “close” in both physical and behavioral space.
- Legal and compliance document analysis: Law firms and corporations can use embeddings to detect precedent or clause similarity across complex, lengthy documents, which helps automate due diligence, contract review, and compliance checks.
- Content moderation and filtering: By generating embeddings for social media posts, images, or videos, social networks can automatically flag harmful content that’s semantically similar, even when it has been altered or obfuscated.
Even these eight common use cases don’t cover all the ways that embeddings can be used. To understand why embeddings are so useful across so many use cases, let’s walk through how they work.
How vector embeddings work
When we work with embeddings, we want to help a computer, which only understands numbers, understand the meaning of words, images, and sounds. Vector embeddings bridge that translation gap.
First, we need to turn the input into numbers. A simple word to us, such as “apple,” is just a string of letters to a computer. To transform raw, unstructured inputs into embeddings, we need to convert the data – a word, in this case – into a list of numbers, a vector, that we can represent in a high-dimensional space.
The key here is that vectors represent the core semantic meaning of a data input, not every detail. The goal isn’t to contain all data, but to capture enough meaning that we can compare different inputs. In our example, the words “apple” and “banana” would be represented by vectors that are close together because they share similar meanings, but “apple” and “car” would be further apart because their meanings are very different.
Once we have vectors stored in a high-dimensional space, we can capture the differences between them using their quantitative distances. Vectors that are further apart are more different than vectors that are closer together. There are numerous ways to measure this distance (each appears in the short sketch after this list):
- Cosine similarity measures the angle between vectors, regardless of their magnitude.
- Dot product measures both the direction and magnitude of vector alignment.
- Euclidean distance measures the straight-line distance between vectors.
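To make these metrics concrete, here’s a minimal sketch in Python (using NumPy and made-up three-dimensional vectors; real embeddings typically have hundreds of dimensions):

```python
import numpy as np

# Toy 3-dimensional "embeddings"; real embeddings usually have hundreds of dimensions.
apple = np.array([0.90, 0.10, 0.20])
pear = np.array([0.85, 0.15, 0.25])
flour = np.array([0.10, 0.80, 0.70])

def cosine_similarity(a, b):
    # Angle-based: values near 1.0 mean the vectors point in the same direction.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def dot_product(a, b):
    # Combines direction and magnitude.
    return np.dot(a, b)

def euclidean_distance(a, b):
    # Straight-line distance: smaller means more similar.
    return np.linalg.norm(a - b)

print(cosine_similarity(apple, pear), cosine_similarity(apple, flour))
print(dot_product(apple, pear), dot_product(apple, flour))
print(euclidean_distance(apple, pear), euclidean_distance(apple, flour))
```

Whichever metric you choose, “apple” ends up measurably closer to “pear” than to “flour.”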
Let’s say we have the vector embeddings, the vector store, and a long list of different items, such as fruits. If we’re again trying to find the fruits most similar to “apple,” we could use a common machine learning algorithm called k-nearest neighbors (k-NN).
In a k-NN search, the algorithm finds the "k" closest vectors (neighbors) to the vector representing "apple." The closer the vectors are, the more similar the items they represent. So, in this case, "banana" and "pear" might be among the nearest neighbors of "apple."
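Here’s a rough sketch of that idea as a brute-force k-NN over a handful of hypothetical, hand-picked vectors (production systems typically use approximate methods such as HNSW instead of comparing against every item):

```python
import numpy as np

# Hypothetical item embeddings; a real system would generate these with an embedding model.
items = {
    "apple":   np.array([0.90, 0.10, 0.20]),
    "pear":    np.array([0.85, 0.15, 0.25]),
    "banana":  np.array([0.80, 0.20, 0.30]),
    "flour":   np.array([0.10, 0.80, 0.70]),
    "blender": np.array([0.05, 0.20, 0.95]),
}

def knn(query_name, k=2):
    query = items[query_name]
    scores = []
    for name, vec in items.items():
        if name == query_name:
            continue
        # Cosine similarity between the query and every other item.
        sim = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        scores.append((name, sim))
    # Keep the k items with the highest similarity.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

print(knn("apple", k=2))  # e.g. [('pear', ...), ('banana', ...)]
```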
Together, this means we can represent different data formats as embeddings, compare them based on semantic similarity using distance, and use algorithmic search to find the closest vectors.
Creating vector embeddings: Manual vs. model-based
There are two primary approaches to generating vector embeddings: manual feature engineering and model-based (i.e., learned) embeddings.
Early machine learning workflows relied heavily on manual human effort to hand-select and encode input data into numerical features, often based on simple patterns such as counts, frequencies, or statistical summaries.
Before the advent of deep learning, ML developers often used manual encodings to design features that represented their data. Feature engineering, in this context, refers to using domain knowledge to create relevant attributes that a model can learn from.
A few examples, spanning hand-engineered features and early learned representations, include:
- Doc2Vec for text: This early deep learning technique generated dense, fixed-length vector representations of entire documents based on their surrounding context. Unlike earlier, sparser methods, Doc2Vec captured semantic structure and enabled more meaningful comparisons between texts.
- Spectrograms for audio: This feature uses visual representations of how the frequency content of an audio signal evolves over time. It was often used to extract time-frequency features in manual audio processing pipelines.
- GraphSAGE for networked data: This graph neural network algorithm learned dense node embeddings by sampling and aggregating information from a node’s local neighborhood, allowing it to capture structural and relational patterns.
Model-based embeddings, in contrast, automatically learn embeddings by training on large datasets. Instead of manually defining which aspects of the data to emphasize, we train a model on large datasets to discover informative patterns on its own.
Models such as Word2Vec for text, CNNs for images, and transformer-based architectures can capture complex patterns, reduce manual effort, support transfer learning, and scale to many more use cases in far less time.
There are also pre-trained models, which let you avoid starting from scratch. Pre-trained models offer a shortcut to implementing vector embeddings, with a wide variety of embedding models available for different types of data.
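As one illustration, here’s a minimal sketch using the open source sentence-transformers library; the model name is just a common small default, not a requirement:

```python
from sentence_transformers import SentenceTransformer, util

# Any pre-trained embedding model can be swapped in; this is a common general-purpose choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = model.encode(["apple", "pear", "flour"])
print(embeddings.shape)  # (3, 384): three inputs, 384 dimensions each for this model

# "apple" should score higher against "pear" than against "flour".
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```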
Different use cases will rely on one approach more than the other, which is why Redis supports both.
Types of embeddings and dimensionality
So far, we’ve focused on the power of embeddings and how they support a broad range of use cases. That diversity of applications is matched by diversity in the embeddings themselves: not all vector embeddings are created equal. Their purpose and structure can vary widely depending on the data type, use case, and model architecture.
Item vs. user embeddings
Item embeddings represent the properties of things; think products, songs, and movies. User embeddings represent behavior patterns; think past clicks, preferences, and interactions.
In a music streaming app, an item embedding might represent a song using its genre or tempo, and a user embedding might capture a user's listening habits over time. With both, you can enable personalization via nearest-neighbor matching.
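One simplified way to build a user embedding, assuming you already have item embeddings, is to average the items a user has interacted with and then recommend the nearest unheard item. A hypothetical sketch:

```python
import numpy as np

# Hypothetical song embeddings; in practice these come from a trained model.
songs = {
    "song_a": np.array([0.90, 0.10]),   # upbeat pop
    "song_b": np.array([0.80, 0.20]),   # upbeat pop
    "song_c": np.array([0.10, 0.90]),   # slow acoustic
    "song_d": np.array([0.85, 0.15]),   # upbeat pop
}

# A simple user embedding: the average of the songs this user has listened to.
listened = ["song_a", "song_b"]
user_vec = np.mean([songs[s] for s in listened], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Recommend the unheard song closest to the user's taste.
candidates = [s for s in songs if s not in listened]
print(max(candidates, key=lambda s: cosine(user_vec, songs[s])))  # song_d
```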
Dimensionality trade-offs
Dimensionality refers to the number of numerical features (or axes) that define a vector, i.e., the number of values in the embedding vector.
Not all vectors have the same dimensionality, and more or fewer dimensions present different trade-offs. Lower-dimensional vectors are smaller and faster but can miss nuance, whereas higher-dimensional vectors offer richer semantic information but require more memory and compute.
The choice of lower or higher dimensionality depends on your use case, the complexity of your data, and the capabilities of your vector database.
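As a back-of-the-envelope illustration of the memory side of that trade-off (raw float32 vectors only, ignoring index overhead, which varies by algorithm):

```python
# Rough memory footprint of raw float32 vectors, ignoring index overhead.
num_vectors = 1_000_000
bytes_per_float = 4  # float32

for dims in (64, 384, 1536):
    gigabytes = num_vectors * dims * bytes_per_float / 1024**3
    print(f"{dims:>5} dims -> ~{gigabytes:.2f} GB for {num_vectors:,} vectors")
```

A million 64-dimensional vectors fit in roughly a quarter of a gigabyte, while a million 1536-dimensional vectors need several gigabytes before any indexing even begins.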
Abstract embeddings
So far, we’ve used simple examples to demonstrate how vector embeddings work, but embeddings aren’t just for tangible inputs. They can also represent intangible concepts, such as political ideology, weather patterns, and customer sentiment.
Given sufficiently large datasets, machine learning models can learn to embed conceptual or time-series data, enabling even more complex and nuanced reasoning.
How vector embeddings power LLM and RAG workflows
We all remember our first, almost magical experience with ChatGPT, when we input a question and received a shockingly clear answer almost instantly. But we also remember other occasions when the answer sounded correct but simply wasn’t. This common dilemma is why LLMs are increasingly paired with external retrieval systems to improve accuracy and relevance – an approach known as Retrieval-Augmented Generation (RAG).
RAG is a technique that retrieves relevant context from an external database and feeds it to an LLM that uses the information to ground its response. Vector embeddings, stored in databases such as those provided by Redis, enable fast and semantically accurate retrieval.
It’s worth remembering, despite that magic feeling, that LLMs on their own don’t “know” information, especially information that’s new or private. Vector search allows developers to store and retrieve external context that matches the user’s query in meaning, not just keywords, allowing LLMs to generate accurate, relevant information even in contexts where generic LLMs would fail.
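As a rough sketch of the retrieval step with the redis-py client, assuming an index named docs_idx with a 384-dimensional embedding field already exists (index creation is sketched later) and using a random vector as a stand-in for a real query embedding:

```python
import numpy as np
import redis
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Stand-in for a real embedding of the user's question; in practice, use your embedding model.
question = "How do I reset my password?"
query_vec = np.random.rand(384).astype(np.float32).tobytes()  # 384 must match the index's DIM

# KNN query: fetch the 3 stored documents whose embeddings are closest to the question.
q = (
    Query("*=>[KNN 3 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("content", "score")
    .dialect(2)
)
results = r.ft("docs_idx").search(q, query_params={"vec": query_vec})

# The retrieved passages are then placed in the LLM prompt to ground its answer.
context = "\n".join(doc.content for doc in results.docs)
```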
There are numerous infrastructure options to support RAG, but picking the right infrastructure for your use case can’t be a hasty decision. You need to figure out which features are necessary for your scenario and which tooling your choice needs to integrate with.
Redis, for example, supports:
- Hybrid search, which combines text and vector search
- Sub-ms nearest neighbor search across millions of embeddings
- Memory tiering to scale RAG use cases efficiently
No solution can handle everything alone, however, which is why Redis integrates with numerous open source tools, such as:
- LangChain via RedisVectorStore
- LlamaIndex for document indexing
- Redis Vector Library (RedisVL) for standardized ingestion and retrieval
- Agent Memory Server for long-term memory in multi-agent systems
RAG works because it provides context-aware information to the LLMs you’re using. As a result, RAG works best when you choose infrastructure that works within the context of your use case.
Why Redis for vector embeddings and nearest-neighbor search?
Not all databases are the same, and the best systems balance trade-offs like speed, scalability, and reliability. To bring vector embeddings into production, you need a system that’s fast, scalable, and easy to integrate, and Redis delivers on all three.
Redis supports:
- Sub-millisecond k-NN search: With HNSW (Hierarchical Navigable Small World) indexing, Redis enables ultra-fast, memory-efficient approximate nearest neighbor retrieval (a minimal index sketch follows this list).
- Hybrid vector + text search: Redis Query Engine combines structured, unstructured, and vector filters in a single query.
- Flexible deployment options: Organizations can use Redis in Redis Cloud (fully managed), Redis Software (self-hosted), or Community Edition, all of which support use cases ranging from prototyping to production.
- Active-active geo-distribution: Redis Enterprise provides multi-region support with strong consistency and automatic failover, making it ideal for global RAG and recommendation systems.
- Memory tiering for scale: Redis supports disk-based indexing and intelligent tiering to handle large volumes of high-dimensional data without wasting RAM.
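Here’s a minimal sketch of creating such an HNSW index and storing one document with redis-py; the index name docs_idx, field names, and 384-dimensional vectors are illustrative assumptions that match the retrieval sketch earlier:

```python
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(host="localhost", port=6379)

# Define an index with a text field and a 384-dimensional HNSW vector field.
schema = (
    TextField("content"),
    VectorField(
        "embedding",
        "HNSW",
        {"TYPE": "FLOAT32", "DIM": 384, "DISTANCE_METRIC": "COSINE"},
    ),
)
r.ft("docs_idx").create_index(
    schema,
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Store one document; the embedding would normally come from your embedding model.
vec = np.random.rand(384).astype(np.float32)
r.hset("doc:1", mapping={"content": "How to reset a password", "embedding": vec.tobytes()})
```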
Given the breadth of features it supports and speed that outpaces the competition, Redis can serve as a versatile data store for production-ready vector platforms built for real-time applications that depend on fast, scalable, and reliable vector search.
Ready to put vector embeddings to work? Try Redis free or book a demo to see how Redis powers real-time semantic search at scale.
FAQs
How many dimensions should a vector embedding have?
The ideal number of dimensions for a vector embedding depends on data complexity and use case. Start with 128–300 for most tasks (such as semantic search or recommendations). Use higher dimensions (512–1024) for complex data (such as long text and images) when you need rich detail. Lower dimensions (16–64) are faster and lighter but can lose nuance. Test different sizes to find the best trade-off between performance, accuracy, and resource cost.
How do vector embeddings differ from sparse vectors?
Vector embeddings are dense vectors: compact, learned representations where most values are non-zero. They capture semantic or structural meaning and are often produced by neural networks. Sparse vectors, such as TF-IDF or one-hot encodings, typically have mostly zero values and high dimensionality. They are usually manually engineered and don’t capture meaning beyond raw frequency or presence. Embeddings are better for modeling similarity and relationships, while sparse vectors are simpler but less expressive.
How does Redis achieve fast nearest-neighbor search?
Redis achieves especially fast nearest-neighbor search by using HNSW (Hierarchical Navigable Small World), an efficient graph-based approximate nearest-neighbor (ANN) algorithm. Vectors are organized into multi-layer small-world graphs, where the top layer provides a coarse overview and the lower layers offer fine-grained navigation. Queries traverse these layers hierarchically, quickly zeroing in on nearest neighbors with logarithmic complexity. This ANN approach avoids exhaustive comparisons, offering sub-linear search speed even in high-dimensional spaces. Redis’s implementation allows real-time indexing and querying at massive scale.
Can Redis power chatbot session memory with vector search?
Yes. Redis natively supports vector search using efficient HNSW and flat indexes, which enables fast retrieval of semantically relevant embeddings. These features make it ideal for session memory: past exchanges between users and chatbots can be stored as embeddings and retrieved as contextually relevant history during the conversation.