Deploy GenAI apps faster with Redis and NVIDIA NIM
Accelerate your GenAI app development with Redis, the world's fastest data platform for real-time data and AI apps. Now, with Redis and NVIDIA NIM inference microservices, you can build and deploy GenAI apps faster.
Companies are looking for ways to bring their GenAI apps to production so they can apply recent advances and provide a better experience for their customers. Building GenAI apps comes with all the usual challenges of software development: integrations, testing, and scaling. AI takes those challenges to another level. To stay ahead, companies need simple, reliable infrastructure that adapts as the technology evolves.
To help companies bring GenAI apps to production faster, Redis is using NVIDIA NIM to provide ready-made infrastructure for fast data access and AI models. NIM, part of the NVIDIA AI Enterprise software platform for building and deploying GenAI apps, can be combined with Redis for fast and flexible deployment.
Devs rely on Redis for their real-time data and AI needs, for everything from customer support agents to fraud and anomaly detection to real-time product recommendations. With NIM, you can skip the setup and maintenance of full-stack infrastructure to run the latest GenAI models. NIM streamlines AI model deployment with pre-built, cloud-native microservices that are maintained to deliver optimized inference on NVIDIA accelerated infrastructure.
You can use NIM alongside your existing data in Redis to take advantage of the latest Redis features, like vector database and semantic caching. Use Redis as your vector database for faster information access with Retrieval Augmented Generation (RAG) and models from NIM. Plus, use Redis semantic caching to cache LLM responses for your GenAI apps. Together, they reduce costs and speed up responses to provide the real-time experience users expect.
To show how easy it is to get started with Redis and NVIDIA NIM, we'll walk through a demo that builds a simple chatbot using RAG, with Redis as the vector database and NIM providing the model and inference for fast responses. In this example, we'll ask it questions about the Chevy Colorado user manual. Let's get started. You can follow along using this notebook.
First, connect to the NVIDIA-hosted LLM and embedding models. You can use your existing API key or get one here. We’ll use NIM for its simple and fast access to the latest models and LangChain to create embeddings.
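A minimal sketch of that setup is below, using the langchain-nvidia-ai-endpoints package. The specific model names are illustrative choices, not requirements of the demo.

```python
import os
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# langchain-nvidia-ai-endpoints reads the key from the NVIDIA_API_KEY env var
os.environ["NVIDIA_API_KEY"] = "nvapi-..."  # your NVIDIA API key

# Chat model served via NIM (model name is one example; any NIM-hosted LLM works)
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

# Embedding model used to vectorize document chunks for Redis
embeddings = NVIDIAEmbeddings(model="NV-Embed-QA")
```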
We’ll be working with a PDF of the Chevy Colorado truck user manual. It’s full of qualitative and quantitative information about the vehicle. Once the data is imported into the notebook, we’ll prepare the document for RAG using LangChain.
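Here's one way to load and chunk the PDF with LangChain. The file path and chunking parameters are placeholders; tune them for your own documents.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the user manual PDF (the local file path is a placeholder)
loader = PyPDFLoader("chevy-colorado-manual.pdf")
pages = loader.load()

# Split the document into overlapping chunks sized for retrieval
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(pages)
```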
We’ll take those chunks of data from the Chevy Colorado user manual and load them into a Redis vector database for fast retrieval.
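A sketch of the indexing step, using the Redis vector store integration in langchain_community. The Redis URL and index name are placeholders for your own deployment.

```python
from langchain_community.vectorstores.redis import Redis

# Index the chunks in Redis; the connection URL and index name are placeholders
vectorstore = Redis.from_documents(
    chunks,
    embeddings,
    redis_url="redis://localhost:6379",
    index_name="chevy_colorado",
)

# Retriever the RAG chain will use to fetch the most relevant chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```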
To get the most out of the NIM model, we’ll design a RAG prompt that describes the dataset and how we want the LLM to respond.
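For example, a prompt along these lines; the wording here is illustrative and should be tailored to your own dataset and tone.

```python
from langchain_core.prompts import ChatPromptTemplate

# Prompt wording is illustrative; tailor it to your own dataset and tone
prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant answering questions about the Chevy Colorado
using excerpts from its user manual.

Context:
{context}

Question: {question}

Answer using only the context above. If the answer isn't in the context, say so."""
)
```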
Let’s set up our RAG chain using LangChain Expression Language (LCEL).
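A minimal LCEL chain for this demo might look like the following, reusing the `retriever`, `prompt`, and `llm` defined above.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    # Join the retrieved chunks into a single context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)

# Retrieve context, fill the prompt, call the NIM-hosted LLM, and parse the text
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```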
Once everything is set up, we can go ahead and ask it our question. The chatbot will send the question, along with relevant context retrieved from the Redis vector database, to the NIM-hosted LLM to generate a response.
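Invoking the chain is a single call; the question below is just an example of the kind of query the manual can answer.

```python
# The question text is an example; any query about the manual works
answer = rag_chain.invoke("What is the maximum towing capacity of the Chevy Colorado?")
print(answer)
```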
The app sends back an appropriate response, grounded in the source documents, that matches our request.
Let’s go one step further and make the application add sources to the response. This helps users explore the docs and verify the information themselves.
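One common LCEL pattern for returning sources is sketched below, assuming the `retriever`, `prompt`, `llm`, and `format_docs` pieces defined earlier: run retrieval once, then return both the retrieved documents and the generated answer.

```python
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Build the answer from already-retrieved docs so we can also return them
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | prompt
    | llm
    | StrOutputParser()
)

# Run retrieval once, then return both the generated answer and its sources
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
```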
We’ll ask our question again and get the following response, this time with sources included.
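Invoking the source-aware chain returns both the answer and the documents it was grounded in; the metadata fields shown are the ones the PDF loader typically attaches.

```python
result = rag_chain_with_source.invoke("What is the maximum towing capacity of the Chevy Colorado?")

print(result["answer"])
for doc in result["context"]:
    # Each source document carries metadata such as the file name and page number
    print(doc.metadata.get("source"), doc.metadata.get("page"))
```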
We hope you enjoyed the demo. You can now use Redis and NVIDIA NIM for your own apps.
To get started: