Vector set is a new data type that is currently in preview and may be subject to change.
A Redis vector set lets
you store a set of unique keys, each with its own associated vector.
You can then retrieve keys from the set according to the similarity between
their stored vectors and a query vector that you specify.
You can use vector sets to store any type of numeric vector, but they are
particularly optimized to work with text embedding vectors (see
Redis for AI to learn more about text
embeddings). The example below shows how to use the
@xenova/transformers
library to generate vector embeddings and then
store and retrieve them using a vector set with node-redis.
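To make the idea of retrieval by similarity concrete, the sketch below ranks a few keys by cosine similarity to a query vector in plain JavaScript. It is an illustration only: the three-dimensional toy vectors and the cosineSimilarity() and rankBySimilarity() helpers are made up for this example, and Redis performs the search on the server rather than with a brute-force scan like this one.

```javascript
// Illustration only: brute-force ranking by cosine similarity.
// The helper names and toy vectors are hypothetical, not part of node-redis.
function cosineSimilarity(a, b) {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the keys ordered from most similar to least similar.
function rankBySimilarity(vectors, query) {
    return Object.entries(vectors)
        .map(([key, vector]) => ({ key, score: cosineSimilarity(vector, query) }))
        .sort((x, y) => y.score - x.score)
        .map(({ key }) => key);
}

const toyVectors = {
    north: [0, 1, 0],
    northEast: [1, 1, 0],
    south: [0, -1, 0],
};

console.log(rankBySimilarity(toyVectors, [0, 1, 0]));
// >>> [ 'north', 'northEast', 'south' ]
```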
Initialize
Start by installing node-redis
if you haven't already done so. Also, install @xenova/transformers:
npm install @xenova/transformers
In your JavaScript source file, import the required modules:
import * as transformers from '@xenova/transformers';
import { createClient } from 'redis';
The first of these imports is the
@xenova/transformers module, which generates an embedding from a section of text.
This example uses transformers.pipeline with the
all-MiniLM-L6-v2
model for the embeddings. This model generates vectors with 384 dimensions, regardless
of the length of the input text, but note that the input is truncated to 256
tokens (see
Word piece tokenization
at the Hugging Face docs to learn more about the way tokens
are related to the original text).
The output from transformers.pipeline is a function (called pipe in the examples)
that you can call to generate embeddings. The pipeOptions object is a parameter for
pipe that specifies how to generate sentence embeddings from token embeddings (see the
all-MiniLM-L6-v2
documentation for details).
const pipe = await transformers.pipeline(
    'feature-extraction', 'Xenova/all-MiniLM-L6-v2'
);

const pipeOptions = {
    pooling: 'mean',
    normalize: true,
};
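As a rough picture of what the pooling: 'mean' and normalize: true options ask for, the sketch below averages a set of token embeddings and then scales the result to unit length. The meanPoolAndNormalize() helper and the two-dimensional token vectors are hypothetical, and the library's own implementation may differ in detail (for example, in how it treats padding tokens).

```javascript
// Hypothetical helper: average the token vectors, then scale to length 1.
function meanPoolAndNormalize(tokenEmbeddings) {
    const dims = tokenEmbeddings[0].length;
    const mean = new Array(dims).fill(0);

    // Mean pooling: average each dimension over all tokens.
    for (const token of tokenEmbeddings) {
        for (let i = 0; i < dims; i++) {
            mean[i] += token[i] / tokenEmbeddings.length;
        }
    }

    // Normalization: divide by the Euclidean length of the pooled vector.
    const length = Math.sqrt(mean.reduce((sum, x) => sum + x * x, 0));
    return mean.map((x) => x / length);
}

// Two made-up token embeddings whose mean is [3, 4] (length 5).
const sentenceVector = meanPoolAndNormalize([[3, 0], [3, 8]]);
console.log(sentenceVector);
// >>> [ 0.6, 0.8 ]
```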
Create the data
The example data is contained in an object with some brief
descriptions of famous people:
const peopleData = {
    "Marie Curie": {
        "born": 1867, "died": 1934,
        "description": `
Polish-French chemist and physicist. The only person ever to win
two Nobel prizes for two different sciences.
`
    },
    "Linus Pauling": {
        "born": 1901, "died": 1994,
        "description": `
American chemist and peace activist. One of only two people to win two
Nobel prizes in different fields (chemistry and peace).
`
    },
    "Freddie Mercury": {
        "born": 1946, "died": 1991,
        "description": `
British musician, best known as the lead singer of the rock band
Queen.
`
    },
    "Marie Fredriksson": {
        "born": 1958, "died": 2019,
        "description": `
Swedish multi-instrumentalist, mainly known as the lead singer and
keyboardist of the band Roxette.
`
    },
    "Paul Erdos": {
        "born": 1913, "died": 1996,
        "description": `
Hungarian mathematician, known for his eccentric personality almost
as much as his contributions to many different fields of mathematics.
`
    },
    "Maryam Mirzakhani": {
        "born": 1977, "died": 2017,
        "description": `
Iranian mathematician. The first woman ever to win the Fields medal
for her contributions to mathematics.
`
    },
    "Masako Natsume": {
        "born": 1957, "died": 1985,
        "description": `
Japanese actress. She was very famous in Japan but was primarily
known elsewhere in the world for her portrayal of Tripitaka in the
TV series Monkey.
`
    },
    "Chaim Topol": {
        "born": 1935, "died": 2023,
        "description": `
Israeli actor and singer, usually credited simply as 'Topol'. He was
best known for his many appearances as Tevye in the musical Fiddler
on the Roof.
`
    }
};
Add the data to a vector set
The next step is to connect to Redis and add the data to a new vector set.
The code below iterates through all the key-value pairs in the peopleData object
and adds corresponding elements to a vector set called famousPeople.
Use the pipe() function created above to generate the
embedding and then use Array.from() to convert the embedding's data to a
plain array of numbers that you can pass to the
vAdd() command to set the embedding.
A separate call to vSetAttr() then adds the born and died values from the
peopleData object as attribute data. You can access this during a query
or by using the vGetAttr() method.
const client = createClient({ url: 'redis://localhost:6379' });

client.on('error', err => console.log('Redis Client Error', err));
await client.connect();

for (const [name, details] of Object.entries(peopleData)) {
    // Generate the embedding for this person's description.
    const embedding = await pipe(details.description, pipeOptions);
    const embeddingArray = Array.from(embedding.data);

    // Add the element, then store its attributes as a JSON string.
    await client.vAdd('famousPeople', embeddingArray, name);
    await client.vSetAttr('famousPeople', name, JSON.stringify({
        born: details.born,
        died: details.died
    }));
}
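The two conversions in the loop can be tried without a Redis connection. In the sketch below, fakeEmbedding is a made-up stand-in for the object returned by pipe() (assuming, as the code above does, that its data property is a typed array), and details is a shortened copy of one peopleData entry. It shows the plain array that would go to vAdd() and the JSON string that would go to vSetAttr().

```javascript
// Stand-in for the result of pipe(); real embeddings have 384 values.
const fakeEmbedding = { data: new Float32Array([0.25, -0.5, 0.75]) };

// Array.from() copies the typed array into a plain JavaScript array.
const embeddingArray = Array.from(fakeEmbedding.data);
console.log(Array.isArray(embeddingArray));
// >>> true
console.log(embeddingArray);
// >>> [ 0.25, -0.5, 0.75 ]

// Only the numeric fields are kept as attributes; the description is not stored.
const details = {
    born: 1867,
    died: 1934,
    description: 'Polish-French chemist and physicist.'
};
const attributes = JSON.stringify({ born: details.born, died: details.died });
console.log(attributes);
// >>> {"born":1867,"died":1934}
```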
Query the vector set
You can now query the data in the set. The basic approach is to use the
pipe() function to generate another embedding vector for the query text.
(This is the same method used to add the elements to the set.) Then, pass
the query vector to vSim() to return elements
of the set, ranked in order of similarity to the query.
Start with a simple query for "actors":
const queryValue = "actors";

const queryEmbedding = await pipe(queryValue, pipeOptions);
const queryArray = Array.from(queryEmbedding.data);

const actorsResults = await client.vSim('famousPeople', queryArray);

console.log(`'actors': ${JSON.stringify(actorsResults)}`);
// >>> 'actors': ["Masako Natsume","Chaim Topol","Linus Pauling",
// "Marie Fredriksson","Maryam Mirzakhani","Freddie Mercury",
// "Marie Curie","Paul Erdos"]
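This returns the following list of elements (formatted slightly for clarity):
'actors': ["Masako Natsume","Chaim Topol","Linus Pauling",
"Marie Fredriksson","Maryam Mirzakhani","Freddie Mercury",
"Marie Curie","Paul Erdos"]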
The first two people in the list are the two actors, as expected, but none of the
people from Linus Pauling onward was especially well-known for acting (and there certainly
isn't any information about that in the short description text).
As it stands, the search attempts to rank all the elements in the set, based
on the information contained in the embedding model.
You can use the COUNT parameter of vSim() to limit the list of elements
to just the most relevant few items:
const queryValue2 = "actors";

const queryEmbedding2 = await pipe(queryValue2, pipeOptions);
const queryArray2 = Array.from(queryEmbedding2.data);

// Limit the results to the two most similar elements.
const twoActorsResults = await client.vSim('famousPeople', queryArray2, {
    COUNT: 2
});

console.log(`'actors (2)': ${JSON.stringify(twoActorsResults)}`);
// >>> 'actors (2)': ["Masako Natsume","Chaim Topol"]
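Each query in this example repeats the same steps: embed the query text, convert the result to an array, and pass it to vSim(). One way to avoid the repetition is a small wrapper like the sketch below. The searchByText() function is a hypothetical helper, not part of node-redis, and it assumes only the pipe() and vSim() call shapes already used in this example.

```javascript
// Hypothetical helper (not part of node-redis): embeds the query text
// and passes the resulting vector to vSim().
async function searchByText(client, pipe, pipeOptions, key, text, options) {
    const embedding = await pipe(text, pipeOptions);
    const queryArray = Array.from(embedding.data);

    // Only pass the options argument when the caller supplied one.
    return options === undefined
        ? client.vSim(key, queryArray)
        : client.vSim(key, queryArray, options);
}
```

With the client, pipe, and pipeOptions objects from this example, a call such as searchByText(client, pipe, pipeOptions, 'famousPeople', 'actors', { COUNT: 2 }) would run the same query as the code above.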
The reason for using text embeddings rather than simple text search
is that the embeddings represent semantic information. This allows a query
to find elements with a similar meaning even if the text is
different. For example, the word "entertainer" doesn't appear in any of the
descriptions but if you use it as a query, the actors and musicians are ranked
highest in the results list:
const queryValue3 = "entertainer";

const queryEmbedding3 = await pipe(queryValue3, pipeOptions);
const queryArray3 = Array.from(queryEmbedding3.data);

const entertainerResults = await client.vSim('famousPeople', queryArray3);

console.log(`'entertainer': ${JSON.stringify(entertainerResults)}`);
// >>> 'entertainer': [...] (all eight names, with the actors and
// musicians ranked first)
Similarly, if you use "science" as a query, you get the results shown in the
'science' output above.
The two scientists are ranked highest, followed by the two
mathematicians. This seems reasonable given the connection between mathematics
and science.
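To make the ranking idea more concrete, the following is a minimal sketch of similarity ranking in plain JavaScript, with no Redis connection. It assumes cosine similarity as the measure and uses made-up three-dimensional vectors in place of real embeddings. The names `toyVectors`, `normalize`, `dot`, and `rankBySimilarity` are hypothetical helpers for illustration and are not part of node-redis.

```javascript
// Made-up 3-dimensional vectors standing in for real embeddings (assumption).
const toyVectors = {
    chemist: [0.9, 0.1, 0.0],
    mathematician: [0.6, 0.4, 0.0],
    singer: [0.0, 0.2, 0.9],
};

// Scale a vector to unit length, similar in effect to the `normalize: true`
// option used when generating the embeddings.
function normalize(v) {
    const length = Math.hypot(...v);
    return v.map((x) => x / length);
}

// For unit-length vectors, the dot product equals the cosine similarity.
function dot(a, b) {
    return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Return the element names ordered from most to least similar to the query.
function rankBySimilarity(vectors, query) {
    const q = normalize(query);
    return Object.entries(vectors)
        .map(([name, v]) => ({ name, score: dot(normalize(v), q) }))
        .sort((a, b) => b.score - a.score)
        .map((entry) => entry.name);
}

// A hypothetical query vector that points in the "science" direction.
const scienceLikeQuery = [1.0, 0.0, 0.0];
console.log(rankBySimilarity(toyVectors, scienceLikeQuery));
// >>> [ 'chemist', 'mathematician', 'singer' ]
```

In this toy data, the "mathematician" vector lies closer to the query direction than the "singer" vector does, which mirrors the way the mathematicians follow the scientists in the real results.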
You can also use filter expressions with vSim() to restrict the search
further. For example, repeat the "science" query, but this time limit the
results to people who died before the year 2000:
const queryValue5 = "science";
const queryEmbedding5 = await pipe(queryValue5, pipeOptions);
const queryArray5 = Array.from(queryEmbedding5.data);

const science2000Results = await client.vSim('famousPeople', queryArray5, {
    FILTER: '.died < 2000'
});
console.log(`'science2000': ${JSON.stringify(science2000Results)}`);
// >>> 'science2000': ["Linus Pauling","Marie Curie","Paul Erdos",
// "Masako Natsume","Freddie Mercury"]
Note that the boolean filter expression is applied to elements in the set
before the vector distance calculation is performed. Elements that don't
pass the filter test are removed from the results completely, rather
than just reduced in rank. This can improve the performance of the
search because there is no need to calculate the vector distance for
elements that have already been filtered out.
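The effect of filtering before ranking can be sketched in plain JavaScript. This is a conceptual illustration only, not a description of how Redis implements filtered search. The vectors are made-up two-dimensional values, the `died` attributes are taken from the example data, and `filteredSearch`, `cosineSimilarity`, and `similarityCalls` are hypothetical names introduced for the sketch.

```javascript
// Hypothetical in-memory elements: a made-up vector plus attributes like
// those stored with vSetAttr() in the example.
const items = [
    { name: 'Linus Pauling', vector: [0.9, 0.1], attrs: { died: 1994 } },
    { name: 'Maryam Mirzakhani', vector: [0.7, 0.3], attrs: { died: 2017 } },
    { name: 'Freddie Mercury', vector: [0.1, 0.9], attrs: { died: 1991 } },
];

// Counts how many similarity calculations are actually performed.
let similarityCalls = 0;

function cosineSimilarity(a, b) {
    similarityCalls++;
    const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
    return dot / (Math.hypot(...a) * Math.hypot(...b));
}

// Apply the attribute predicate first, then score and rank only the survivors.
function filteredSearch(allItems, query, predicate) {
    return allItems
        .filter((item) => predicate(item.attrs))
        .map((item) => ({
            name: item.name,
            score: cosineSimilarity(item.vector, query),
        }))
        .sort((a, b) => b.score - a.score)
        .map((entry) => entry.name);
}

// Equivalent in spirit to the '.died < 2000' filter expression.
const results = filteredSearch(items, [1, 0], (attrs) => attrs.died < 2000);
console.log(results);
// >>> [ 'Linus Pauling', 'Freddie Mercury' ]
console.log(similarityCalls);
// >>> 2
```

The element that fails the predicate never reaches the similarity function, so the counter ends at 2 rather than 3, and that element is absent from the results instead of appearing at a lower rank.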
More information
See the vector sets docs for more information and code examples. See the
Redis for AI section for more details about text embeddings and other AI
techniques you can use with Redis.
You may also be interested in vector search. This is a feature of the
Redis query engine that lets you retrieve JSON and hash documents based on
vector data stored in their fields.