Commit 6ddd05b

wip on user guide
1 parent b6f6d69 commit 6ddd05b

1 file changed

+398
-0
lines changed

@@ -0,0 +1,398 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Semantic Routing\n",
8+
"\n",
9+
"RedisVL provides a `SemanticRouter` interface that uses Redis' built-in search and aggregation to perform\n",
10+
"KNN-style classification over a set of `Route` references to determine the best match.\n",
11+
"\n",
12+
"This notebook shows how to use Redis as a semantic router for your applications."
13+
]
14+
},
15+
{
16+
"cell_type": "markdown",
17+
"metadata": {},
18+
"source": [
19+
"## Define the Routes\n",
20+
"\n",
21+
"Below we define three routes: one for `technology`, one for `sports`, and\n",
22+
"another for `entertainment`. In this example, the goal is\n",
23+
"topic \"classification\", but you can create routes and references for\n",
24+
"almost anything.\n",
25+
"\n",
26+
"Each route has a set of references that cover the \"semantic surface area\" of the\n",
27+
"route. The incoming query from a user needs to be semantically similar to one or\n",
28+
"more of the references in order to \"match\" on the route."
29+
]
30+
},
31+
{
32+
"cell_type": "code",
33+
"execution_count": 1,
34+
"metadata": {},
35+
"outputs": [],
36+
"source": [
37+
"from redisvl.extensions.router import Route\n",
38+
"\n",
39+
"\n",
40+
"# Define routes for the semantic router\n",
41+
"technology = Route(\n",
42+
" name=\"technology\",\n",
43+
" references=[\n",
44+
" \"what are the latest advancements in AI?\",\n",
45+
" \"tell me about the newest gadgets\",\n",
46+
" \"what's trending in tech?\"\n",
47+
" ],\n",
48+
" metadata={\"category\": \"tech\", \"priority\": 1}\n",
49+
")\n",
50+
"\n",
51+
"sports = Route(\n",
52+
" name=\"sports\",\n",
53+
" references=[\n",
54+
" \"who won the game last night?\",\n",
55+
" \"tell me about the upcoming sports events\",\n",
56+
" \"what's the latest in the world of sports?\",\n",
57+
" \"sports\",\n",
58+
" \"basketball and football\"\n",
59+
" ],\n",
60+
" metadata={\"category\": \"sports\", \"priority\": 2}\n",
61+
")\n",
62+
"\n",
63+
"entertainment = Route(\n",
64+
" name=\"entertainment\",\n",
65+
" references=[\n",
66+
" \"what are the top movies right now?\",\n",
67+
" \"who won the best actor award?\",\n",
68+
" \"what's new in the entertainment industry?\"\n",
69+
" ],\n",
70+
" metadata={\"category\": \"entertainment\", \"priority\": 3}\n",
71+
")\n"
72+
]
73+
},
74+
{
75+
"cell_type": "markdown",
76+
"metadata": {},
77+
"source": [
78+
"## Initialize the SemanticRouter\n",
79+
"\n",
80+
"`SemanticRouter` automatically creates an index in Redis for the route references upon initialization. By default, it uses the `HFTextVectorizer` to\n",
81+
"generate embeddings for each route reference."
82+
]
83+
},
84+
{
85+
"cell_type": "code",
86+
"execution_count": 2,
87+
"metadata": {},
88+
"outputs": [
89+
{
90+
"name": "stdout",
91+
"output_type": "stream",
92+
"text": [
93+
"14:09:10 redisvl.index.index INFO Index already exists, overwriting.\n"
94+
]
95+
}
96+
],
97+
"source": [
98+
"import os\n",
99+
"from redisvl.extensions.router import SemanticRouter\n",
100+
"from redisvl.utils.vectorize import HFTextVectorizer\n",
101+
"\n",
102+
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
103+
"\n",
104+
"# Initialize the SemanticRouter\n",
105+
"router = SemanticRouter(\n",
106+
" name=\"topic-router\",\n",
107+
" vectorizer=HFTextVectorizer(),\n",
108+
" routes=[technology, sports, entertainment],\n",
109+
" redis_url=\"redis://localhost:6379\",\n",
110+
" overwrite=True # Blow away any other routing index with this name\n",
111+
")"
112+
]
113+
},
114+
{
115+
"cell_type": "code",
116+
"execution_count": 3,
117+
"metadata": {},
118+
"outputs": [
119+
{
120+
"data": {
121+
"text/plain": [
122+
"HFTextVectorizer(model='sentence-transformers/all-mpnet-base-v2', dims=768)"
123+
]
124+
},
125+
"execution_count": 3,
126+
"metadata": {},
127+
"output_type": "execute_result"
128+
}
129+
],
130+
"source": [
131+
"router.vectorizer"
132+
]
133+
},
134+
{
135+
"cell_type": "code",
136+
"execution_count": 4,
137+
"metadata": {},
138+
"outputs": [
139+
{
140+
"name": "stderr",
141+
"output_type": "stream",
142+
"text": [
143+
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
144+
"To disable this warning, you can either:\n",
145+
"\t- Avoid using `tokenizers` before the fork if possible\n",
146+
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
147+
]
148+
},
149+
{
150+
"name": "stdout",
151+
"output_type": "stream",
152+
"text": [
153+
"\n",
154+
"\n",
155+
"Index Information:\n",
156+
"╭──────────────┬────────────────┬──────────────────┬─────────────────┬────────────╮\n",
157+
"│ Index Name │ Storage Type │ Prefixes │ Index Options │ Indexing │\n",
158+
"├──────────────┼────────────────┼──────────────────┼─────────────────┼────────────┤\n",
159+
"│ topic-router │ HASH │ ['topic-router'] │ [] │ 0 │\n",
160+
"╰──────────────┴────────────────┴──────────────────┴─────────────────┴────────────╯\n",
161+
"Index Fields:\n",
162+
"╭────────────┬─────────────┬────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮\n",
163+
"│ Name │ Attribute │ Type │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │\n",
164+
"├────────────┼─────────────┼────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼────────────────┤\n",
165+
"│ route_name │ route_name │ TAG │ SEPARATOR │ , │ │ │ │ │ │ │\n",
166+
"│ reference │ reference │ TEXT │ WEIGHT │ 1 │ │ │ │ │ │ │\n",
167+
"│ vector │ vector │ VECTOR │ algorithm │ FLAT │ data_type │ FLOAT32 │ dim │ 768 │ distance_metric │ COSINE │\n",
168+
"╰────────────┴─────────────┴────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴─────────────────┴────────────────╯\n"
169+
]
170+
}
171+
],
172+
"source": [
173+
"# look at the index specification created for the semantic router\n",
174+
"!rvl index info -i topic-router"
175+
]
176+
},
177+
{
178+
"cell_type": "markdown",
179+
"metadata": {},
180+
"source": [
181+
"## Simple routing"
182+
]
183+
},
184+
{
185+
"cell_type": "code",
186+
"execution_count": 5,
187+
"metadata": {},
188+
"outputs": [
189+
{
190+
"data": {
191+
"text/plain": [
192+
"RouteMatch(route=Route(name='technology', references=['what are the latest advancements in AI?', 'tell me about the newest gadgets', \"what's trending in tech?\"], metadata={'category': 'tech', 'priority': '1'}, distance_threshold=None), distance=0.119614183903)"
193+
]
194+
},
195+
"execution_count": 5,
196+
"metadata": {},
197+
"output_type": "execute_result"
198+
}
199+
],
200+
"source": [
201+
"# Query the router with a statement\n",
202+
"route_match = router(\"Can you tell me about the latest in artificial intelligence?\")\n",
203+
"route_match"
204+
]
205+
},
206+
{
207+
"cell_type": "code",
208+
"execution_count": 6,
209+
"metadata": {},
210+
"outputs": [
211+
{
212+
"data": {
213+
"text/plain": [
214+
"RouteMatch(route=Route(name='sports', references=['who won the game last night?', 'tell me about the upcoming sports events', \"what's the latest in the world of sports?\", 'sports', 'basketball and football'], metadata={'category': 'sports', 'priority': '2'}, distance_threshold=None), distance=0.554210186005)"
215+
]
216+
},
217+
"execution_count": 6,
218+
"metadata": {},
219+
"output_type": "execute_result"
220+
}
221+
],
222+
"source": [
223+
"# Override the distance threshold at call time\n",
224+
"route_match = router(\"Which basketball team will win the NBA finals?\", distance_threshold=0.7)\n",
225+
"route_match"
226+
]
227+
},
228+
{
229+
"cell_type": "markdown",
230+
"metadata": {},
231+
"source": [
232+
"We can also route a statement to many routes and order them by distance:"
233+
]
234+
},
235+
{
236+
"cell_type": "code",
237+
"execution_count": 7,
238+
"metadata": {},
239+
"outputs": [
240+
{
241+
"data": {
242+
"text/plain": [
243+
"[RouteMatch(route=Route(name='sports', references=['who won the game last night?', 'tell me about the upcoming sports events', \"what's the latest in the world of sports?\", 'sports', 'basketball and football'], metadata={'category': 'sports', 'priority': '2'}, distance_threshold=None), distance=0.758580672741),\n",
244+
" RouteMatch(route=Route(name='entertainment', references=['what are the top movies right now?', 'who won the best actor award?', \"what's new in the entertainment industry?\"], metadata={'category': 'entertainment', 'priority': '3'}, distance_threshold=None), distance=0.812423805396),\n",
245+
" RouteMatch(route=Route(name='technology', references=['what are the latest advancements in AI?', 'tell me about the newest gadgets', \"what's trending in tech?\"], metadata={'category': 'tech', 'priority': '1'}, distance_threshold=None), distance=0.884235262871)]"
246+
]
247+
},
248+
"execution_count": 7,
249+
"metadata": {},
250+
"output_type": "execute_result"
251+
}
252+
],
253+
"source": [
254+
"# Perform multi-class classification with route_many(), adjusting max_k and distance_threshold\n",
255+
"route_matches = router.route_many(\"Lebron James\", distance_threshold=1.0, max_k=3)\n",
256+
"route_matches"
257+
]
258+
},
259+
{
260+
"cell_type": "code",
261+
"execution_count": 8,
262+
"metadata": {},
263+
"outputs": [
264+
{
265+
"data": {
266+
"text/plain": [
267+
"[RouteMatch(route=Route(name='sports', references=['who won the game last night?', 'tell me about the upcoming sports events', \"what's the latest in the world of sports?\", 'sports', 'basketball and football'], metadata={'category': 'sports', 'priority': '2'}, distance_threshold=None), distance=0.663254022598),\n",
268+
" RouteMatch(route=Route(name='entertainment', references=['what are the top movies right now?', 'who won the best actor award?', \"what's new in the entertainment industry?\"], metadata={'category': 'entertainment', 'priority': '3'}, distance_threshold=None), distance=0.712985336781),\n",
269+
" RouteMatch(route=Route(name='technology', references=['what are the latest advancements in AI?', 'tell me about the newest gadgets', \"what's trending in tech?\"], metadata={'category': 'tech', 'priority': '1'}, distance_threshold=None), distance=0.832674443722)]"
270+
]
271+
},
272+
"execution_count": 8,
273+
"metadata": {},
274+
"output_type": "execute_result"
275+
}
276+
],
277+
"source": [
278+
"# Switch the aggregation method; note the different distances in the result\n",
279+
"from redisvl.extensions.router.schema import DistanceAggregationMethod\n",
280+
"\n",
281+
"route_matches = router.route_many(\"Lebron James\", aggregation_method=DistanceAggregationMethod.min, distance_threshold=1.0, max_k=3)\n",
282+
"route_matches"
283+
]
284+
},
285+
{
286+
"cell_type": "markdown",
287+
"metadata": {},
288+
"source": [
289+
"Note the different route match distances. This is because we used the `min` aggregation method instead of the default `avg` approach."
290+
]
291+
},
292+
{
293+
"cell_type": "markdown",
294+
"metadata": {},
295+
"source": [
296+
"## Update the routing config"
297+
]
298+
},
299+
{
300+
"cell_type": "code",
301+
"execution_count": 9,
302+
"metadata": {},
303+
"outputs": [],
304+
"source": [
305+
"from redisvl.extensions.router import RoutingConfig\n",
306+
"\n",
307+
"router.update_routing_config(\n",
308+
" RoutingConfig(distance_threshold=1.0, aggregation_method=DistanceAggregationMethod.min, max_k=3)\n",
309+
")"
310+
]
311+
},
312+
{
313+
"cell_type": "code",
314+
"execution_count": 10,
315+
"metadata": {},
316+
"outputs": [
317+
{
318+
"data": {
319+
"text/plain": [
320+
"[RouteMatch(route=Route(name='sports', references=['who won the game last night?', 'tell me about the upcoming sports events', \"what's the latest in the world of sports?\", 'sports', 'basketball and football'], metadata={'category': 'sports', 'priority': '2'}, distance_threshold=None), distance=0.663254022598),\n",
321+
" RouteMatch(route=Route(name='entertainment', references=['what are the top movies right now?', 'who won the best actor award?', \"what's new in the entertainment industry?\"], metadata={'category': 'entertainment', 'priority': '3'}, distance_threshold=None), distance=0.712985336781),\n",
322+
" RouteMatch(route=Route(name='technology', references=['what are the latest advancements in AI?', 'tell me about the newest gadgets', \"what's trending in tech?\"], metadata={'category': 'tech', 'priority': '1'}, distance_threshold=None), distance=0.832674443722)]"
323+
]
324+
},
325+
"execution_count": 10,
326+
"metadata": {},
327+
"output_type": "execute_result"
328+
}
329+
],
330+
"source": [
331+
"route_matches = router.route_many(\"Lebron James\")\n",
332+
"route_matches"
333+
]
334+
},
335+
{
336+
"cell_type": "markdown",
337+
"metadata": {},
338+
"source": [
339+
"## Clean up the router"
340+
]
341+
},
342+
{
343+
"cell_type": "code",
344+
"execution_count": 11,
345+
"metadata": {},
346+
"outputs": [
347+
{
348+
"ename": "AttributeError",
349+
"evalue": "'SearchIndex' object has no attribute 'clear'",
350+
"output_type": "error",
351+
"traceback": [
352+
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
353+
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
354+
"Cell \u001b[0;32mIn[11], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;66;03m# Use clear to flush all routes from the index\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m \u001b[43mrouter\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mclear\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
355+
"File \u001b[0;32m~/AppliedAI/redis-vl-python/redisvl/extensions/router/semantic.py:437\u001b[0m, in \u001b[0;36mSemanticRouter.clear\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 436\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mclear\u001b[39m(\u001b[38;5;28mself\u001b[39m):\n\u001b[0;32m--> 437\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_index\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mclear\u001b[49m()\n",
356+
"\u001b[0;31mAttributeError\u001b[0m: 'SearchIndex' object has no attribute 'clear'"
357+
]
358+
}
359+
],
360+
"source": [
361+
"# Use clear to flush all routes from the index\n",
362+
"router.clear()"
363+
]
364+
},
365+
{
366+
"cell_type": "code",
367+
"execution_count": null,
368+
"metadata": {},
369+
"outputs": [],
370+
"source": [
371+
"# Use delete to clear the index and remove it completely\n",
372+
"router.delete()"
373+
]
374+
}
375+
],
376+
"metadata": {
377+
"kernelspec": {
378+
"display_name": "rvl",
379+
"language": "python",
380+
"name": "python3"
381+
},
382+
"language_info": {
383+
"codemirror_mode": {
384+
"name": "ipython",
385+
"version": 3
386+
},
387+
"file_extension": ".py",
388+
"mimetype": "text/x-python",
389+
"name": "python",
390+
"nbconvert_exporter": "python",
391+
"pygments_lexer": "ipython3",
392+
"version": "3.10.14"
393+
},
394+
"orig_nbformat": 4
395+
},
396+
"nbformat": 4,
397+
"nbformat_minor": 2
398+
}
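
The notebook notes that the `min` aggregation method yields different route distances than the default `avg`. The effect can be illustrated without Redis. What follows is a minimal sketch, not RedisVL's implementation: the function `rank_routes`, the `AGGREGATORS` table, the strict `<` threshold comparison, and the per-reference distances are all assumptions made up for illustration, not values taken from the notebook output.

```python
from statistics import mean

# Hypothetical per-reference cosine distances for a single query, grouped by
# route. These numbers are invented for illustration only.
reference_distances = {
    "sports": [0.66, 0.72, 0.80, 0.85],
    "entertainment": [0.71, 0.83, 0.90],
    "technology": [0.83, 0.88, 0.95],
}

# Only the two aggregation methods mentioned in the notebook are modeled here.
AGGREGATORS = {"avg": mean, "min": min}


def rank_routes(distances_by_route, aggregation="avg", distance_threshold=1.0, max_k=3):
    """Collapse each route's reference distances into one score, drop routes
    at or above the threshold, and return up to max_k routes, closest first."""
    aggregate = AGGREGATORS[aggregation]
    scored = [
        (name, aggregate(distances))
        for name, distances in distances_by_route.items()
        if distances  # skip routes with no references
    ]
    matches = [(name, dist) for name, dist in scored if dist < distance_threshold]
    matches.sort(key=lambda match: match[1])
    return matches[:max_k]


if __name__ == "__main__":
    for method in ("avg", "min"):
        print(method, rank_routes(reference_distances, aggregation=method))
```

With `min`, a route is scored by its single closest reference, so its distance is never larger than under `avg`. This is why a tighter threshold can still match under `min` while the same query matches nothing under `avg`.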
