Perform RAG Inference

This notebook demonstrates how to perform inference using RAG (Retrieval-Augmented Generation).

Note that linkml-store is a data-first framework; the main emphasis is not on AI or LLMs. However, it does support a pluggable inference framework, and one of the integrations is a simple RAG-based inference engine.

For this notebook we will use the command line interface, but the same can be done programmatically using the Python API (illustrative Python sketches are included along the way).

Loading the data into DuckDB

[1]:
%%bash
mkdir -p tmp
rm -rf tmp/countries.ddb
linkml-store  -d duckdb:///tmp/countries.ddb -c countries insert ../../tests/input/countries/countries.jsonl
Inserted 20 objects from ../../tests/input/countries/countries.jsonl into collection 'countries'.
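
As an aside, the equivalent via the Python API looks roughly like the sketch below. Client, attach_database, create_collection, and insert follow the linkml-store tutorials, but verify the details against your installed version.

[ ]:
import json

from linkml_store import Client

# Attach a file-backed DuckDB database and insert the JSONL records
# (paths as in the bash cell above).
client = Client()
db = client.attach_database("duckdb:///tmp/countries.ddb")
collection = db.create_collection("countries")
with open("../../tests/input/countries/countries.jsonl") as f:
    collection.insert([json.loads(line) for line in f])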

Let’s check what this looks like by using describe and examining the first entry:

[2]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb describe
          count unique               top freq
capital      20     20  Washington, D.C.    1
code         20     20                US    1
continent    20      6            Europe    5
languages    20     15         [English]    4
name         20     20     United States    1
[3]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb query --limit 1 -O yaml
name: United States
code: US
capital: Washington, D.C.
continent: North America
languages:
- English
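
The same one-record query might look like this from Python (find's signature is an assumption based on the linkml-store docs):

[ ]:
from linkml_store import Client

client = Client()
db = client.attach_database("duckdb:///tmp/countries.ddb")
collection = db.get_collection("countries")

# find() returns a query result whose rows are plain dicts.
qr = collection.find({}, limit=1)
print(qr.rows[0])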

First, we will check that the country we will use for testing is not already in the database (the countries.jsonl file is intentionally incomplete):

[4]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries query -w "name: Uruguay"
[]

Inferring a specific field

[18]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -T languages -q "name: Uruguay"
predicted_object:
  languages:
  - Spanish
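
The same inference can be run from Python. The engine and method names below (get_inference_engine, load_and_split_data, initialize, derive) are assumptions based on linkml-store's inference documentation; verify them for your version.

[ ]:
from linkml_store import Client
from linkml_store.inference import get_inference_engine

client = Client()
db = client.attach_database("duckdb:///tmp/countries.ddb")
collection = db.get_collection("countries")

# Train a RAG engine on the collection and predict the missing field(s).
ie = get_inference_engine("rag")
ie.load_and_split_data(collection)
ie.initialize()
result = ie.derive({"name": "Uruguay"})
print(result.predicted_object)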

The RAG engine works by first indexing the countries collection by embedding each entry. The top N results matching the query are fetched and used as context for the LLM query.

Note that in this particular case, we have a very small collection of twenty entries, and it’s not even necessary to perform RAG at all, as the entire collection can easily fit within the context window of the LLM query. However, this small set is useful for demo purposes.
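
Conceptually, the retrieval step looks something like the sketch below. This is illustrative only, not linkml-store's actual implementation; the indexed pairs would come from whatever embedding model is configured.

[ ]:
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_n(query_vec: list[float],
          indexed: list[tuple[dict, list[float]]],
          n: int = 5) -> list[dict]:
    """Return the n objects whose embeddings are closest to the query."""
    ranked = sorted(indexed, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [obj for obj, _ in ranked[:n]]

# The n retrieved objects are then serialized (e.g. as YAML) into the
# prompt as context, followed by the query object itself.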

Inferring a whole object

[6]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -q "name: Uruguay"
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

Inferring from multiple fields

[27]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag  -q "{continent: South America, languages: [Dutch]}"
predicted_object:
  capital: Paramaribo
  code: SR
  name: Suriname

RAG configuration - using a different model

The datasette llm framework is used under the hood. This means that you can use the llm command to list the available models and configurations, as well as to install new ones.
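
Since llm is an ordinary Python package, you can also call it directly. A minimal example (gpt-4o-mini is just one of the models listed below; substitute any model you have an API key configured for):

[ ]:
import llm

# Resolve a model by name/alias and run a one-off prompt.
model = llm.get_model("gpt-4o-mini")
response = model.prompt("What languages are spoken in Uruguay?")
print(response.text())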

[28]:
%%bash
llm models
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
OpenAI Chat: gpt-4-1106-preview
OpenAI Chat: gpt-4-0125-preview
OpenAI Chat: gpt-4-turbo-2024-04-09
OpenAI Chat: gpt-4-turbo (aliases: gpt-4-turbo-preview, 4-turbo, 4t)
OpenAI Chat: gpt-4o (aliases: 4o)
OpenAI Chat: gpt-4o-mini (aliases: 4o-mini)
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
OpenAI Chat: gpt-4-vision-preview (aliases: 4V, gpt-4-vision)
OpenAI Chat: litellm-mixtral
OpenAI Chat: litellm-llama3
OpenAI Chat: litellm-llama3-chatqa
OpenAI Chat: litellm-groq-mixtral
OpenAI Chat: litellm-groq-llama
OpenAI Chat: gpt-4o-2024-05-13 (aliases: 4o, gpt-4o)
OpenAI Chat: lbl/llama-3
OpenAI Chat: lbl/claude-opus
OpenAI Chat: lbl/claude-sonnet
OpenAI Chat: lbl/gpt-4o
OpenAI Chat: lbl/llama-3
Anthropic Messages: claude-3-opus-20240229 (aliases: claude-3-opus)
Anthropic Messages: claude-3-sonnet-20240229 (aliases: claude-3-sonnet)
Anthropic Messages: claude-3-haiku-20240307 (aliases: claude-3-haiku)
Anthropic Messages: claude-3-5-sonnet-20240620 (aliases: claude-3.5-sonnet)

We’ll try claude-3-haiku, a small model. This may not be powerful enough for extraction tasks, but general knowledge about countries should be within its capabilities.

[29]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag:llm_config.model_name=claude-3-haiku -q "name: Uruguay"
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

Persisting the RAG model

Indexing a collection takes time, since each entry must be embedded. The trained RAG model can be exported with -E and loaded back with -L, avoiding re-indexing on later runs.

[44]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -q "name: Uruguay" -E tmp/countries.rag.json
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

[45]:
%%bash
ls -l tmp/countries.rag.json
-rw-r--r--  1 cjm  staff  498212 Aug 21 16:05 tmp/countries.rag.json
[46]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -q "name: Uruguay" -L tmp/countries.rag.json
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

Evaluation

We can also evaluate how well the engine performs; here we predict the languages and code fields from the name feature for 5 held-out entries:

[55]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag  -T languages -T code -F name -n 5
Outcome: true_positive_count=5.0 total_count=5 // accuracy: 1.0

How RAG indexing works under the hood

Behind the scenes, whenever you use the RAG inference engine, a separate collection is automatically created to hold the training split of your data; additionally, an index over that collection is created in the same database. This is true regardless of the database backend (DuckDB, MongoDB, etc.).

(Note: if you are using an in-memory DuckDB instance, the index is forgotten after each run, which could get expensive if you have a large collection.)

Let’s examine our database to see the new collection and index. We will use the Jupyter SQL magic to query a copy of the database (working on a copy avoids locking the original database file).

[38]:
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
[39]:
%%bash
cp tmp/countries.ddb tmp/countries-copy.ddb
[40]:
%sql duckdb:///tmp/countries-copy.ddb
[41]:
%%sql
SELECT * FROM information_schema.tables
[41]:
    table_catalog table_schema                                   table_name  table_type  ...
0  countries-copy         main                                    countries  BASE TABLE  ...
1  countries-copy         main                         countries__rag_train  BASE TABLE  ...
2  countries-copy         main  internal__index__countries__rag_train__llm  BASE TABLE  ...
[42]:
%%sql
select * from internal__index__countries__rag_train__llm limit 5
[42]:
            name code           capital      continent         languages                                          __index__
0      Argentina   AR      Buenos Aires  South America         [Spanish]  [-0.009016353, 0.02336632, 0.007532564, -0.008...
1    South Korea   KR             Seoul           Asia          [Korean]  [3.8781454e-05, 0.013463534, 0.017664365, -0.0...
2  United States   US  Washington, D.C.  North America         [English]  [-0.0077237985, 0.016569635, -0.0042663547, -0...
3        Nigeria   NG             Abuja         Africa         [English]  [-0.0055540577, 0.0037728157, -0.003473751, -0...
4          India   IN         New Delhi           Asia  [Hindi, English]  [-0.0031975685, 0.025214365, 0.002862445, 0.00...
[43]:
%%sql
select count(*) from internal__index__countries__rag_train__llm
[43]:
count_star()
0 14
[25]:
%%sql
select count(*) from countries
[25]:
count_star()
0 20
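
The same inspection works from plain Python with the duckdb package (paths as in the cells above):

[ ]:
import duckdb

# Open the copied database read-only and inspect the auto-created tables.
con = duckdb.connect("tmp/countries-copy.ddb", read_only=True)
print(con.sql("SELECT table_name FROM information_schema.tables").fetchall())
print(con.sql("SELECT count(*) FROM internal__index__countries__rag_train__llm").fetchone())
con.close()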

Configuring the training/test split

By default, the infer command splits the data in your collection into training and test sets. This is useful for evaluation, but if you want to use the entire dataset for training, or you want to configure the split proportions, you can use --training-test-data-split (-S). In the example below, -S 1.0 0.0 uses all records for training and none for testing.

[37]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -S 1.0 0.0 -q "name: Uruguay"
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

Extraction tasks

We can also use this engine for extraction tasks, i.e. deriving structured data or knowledge from text or other unstructured inputs.

In fact, we don’t need any new capabilities here: extraction can be seen as a special case of inference, where the feature set includes (or is restricted to) text, and the target is the whole object.

We can demonstrate this with a simple zero-shot example:

[53]:
%%bash
echo '{text: I saw the cat sitting on the mat, subject: cat, predicate: sits-on, object: mat}' > tmp/extraction-examples.yaml
[54]:
%%bash
linkml-store -i tmp/extraction-examples.yaml infer -t rag -q "text: the Earth rotates around the Sun"
predicted_object:
  object: Sun
  predicate: rotates-around
  subject: Earth
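
Programmatically, the same zero-shot extraction would look roughly like the sketch below, reusing the (assumed) inference API from the earlier sketch; here the single example is inserted into an in-memory DuckDB instance.

[ ]:
from linkml_store import Client
from linkml_store.inference import get_inference_engine

# Seed an in-memory collection with the single training example.
client = Client()
db = client.attach_database("duckdb")  # in-memory DuckDB
collection = db.create_collection("examples")
collection.insert([{"text": "I saw the cat sitting on the mat",
                    "subject": "cat", "predicate": "sits-on", "object": "mat"}])

# Method names are assumptions; see the earlier inference sketch.
ie = get_inference_engine("rag")
ie.load_and_split_data(collection)
ie.initialize()
print(ie.derive({"text": "the Earth rotates around the Sun"}).predicted_object)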
