Perform RAG Inference

This notebook demonstrates how to perform inference using RAG (Retrieval-Augmented Generation).

Note that linkml-store is a data-first framework; the main emphasis is not on AI or LLMs. However, it does support a pluggable inference framework, and one of the integrations is a simple RAG-based inference engine.

For this notebook we will use the command line interface, but the same can be done programmatically using the Python API (illustrative Python sketches are included along the way).

Loading the data into DuckDB

[1]:
%%bash
mkdir -p tmp
rm -rf tmp/countries.ddb
linkml-store  -d duckdb:///tmp/countries.ddb -c countries insert ../../tests/input/countries/countries.jsonl
Inserted 20 objects from ../../tests/input/countries/countries.jsonl into collection 'countries'.
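
As an aside, the equivalent via the Python API looks roughly like the sketch below. Client, attach_database, create_collection, and insert follow the linkml-store tutorials, but verify the details against your installed version.

[ ]:
import json

from linkml_store import Client

# Attach a file-backed DuckDB database and insert the JSONL records
# (paths as in the bash cell above).
client = Client()
db = client.attach_database("duckdb:///tmp/countries.ddb")
collection = db.create_collection("countries")
with open("../../tests/input/countries/countries.jsonl") as f:
    collection.insert([json.loads(line) for line in f])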

Let’s check what this looks like by using describe and examining the first entry:

[2]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb describe
          count unique               top freq
capital      20     20  Washington, D.C.    1
code         20     20                US    1
continent    20      6            Europe    5
languages    20     15         [English]    4
name         20     20     United States    1
[3]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb query --limit 1 -O yaml
name: United States
code: US
capital: Washington, D.C.
continent: North America
languages:
- English
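
The same one-record query might look like this from Python (find's signature is an assumption based on the linkml-store docs):

[ ]:
from linkml_store import Client

client = Client()
db = client.attach_database("duckdb:///tmp/countries.ddb")
collection = db.get_collection("countries")

# find() returns a query result whose rows are plain dicts.
qr = collection.find({}, limit=1)
print(qr.rows[0])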

First, we will check that the country we will use for testing is not already in the database (the countries.jsonl file is intentionally incomplete):

[4]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries query -w "name: Uruguay"
[]

Inferring a specific field

[18]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -T languages -q "name: Uruguay"
predicted_object:
  languages:
  - Spanish
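
The same inference can be run from Python. The engine and method names below (get_inference_engine, load_and_split_data, initialize, derive) are assumptions based on linkml-store's inference documentation; verify them for your version.

[ ]:
from linkml_store import Client
from linkml_store.inference import get_inference_engine

client = Client()
db = client.attach_database("duckdb:///tmp/countries.ddb")
collection = db.get_collection("countries")

# Train a RAG engine on the collection and predict the missing field(s).
ie = get_inference_engine("rag")
ie.load_and_split_data(collection)
ie.initialize()
result = ie.derive({"name": "Uruguay"})
print(result.predicted_object)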

The RAG engine works by first indexing the countries collection by embedding each entry. The top N results matching the query are fetched and used as context for the LLM query.

Note that in this particular case, we have a very small collection of twenty entries, and it’s not even necessary to perform RAG at all, as the entire collection can easily fit within the context window of the LLM query. However, this small set is useful for demo purposes.
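
Conceptually, the retrieval step looks something like the sketch below. This is illustrative only, not linkml-store's actual implementation; the indexed pairs would come from whatever embedding model is configured.

[ ]:
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_n(query_vec: list[float],
          indexed: list[tuple[dict, list[float]]],
          n: int = 5) -> list[dict]:
    """Return the n objects whose embeddings are closest to the query."""
    ranked = sorted(indexed, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [obj for obj, _ in ranked[:n]]

# The n retrieved objects are then serialized (e.g. as YAML) into the
# prompt as context, followed by the query object itself.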

Inferring a whole object

[6]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -q "name: Uruguay"
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

Inferring from multiple fields

[27]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag  -q "{continent: South America, languages: [Dutch]}"
predicted_object:
  capital: Paramaribo
  code: SR
  name: Suriname

RAG configuration - using a different model

The datasette llm framework is used under the hood. This means that you can use the llm command to list the available models and configurations, as well as to install new ones.
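
Since llm is an ordinary Python package, you can also call it directly. A minimal example (gpt-4o-mini is just one of the models listed below; substitute any model you have an API key configured for):

[ ]:
import llm

# Resolve a model by name/alias and run a one-off prompt.
model = llm.get_model("gpt-4o-mini")
response = model.prompt("What languages are spoken in Uruguay?")
print(response.text())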

[28]:
%%bash
llm models
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
OpenAI Chat: gpt-4-1106-preview
OpenAI Chat: gpt-4-0125-preview
OpenAI Chat: gpt-4-turbo-2024-04-09
OpenAI Chat: gpt-4-turbo (aliases: gpt-4-turbo-preview, 4-turbo, 4t)
OpenAI Chat: gpt-4o (aliases: 4o)
OpenAI Chat: gpt-4o-mini (aliases: 4o-mini)
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
OpenAI Chat: gpt-4-vision-preview (aliases: 4V, gpt-4-vision)
OpenAI Chat: litellm-mixtral
OpenAI Chat: litellm-llama3
OpenAI Chat: litellm-llama3-chatqa
OpenAI Chat: litellm-groq-mixtral
OpenAI Chat: litellm-groq-llama
OpenAI Chat: gpt-4o-2024-05-13 (aliases: 4o, gpt-4o)
OpenAI Chat: lbl/llama-3
OpenAI Chat: lbl/claude-opus
OpenAI Chat: lbl/claude-sonnet
OpenAI Chat: lbl/gpt-4o
OpenAI Chat: lbl/llama-3
Anthropic Messages: claude-3-opus-20240229 (aliases: claude-3-opus)
Anthropic Messages: claude-3-sonnet-20240229 (aliases: claude-3-sonnet)
Anthropic Messages: claude-3-haiku-20240307 (aliases: claude-3-haiku)
Anthropic Messages: claude-3-5-sonnet-20240620 (aliases: claude-3.5-sonnet)

We’ll try claude-3-haiku, a small model. This may not be powerful enough for extraction tasks, but general knowledge about countries should be within its capabilities.

[29]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag:llm_config.model_name=claude-3-haiku -q "name: Uruguay"
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

Persisting the RAG model

Indexing a collection takes time, since each entry must be embedded. The trained RAG model can be exported with -E and loaded back with -L, avoiding re-indexing on later runs.

[44]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -q "name: Uruguay" -E tmp/countries.rag.json
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

[45]:
%%bash
ls -l tmp/countries.rag.json
-rw-r--r--  1 cjm  staff  498212 Aug 21 16:05 tmp/countries.rag.json
[46]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -q "name: Uruguay" -L tmp/countries.rag.json
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

Evaluation

We can also evaluate how well the engine performs; here we predict the languages and code fields from the name feature for 5 held-out entries:

[55]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag  -T languages -T code -F name -n 5
Outcome: true_positive_count=5.0 total_count=5 // accuracy: 1.0

How RAG indexing works under the hood

Behind the scenes, whenever you use the RAG inference engine, a separate collection is automatically created to hold the training split of your data; additionally, an index over that collection is created in the same database. This is true regardless of the database backend (DuckDB, MongoDB, etc.).

(Note: if you are using an in-memory DuckDB instance, the index is forgotten after each run, which could get expensive if you have a large collection.)

Let’s examine our database to see the new collection and index. We will use the Jupyter SQL magic to query a copy of the database (working on a copy avoids locking the original database file).

[38]:
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
[39]:
%%bash
cp tmp/countries.ddb tmp/countries-copy.ddb
[40]:
%sql duckdb:///tmp/countries-copy.ddb
[41]:
%%sql
SELECT * FROM information_schema.tables
[41]:
    table_catalog table_schema                                   table_name  table_type  ...
0  countries-copy         main                                    countries  BASE TABLE  ...
1  countries-copy         main                         countries__rag_train  BASE TABLE  ...
2  countries-copy         main  internal__index__countries__rag_train__llm  BASE TABLE  ...
[42]:
%%sql
select * from internal__index__countries__rag_train__llm limit 5
[42]:
            name code           capital      continent         languages                                          __index__
0      Argentina   AR      Buenos Aires  South America         [Spanish]  [-0.009016353, 0.02336632, 0.007532564, -0.008...
1    South Korea   KR             Seoul           Asia          [Korean]  [3.8781454e-05, 0.013463534, 0.017664365, -0.0...
2  United States   US  Washington, D.C.  North America         [English]  [-0.0077237985, 0.016569635, -0.0042663547, -0...
3        Nigeria   NG             Abuja         Africa         [English]  [-0.0055540577, 0.0037728157, -0.003473751, -0...
4          India   IN         New Delhi           Asia  [Hindi, English]  [-0.0031975685, 0.025214365, 0.002862445, 0.00...
[43]:
%%sql
select count(*) from internal__index__countries__rag_train__llm
[43]:
count_star()
0 14
[25]:
%%sql
select count(*) from countries
[25]:
count_star()
0 20
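
The same inspection works from plain Python with the duckdb package (paths as in the cells above):

[ ]:
import duckdb

# Open the copied database read-only and inspect the auto-created tables.
con = duckdb.connect("tmp/countries-copy.ddb", read_only=True)
print(con.sql("SELECT table_name FROM information_schema.tables").fetchall())
print(con.sql("SELECT count(*) FROM internal__index__countries__rag_train__llm").fetchone())
con.close()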

Configuring the training/test split

By default, the infer command splits the data in your collection into training and test sets. This is useful for evaluation, but if you want to use the entire dataset for training, or you want to configure the split proportions, you can use --training-test-data-split (-S). In the example below, -S 1.0 0.0 uses all records for training and none for testing.

[37]:
%%bash
linkml-store  -d duckdb:///tmp/countries.ddb -c countries infer -t rag -S 1.0 0.0 -q "name: Uruguay"
predicted_object:
  capital: Montevideo
  code: UY
  continent: South America
  languages:
  - Spanish

Extraction tasks

We can also use this engine for extraction tasks, i.e. deriving structured data or knowledge from text or other unstructured inputs.

In fact, we don’t need any new capabilities here: extraction can be seen as a special case of inference, where the feature set includes (or is restricted to) text, and the target is the whole object.

We can demonstrate this with a simple zero-shot example:

[53]:
%%bash
echo '{text: I saw the cat sitting on the mat, subject: cat, predicate: sits-on, object: mat}' > tmp/extraction-examples.yaml
[54]:
%%bash
linkml-store -i tmp/extraction-examples.yaml infer -t rag -q "text: the Earth rotates around the Sun"
predicted_object:
  object: Sun
  predicate: rotates-around
  subject: Earth
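
Programmatically, the same zero-shot extraction would look roughly like the sketch below, reusing the (assumed) inference API from the earlier sketch; here the single example is inserted into an in-memory DuckDB instance.

[ ]:
from linkml_store import Client
from linkml_store.inference import get_inference_engine

# Seed an in-memory collection with the single training example.
client = Client()
db = client.attach_database("duckdb")  # in-memory DuckDB
collection = db.create_collection("examples")
collection.insert([{"text": "I saw the cat sitting on the mat",
                    "subject": "cat", "predicate": "sits-on", "object": "mat"}])

# Method names are assumptions; see the earlier inference sketch.
ie = get_inference_engine("rag")
ie.load_and_split_data(collection)
ie.initialize()
print(ie.derive({"text": "the Earth rotates around the Sun"}).predicted_object)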
