Search on PostgreSQL Using OpenAI and Pgai

This example shows how to run embedding searches using OpenAI and Pgai.

Getting Started

Create a Python virtual environment and install the required libraries:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Create a .env file and configure the following environment variables:

OPENAI_API_KEY=[YOUR OPENAI API KEY]
DATABASE=[YOUR POSTGRESQL CONNECTION STRING]

Run the following command to export both environment variables:

export $(cat .env | xargs)

Connect to your PostgreSQL instance using the following command:

PGOPTIONS="-c ai.openai_api_key=$OPENAI_API_KEY" psql -d $DATABASE

Once connected to your database, create the pgai extension by running:

create extension if not exists ai cascade;

Create a table to store a list of news articles. We'll use the embedding column to generate embeddings for each of the articles:

create table cnn_daily_mail( 
    id bigint not null primary key generated by default as identity,
    highlights text, 
    article text,
    embedding vector(1536), 
    tsv tsvector generated always as (to_tsvector('english', article)) stored
);

Finally, create an index on the tsv column to enable full-text search:

create index on cnn_daily_mail using gin (tsv);

After setting up the database, let's load some sample data into it by running the load_db.py file:

python load_db.py

You can make sure the data was properly loaded by running the following query on your database:

select count(*) from cnn_daily_mail;

At this point, we are ready to generate embeddings for every one of the news articles in the database. This command will take a few minutes to finish:

update cnn_daily_mail 
set embedding = ai.openai_embed('text-embedding-ada-002', article);

The final step is to index the embedding column using the pgvector HNSW (Hierarchical Navigable Small World) algorithm, which is efficient for nearest-neighbor searches in high-dimensional spaces.

create index on cnn_daily_mail using hnsw (embedding vector_cosine_ops);

At this point, you are ready to start using the web interface to run full text and embedding searches. You can run the web interface by executing the devserver.sh script.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.idx		.idx
.vscode		.vscode
src		src
.gitignore		.gitignore
README.md		README.md
devserver.sh		devserver.sh
load_db.py		load_db.py
main.py		main.py
requirements.txt		requirements.txt
script.sql		script.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Search on PostgreSQL Using OpenAI and Pgai

Getting Started

About

Uh oh!

Releases

Packages

Uh oh!

Languages

svpino/timescale

Folders and files

Latest commit

History

Repository files navigation

Search on PostgreSQL Using OpenAI and Pgai

Getting Started

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages