Skip to content

svpino/timescale

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Search on PostgreSQL Using OpenAI and Pgai

This example shows how to run embedding searches using OpenAI and Pgai.

Getting Started

Create a Python virtual environment and install the required libraries:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Create a .env file and configure the following environment variables:

OPENAI_API_KEY=[YOUR OPENAI API KEY]
DATABASE=[YOUR POSTGRESQL CONNECTION STRING]

Run the following command to export both environment variables:

export $(cat .env | xargs)

Connect to your PostgreSQL instance using the following command:

PGOPTIONS="-c ai.openai_api_key=$OPENAI_API_KEY" psql -d $DATABASE

Once connected to your database, create the pgai extension by running:

create extension if not exists ai cascade;

Create a table to store a list of news articles. We'll use the embedding column to generate embeddings for each of the articles:

create table cnn_daily_mail( 
    id bigint not null primary key generated by default as identity,
    highlights text, 
    article text,
    embedding vector(1536), 
    tsv tsvector generated always as (to_tsvector('english', article)) stored
);

Finally, create an index on the tsv column to enable full-text search:

create index on cnn_daily_mail using gin (tsv);

After setting up the database, let's load some sample data into it by running the load_db.py file:

python load_db.py

You can make sure the data was properly loaded by running the following query on your database:

select count(*) from cnn_daily_mail;

At this point, we are ready to generate embeddings for every one of the news articles in the database. This command will take a few minutes to finish:

update cnn_daily_mail 
set embedding = ai.openai_embed('text-embedding-ada-002', article);

The final step is to index the embedding column using the pgvector HNSW (Hierarchical Navigable Small World) algorithm, which is efficient for nearest-neighbor searches in high-dimensional spaces.

create index on cnn_daily_mail using hnsw (embedding vector_cosine_ops);

At this point, you are ready to start using the web interface to run full text and embedding searches. You can run the web interface by executing the devserver.sh script.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published