This example shows how to run embedding searches using OpenAI and Pgai.
Create a Python virtual environment and install the required libraries:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtCreate a .env file and configure the following environment variables:
OPENAI_API_KEY=[YOUR OPENAI API KEY]
DATABASE=[YOUR POSTGRESQL CONNECTION STRING]
Run the following command to export both environment variables:
export $(cat .env | xargs)
Connect to your PostgreSQL instance using the following command:
PGOPTIONS="-c ai.openai_api_key=$OPENAI_API_KEY" psql -d $DATABASE
Once connected to your database, create the pgai extension by running:
create extension if not exists ai cascade;Create a table to store a list of news articles. We'll use the embedding column to generate embeddings for each of the articles:
create table cnn_daily_mail(
id bigint not null primary key generated by default as identity,
highlights text,
article text,
embedding vector(1536),
tsv tsvector generated always as (to_tsvector('english', article)) stored
);Finally, create an index on the tsv column to enable full-text search:
create index on cnn_daily_mail using gin (tsv);After setting up the database, let's load some sample data into it by running the load_db.py file:
python load_db.pyYou can make sure the data was properly loaded by running the following query on your database:
select count(*) from cnn_daily_mail;At this point, we are ready to generate embeddings for every one of the news articles in the database. This command will take a few minutes to finish:
update cnn_daily_mail
set embedding = ai.openai_embed('text-embedding-ada-002', article);The final step is to index the embedding column using the pgvector HNSW (Hierarchical Navigable Small World) algorithm, which is efficient for nearest-neighbor searches in high-dimensional spaces.
create index on cnn_daily_mail using hnsw (embedding vector_cosine_ops);At this point, you are ready to start using the web interface to run full text and embedding searches. You can run the web interface by executing the devserver.sh script.