This project implements a conversational AI agent capable of interacting with an Airtable base to manage announcements and analyzing PDF document attachments using OpenAI's GPT-4o vision capabilities. The agent can fetch all announcements, search for specific announcements, retrieve attachments, convert PDF attachments to images, and then perform analysis tasks like summarization, action item extraction, and sentiment analysis on the document content.
The project is organized into the following key Python files within the /home/ubuntu/ai_agent_project/
directory:
main.py
: This script provides a command-line chat interface to interact with the AI agent. It handles user input and displays the agent's responses.agent_logic.py
: This file contains the core logic for the LangChain agent. It defines the tools, initializes the language model (LLM), sets up the agent executor, and manages the conversational flow and tool routing.airtable_tool.py
: This module implements theAirtableTool
class, which provides functionalities to connect to an Airtable base, fetch all records, search records based on text, retrieve attachment URLs, and download attachments.openai_analysis_tool.py
: This module contains theOpenAIDocumentAnalysisTool
class. It is responsible for converting PDF document pages into images (base64 encoded) and then using the OpenAI API (GPT-4o) to analyze the content of these images based on specified analysis types (e.g., summarize, extract action items, sentiment).venv/
: This directory contains the Python virtual environment with all the necessary dependencies.
- Python 3.11
poppler-utils
: This system dependency is required by thepdf2image
library for PDF processing. It was installed during the environment setup. If running elsewhere, ensure it's installed (e.g.,sudo apt-get install poppler-utils
on Debian/Ubuntu).
-
Navigate to the project directory:
cd /home/ubuntu/ai_agent_project
-
Activate the virtual environment: The virtual environment
venv
should already be set up in the project directory. To activate it:source venv/bin/activate
-
Install Dependencies: All required Python packages are listed in
requirements.txt
(which will be created in the next step). If setting up manually or ifrequirements.txt
is not used, the core packages were installed during the initial setup. You can ensure they are present by running:pip install langchain langchain-openai airtable-python-wrapper requests Pillow pdf2image openai fpdf2 python-dotenv
As per your request, the following API keys have been hardcoded directly into the respective Python files:
- Airtable API Key, Base ID, and Table Name: Located at the beginning of
airtable_tool.py
.AIRTABLE_API_KEY
AIRTABLE_BASE_ID
AIRTABLE_TABLE_NAME
(set to "Announcements")
- OpenAI API Key: Located at the beginning of
openai_analysis_tool.py
and also set as an environment variable withinagent_logic.py
for LangChain'sChatOpenAI
.OPENAI_API_KEY
Security Note: While hardcoding keys was done as requested for this development context, for production environments or shared code, it is strongly recommended to use environment variables (e.g., via a .env
file and the python-dotenv
library, or system environment variables) to manage sensitive credentials. This prevents accidental exposure.
To start interacting with the AI agent:
- Ensure you are in the project directory (
/home/ubuntu/ai_agent_project/
). - Ensure the virtual environment is activated (
source venv/bin/activate
). - Run the
main.py
script:python main.py
This will launch the command-line chat interface. You can then type your queries, such as:
- "Show all announcements"
- "Search announcements for Q2 report"
- "Summarize the latest announcement attachment"
- "What are the action items in the attachment of the announcement titled 'Project Phoenix Update'?"
Type quit
or exit
to terminate the chat interface.
- Accepts user queries via the command line.
- Maintains a chat history for conversational context.
- Passes queries and history to the
agent_executor
fromagent_logic.py
. - Displays the agent's final response or any errors encountered.
- Connects to Airtable: Uses the hardcoded API Key, Base ID (
appLu7BlsSJ0MzwXt
), and Table Name (Announcements
). - Fetch All Announcements: Retrieves all records from the specified table.
- Search Announcements: Filters announcements based on a search term in the
Title
orDescription
fields (case-insensitive local filtering after fetching all records). - Get Attachment:
- Can find an announcement by its ID, a search term, or by requesting the latest (based on
SentTime
field, assuming it exists and is sortable). - Retrieves the URL and filename of the first attachment found in the
Attachments
field of the target announcement. - Downloads the attachment to a local directory (
/tmp/agent_downloads/
).
- Can find an announcement by its ID, a search term, or by requesting the latest (based on
- Airtable Columns: The tool expects your "Announcements" table to have at least the following columns for full functionality:
Title
(Text)Description
(Text)Attachments
(Attachment type, for PDF files)SentTime
(Date/Time, used for 'latest' functionality)AnnouncementId
(Formula or Auto-number, if you want to refer to it directly, though the tool primarily uses Airtable's internal record IDs)DocumentsCount
(Number, this was mentioned but not directly used by the current tool logic, but good to be aware of).
- PDF to Image Conversion:
- Takes a local PDF file path as input.
- Uses
pdf2image
(which relies onpoppler-utils
) to convert each page of the PDF (up to a specified maximum, default 5 pages) into a PNG image. - Encodes these images into base64 strings.
- OpenAI Analysis (GPT-4o):
- Sends the base64 encoded images along with a text prompt to the OpenAI GPT-4o model.
- Supports analysis types:
summarize
: Provides a summary of the document content.extract_action_items
: Identifies action items, deadlines, and responsible parties.sentiment
: Analyzes the overall sentiment of the document.custom
: Allows for a user-defined prompt for analysis.
- Returns the textual analysis from OpenAI.
- LangChain Framework: Utilizes LangChain for agent creation and tool management.
- Tool Definition: Wraps the functionalities of
AirtableTool
andOpenAIDocumentAnalysisTool
into LangChainTool
orStructuredTool
objects, making them available to the agent. - LLM: Uses
ChatOpenAI
with thegpt-4o
model. - Prompt Engineering: Employs a system prompt and message placeholders to guide the agent's behavior and maintain conversational history.
- Agent Executor: Manages the interaction between the user input, the LLM, and the available tools. It decides which tool to use (if any) based on the user's intent.
- Error Handling: The tools and agent logic include error handling to manage issues like failed API calls, missing files, or incorrect inputs, and aim to provide informative messages to the user.
langchain
: Framework for building LLM applications.langchain-openai
: OpenAI integration for LangChain.airtable-python-wrapper
: For interacting with the Airtable API.openai
: Official OpenAI Python client library.requests
: For making HTTP requests (used by Airtable tool for downloads).Pillow
: Python Imaging Library, used bypdf2image
.pdf2image
: For converting PDF files to images.fpdf2
: Used in test scripts to generate sample PDF files.python-dotenv
: Used in test scripts (though not for loading main app keys as they are hardcoded).
- The
openai_analysis_tool.py
script contains aif __name__ == "__main__":
block that tests PDF conversion and analysis. It creates asample_test.pdf
if one doesn't exist. - The
agent_logic.py
script also contains aif __name__ == "__main__":
block with various test queries to check the agent's responses to different scenarios, including error conditions.
This documentation should provide a good starting point for understanding and using the AI agent. Let me know if you have further questions after reviewing the code and this guide.