📚 RAG Solution for IBM Hackathon: Unlocking Knowledge with WatsonX.ai & Pinecone

🏆 Objective

This project was developed for a global hackathon organized by IBM, aimed at promoting the adoption of WatsonX.ai and WatsonX Assistant. The solution addresses a common challenge within companies: providing employees with clear information on vacation policies and regulations.

The goal is to use a Retrieval-Augmented Generation (RAG) pipeline to answer questions about vacation rules from company documents. The solution uses Pinecone as the vector database, WatsonX.ai to serve the Large Language Model (LLM), a Llama model, and LangChain as the orchestrator.

🛠️ Proposed Solution

Data Ingestion: The solution starts by uploading company policy PDFs into Pinecone using Python. The PDFs are split into chunks, embedded using WatsonX Embeddings, and stored in the Pinecone vector database.
RAG API: A Python API built with Flask and Flask-RESTx handles incoming queries, retrieves relevant documents from Pinecone, and uses WatsonX.ai's LLM to generate contextually accurate responses.
Chatbot Interface: The API integrates with WatsonX Assistant V2 using Actions, providing an interactive web interface for users to ask questions and receive answers in real time.
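
To make the ingestion and retrieval steps above more concrete, here is a minimal, library-free sketch of two pieces of such a pipeline: splitting extracted PDF text into overlapping chunks, and assembling a prompt from retrieved chunks. The function names (`split_into_chunks`, `build_prompt`), the chunk size, the overlap, and the prompt wording are all illustrative assumptions, not the code in `upload_pdf.py` or `app.py`. The actual project delegates these steps to LangChain, WatsonX Embeddings, and Pinecone.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    chunk_size and overlap are hypothetical defaults chosen for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def build_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Combine retrieved chunks and the user question into one LLM prompt.

    The wording of the instructions is an assumption, not the project's prompt.
    """
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.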

🖥️ Technologies Used

⚙️ Prerequisites

Before running the project, ensure you have the following:

Python 3.10+ installed.
The following Python packages:
- requests
- flask
- flask-restx
- python-dotenv
- pydantic
- PyMuPDF (imported as `fitz`; used for PDF processing)
- langchain
- langchain_pinecone
- langchain_community
- ibm_watsonx_ai
A Pinecone account with an API key.
Access to WatsonX.ai API.
A WatsonX Assistant V2 instance.

🛠️ Installation and Setup

1. Clone the Repository

git clone https://github.com/sergiogama/RAG-for-HR-using-watsonx-langchain-and-pinecone.git
cd RAG-for-HR-using-watsonx-langchain-and-pinecone

2. Create a `.env` File

Copy the sample environment file and adjust it with your credentials:

cp sample.env .env

Edit the .env file:

PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENV=us-east1-gcp
INDEX_NAME=vacation
WATSONX_ACCESS_TOKEN=your_watsonx_access_token
WATSONX_PROJECT_ID=your_watsonx_project_id
WATSONX_API_KEY=your_watsonx_api_key
WATSONX_API_URL=https://us-south.ml.cloud.ibm.com
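
Before starting the API, it can help to verify that every variable from the `.env` file is actually set. The sketch below assumes the variable names listed above and treats all seven as required, which may be stricter than the application itself. `missing_settings` and `load_and_check` are hypothetical helpers that are not part of the repository. If `python-dotenv` is installed, `load_dotenv()` loads the `.env` file into the process environment first.

```python
import os
from typing import List, Mapping

# Names taken from the sample .env above; treating all as required is an assumption.
REQUIRED_SETTINGS = (
    "PINECONE_API_KEY",
    "PINECONE_ENV",
    "INDEX_NAME",
    "WATSONX_ACCESS_TOKEN",
    "WATSONX_PROJECT_ID",
    "WATSONX_API_KEY",
    "WATSONX_API_URL",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are absent or blank in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


def load_and_check() -> List[str]:
    """Load .env if python-dotenv is available, then report missing settings."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package
        load_dotenv()
    except ImportError:
        pass  # fall back to variables already exported in the shell
    return missing_settings(os.environ)
```

Calling `load_and_check()` from the project directory returns an empty list when everything is configured, and otherwise the names that still need a value.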

3. Install Dependencies

pip install -r requirements.txt

4. Load PDFs into Pinecone

python upload_pdf.py

🚀 Running the API

python app.py

The API will be available at http://localhost:8000.

🤖 Integrating with WatsonX Assistant V2

Step 1: Generate API Key and Project ID

Log in to WatsonX.ai and create an API key.
Find your Project ID under Projects -> Manage -> General -> Details.

Step 2: Download the OpenAPI Specification

Make sure your copy of the OpenAPI specification is up to date by downloading it from the running API:
```
curl http://localhost:8000/swagger.json -o watsonx-openapi.json
```
Note: You can also use and test the openapi.json file included in this repository.
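
As a quick sanity check before uploading the specification to WatsonX Assistant, the file can be inspected for the operations it declares. This sketch assumes the file has a top-level `paths` object, as both Swagger 2.0 and OpenAPI 3.x documents do. `list_operations` and `load_spec` are hypothetical helpers, not part of the repository. Note that in Swagger 2.0 documents the listed paths are relative to `basePath`, so the chat endpoint may not appear with its full `/api/chat` form.

```python
import json
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head"}


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared under the spec's paths."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


def load_spec(filename: str) -> Dict[str, Any]:
    """Read a JSON specification file from disk."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `list_operations(load_spec("watsonx-openapi.json"))` lists what the extension will be able to call, assuming the file was saved under that name by the curl command above.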

Step 3: Create a WatsonX Assistant

Log in to WatsonX Assistant.
Create a new assistant.

Step 4: Add a Custom Extension

Go to the Integrations tab of your assistant.
Click on Build custom extension.
Use the downloaded OpenAPI file (watsonx-openapi.json if you ran the command in Step 2) to create a custom extension named RAG HR.

Step 5: Upload WatsonX Assistant sample

Go to Assistant settings -> Download/upload files.
Upload the watsonx-actions.zip file (included in this repository).

Step 6: Configure action RAG HR search to use the extension

Open the RAG HR search action and go to its Step 3 -> Edit extension.
Configure the extension and set the parameter query to query_text.

Step 7: Test the Assistant

Use the Preview chat feature to test the assistant.
If the actions do not work initially, refresh the chat and re-upload the actions.

📄 Usage

Endpoint: `/api/chat`

Method: POST

Payload:

{
  "query": "How do I apply for vacation?"
}

Response:

{
  "response": "You can apply for vacation by filling out the online request form available on the HR portal."
}

Example Request

curl -X POST http://localhost:8000/api/chat -H "Content-Type: application/json" -d '{"query": "What is WatsonX?"}'
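
The same request can be issued from Python. The sketch below uses only the standard library and assumes the endpoint, payload, and response fields shown above. `build_chat_request` and `ask` are hypothetical helper names, and the base URL is the local address given in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local address from this README


def build_chat_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /api/chat without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send the query to the running API and return the "response" field."""
    request = build_chat_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        payload = json.loads(reply.read().decode("utf-8"))
    return payload["response"]
```

With the API running, `ask("How do I apply for vacation?")` returns the generated answer as a string. Separating request construction from sending keeps the payload easy to inspect when debugging.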

🗂️ Project Structure

RAG-for-HR-using-watsonx-langchain-and-pinecone/
├── app.py                  # Main Flask API
├── upload_pdf.py           # Script to load PDFs into Pinecone
├── openapi.json            # OpenAPI specification for WatsonX Assistant
├── watsonx-actions.zip     # Actions configuration for WatsonX Assistant V2
├── requirements.txt        # Python dependencies
├── sample.env              # Sample environment variables file
├── .env                    # Environment variables
└── data/                   # Dataset in PDF files to be uploaded to Pinecone

🛠️ Troubleshooting

Ensure all API keys and environment variables are set correctly.
Verify Pinecone and WatsonX services are accessible.
Use curl or Postman to test the API endpoints.

🔗 Useful Links

📢 Contributing

We welcome contributions! Open issues or submit pull requests.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

✨ Acknowledgments

Special thanks to IBM for organizing this hackathon.


