This project was developed for a global hackathon organized by IBM, aimed at promoting the adoption of WatsonX.ai and WatsonX Assistant. The solution addresses a common challenge within companies: providing employees with clear information on vacation policies and regulations.
The goal is to use Retrieval-Augmented Generation (RAG) to answer questions about vacation rules from company documents. The solution uses Pinecone as the vector database, WatsonX.ai to host the Large Language Model (a Llama model), and LangChain as the orchestrator.
- Data Ingestion: The solution starts by uploading company policy PDFs into Pinecone using Python. The PDFs are split into chunks, embedded using WatsonX Embeddings, and stored in the Pinecone vector database.
- RAG API: A Python API built with Flask and Flask-RESTx handles incoming queries, retrieves relevant documents from Pinecone, and uses WatsonX.ai's LLM to generate contextually accurate responses.
- Chatbot Interface: The API integrates with WatsonX Assistant V2 using Actions, providing an interactive web interface for users to ask questions and receive answers in real time.
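The three stages above can be illustrated with a dependency-free toy pipeline. This is a minimal sketch, not the project's code: the bag-of-words `embed` function stands in for WatsonX Embeddings, a plain Python list stands in for Pinecone, and the function names, chunk size, overlap, and prompt wording are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import sqrt


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (stand-in for a vector query)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the retrieved passages and the question into a single LLM prompt."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

In the real pipeline the assembled prompt would be sent to the LLM hosted on WatsonX.ai; here it is only constructed, so the sketch runs without any credentials.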
Before running the project, ensure you have the following:
- Python 3.10+ installed.
- The following Python packages:
  - requests
  - flask
  - flask-restx
  - python-dotenv
  - pydantic
  - fitz (PyMuPDF for PDF processing)
  - langchain
  - langchain_pinecone
  - langchain_community
  - ibm_watsonx_ai
- A Pinecone account with an API key.
- Access to WatsonX.ai API.
- A WatsonX Assistant V2 instance.
git clone https://github.com/sergiogama/RAG-for-HR-using-watsonx-langchain-and-pinecone.git
cd RAG-for-HR-using-watsonx-langchain-and-pinecone
Copy the sample environment file and adjust it with your credentials:
cp sample.env .env
Edit the `.env` file:
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENV=us-east1-gcp
INDEX_NAME=vacation
WATSONX_ACCESS_TOKEN=your_watsonx_access_token
WATSONX_PROJECT_ID=your_watsonx_project_id
WATSONX_API_KEY=your_watsonx_api_key
WATSONX_API_URL=https://us-south.ml.cloud.ibm.com
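Before starting the upload script or the API, it can help to confirm that no variable is still missing or left at its `your_...` placeholder value. The following is a minimal sketch using only the standard library; the helper name `unset_vars` is hypothetical, and treating all seven variables as required is an assumption, since the scripts may not read every one of them.

```python
import os

# Variable names taken from the sample .env above.
# Assumption: all of them are required by the scripts.
REQUIRED_VARS = [
    "PINECONE_API_KEY",
    "PINECONE_ENV",
    "INDEX_NAME",
    "WATSONX_ACCESS_TOKEN",
    "WATSONX_PROJECT_ID",
    "WATSONX_API_KEY",
    "WATSONX_API_URL",
]


def unset_vars(env=None) -> list[str]:
    """Return the names that are missing, empty, or still set to a 'your_...' placeholder."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value or value.startswith("your_"):
            problems.append(name)
    return problems
```

Calling `unset_vars()` after the `.env` file has been loaded (for example with python-dotenv) returns an empty list when every value has been filled in.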
pip install -r requirements.txt
python upload_pdf.py
python app.py
The API will be available at http://localhost:8000.
- Log in to WatsonX.ai and create an API key.
- Find your Project ID under Projects -> Manage -> General -> Details.
- Ensure the `watsonx-openapi.json` file is up to date by regenerating it from the running API (you can also use and test the copy included in this repository):
  curl http://localhost:8000/swagger.json -o watsonx-openapi.json
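To check that a downloaded spec exposes the chat endpoint before importing it into WatsonX Assistant, the operations it declares can be listed. This is a minimal sketch: `list_operations` is a hypothetical helper, and it assumes the file follows the usual Swagger 2.0 / OpenAPI layout of a `paths` object keyed by path and then by HTTP method, with an optional `basePath`.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, full_path) pairs declared in a Swagger/OpenAPI spec dict."""
    # Swagger 2.0 specs may carry a basePath that prefixes every path.
    base = spec.get("basePath", "").rstrip("/")
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), base + path))
    return sorted(operations)
```

With the file loaded through `json.load`, the check `("POST", "/api/chat") in list_operations(spec)` should hold if the spec matches the endpoint used in the curl example in this README.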
- Log in to WatsonX Assistant.
- Create a new assistant.
- Go to the Integrations tab of your assistant.
- Click on Build custom extension.
- Use the downloaded `watsonx-openapi.json` file to create a custom extension named RAG HR.
- Go to Assistant settings -> Download/upload files.
- Upload the `watsonx-actions.zip` file (included in this repository).
- Go to Step 3 -> Edit extension.
- Configure the extension and set its `query` parameter to `query_text`.
- Use the Preview chat feature to test the assistant.
- If the actions do not work initially, refresh the chat and re-upload the actions.
- Method: `POST`
- Payload: `{ "query": "How do I apply for vacation?" }`
- Response: `{ "response": "You can apply for vacation by filling out the online request form available on the HR portal." }`
curl -X POST http://localhost:8000/api/chat -H "Content-Type: application/json" -d '{"query": "What is WatsonX?"}'
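The same request can be sent from Python. The sketch below uses only the standard library; the function names `build_chat_request` and `ask` are hypothetical, and it assumes the API is running locally and replies with a JSON object containing a `response` field, as in the example above.

```python
import json
import urllib.request


def build_chat_request(query: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for the /api/chat endpoint without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, base_url: str = "http://localhost:8000", timeout: float = 30.0) -> str:
    """Send the query to the running API and return the text of the 'response' field."""
    request = build_chat_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["response"]
```

With the API started via `python app.py`, `ask("How do I apply for vacation?")` returns the answer text; building the request separately from sending it makes the payload easy to inspect without a live server.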
ibm-rag-solution/
├── app.py # Main Flask API
├── upload_pdf.py # Script to load PDFs into Pinecone
├── watsonx-openapi.json # OpenAPI specification for WatsonX Assistant
├── watsonx-actions.json # Actions configuration for WatsonX Assistant V2
├── requirements.txt # Python dependencies
├── sample.env # Sample environment variables file
├── .env # Environment variables
└── data/ # Dataset in PDF files to be uploaded to Pinecone
- Ensure all API keys and environment variables are set correctly.
- Verify Pinecone and WatsonX services are accessible.
- Use `curl` or Postman to test the API endpoints.
We welcome contributions! Open issues or submit pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
Special thanks to IBM for organizing this hackathon.