A comprehensive chatbot system built with Streamlit, LangChain, and Groq that can answer questions from uploaded documents and collect user information through conversational forms for appointment booking.
- Multi-format Support: Upload PDF, TXT, and DOCX files
- Intelligent Chunking: Documents are split into optimal chunks for better retrieval
- Vector Search: Uses Weaviate for efficient similarity search
- Conversational Memory: Maintains context across conversations
- Source Citations: Shows relevant document sections for answers
- Natural Language Processing: Understands requests like "call me" or "book appointment"
- Conversational Forms: Step-by-step information collection
- Smart Date Extraction: Handles natural date inputs like "next Monday", "tomorrow"
- Input Validation: Real-time validation for emails, phone numbers, and names
- Confirmation Process: Users can review and confirm their information
- Agent-based Architecture: Uses LangChain agents for specialized tasks
- Multiple Tools: Date extraction, validation, form management, and search
- Context Awareness: Switches between document Q&A and form collection seamlessly
├── logs
├── uploaded_docs
├── vectors_store
├── app.py # Main Streamlit application
├── config.py # Configuration settings
├── document_processor.py # Document loading and vector store management
├── chatbot.py # Main chatbot logic
├── conversational_form.py # Form collection and management
├── tool_agents.py # Agent tools and integration
├── validators.py # Input validation and date extraction
├── requirements.txt # Python dependencies
└── README.md # This file
- Clone or download all the files to a directory
- Install dependencies:
  pip install -r requirements.txt
- Set up your Groq API key:
  - Get your API key from the Groq Console
  - Either set it as an environment variable, or create a .env file containing:
    GROQ_API_KEY=your-api-key-here
- Start Weaviate locally with Docker:
  docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
- Run the application:
  streamlit run app.py
Edit config.py to customize:
- MODEL_NAME: Groq model to use (default: "llama-3.3-70b-versatile")
- CHUNK_SIZE: Document chunk size (default: 1000)
- CHUNK_OVERLAP: Overlap between chunks (default: 200)
- TEMPERATURE: Model creativity (default: 0.3)
- MAX_TOKENS: Maximum token limit (default: 4000)
- EMBEDDING_MODEL: Sentence transformer model (default: "sentence-transformers/all-mpnet-base-v2")
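For orientation, the settings above could be laid out roughly as follows. This is a minimal sketch, not the project's actual config.py: only the option names and default values come from the list above, while the `validate_config` helper, the way the API key is read, and the exact spelling of the model id are assumptions.

```python
import os

# Hypothetical layout for config.py. Only the option names and defaults
# are taken from the README; everything else is illustrative.
GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "")  # assumed to come from the environment or .env

MODEL_NAME = "llama-3.3-70b-versatile"  # assumed spelling; check the Groq model list
CHUNK_SIZE = 1000       # characters per document chunk
CHUNK_OVERLAP = 200     # characters shared by neighbouring chunks
TEMPERATURE = 0.3       # lower values give more deterministic answers
MAX_TOKENS = 4000
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"


def validate_config() -> None:
    """Fail fast on settings that cannot work together (hypothetical helper)."""
    if CHUNK_SIZE <= 0:
        raise ValueError("CHUNK_SIZE must be positive")
    if not 0 <= CHUNK_OVERLAP < CHUNK_SIZE:
        raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
    if MAX_TOKENS <= 0:
        raise ValueError("MAX_TOKENS must be positive")
```

Calling `validate_config()` once at startup would surface a bad CHUNK_OVERLAP before any document is processed.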
- Upload Documents:
  - Use the sidebar to upload PDF, TXT, or DOCX files
  - Click "Process Documents" to create the vector store
  - Documents are chunked and indexed automatically
- Ask Questions:
  - Type questions about your documents in the chat
  - Get answers with relevant source excerpts
  - Maintain conversation context
- Switch from appointment booking back to document Q&A:
  - Clear the chat first. The chat history is saved, so the bot may otherwise answer based on the previous conversation
  - Clearing the chat also improves performance
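To illustrate what "chunked" means in the upload step, here is a minimal sliding-window sketch in plain Python. It is a stand-in for illustration only: the project most likely relies on a LangChain text splitter, and the function name `chunk_text` is hypothetical. The default sizes mirror CHUNK_SIZE and CHUNK_OVERLAP.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval at the cost of some duplicated storage.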
- Trigger Form Collection:
  - Say things like "call me", "book appointment", "schedule a meeting"
  - The system will detect the intent and start form collection
- Provide Information:
  - Name: Full name validation
  - Email: Email format validation
  - Phone: Phone number validation
  - Date: Natural language date parsing (e.g., "next Monday", "tomorrow")
- Confirmation:
  - Review your information
  - Confirm or request changes
  - Receive confirmation message
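The booking flow above (detect the intent, collect one field at a time, then confirm) could be sketched as a small state holder. This is an assumption-laden illustration, not the code in conversational_form.py: `wants_booking`, `BookingForm`, the keyword matching, and the prompt wording are all hypothetical, and field validation is left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Trigger phrases taken from the README; simple substring matching is an assumption.
BOOKING_PHRASES = ("call me", "book appointment", "schedule a meeting")
FIELDS = ("name", "email", "phone", "date")


def wants_booking(message: str) -> bool:
    """Return True if the message contains one of the trigger phrases."""
    text = message.lower()
    return any(phrase in text for phrase in BOOKING_PHRASES)


@dataclass
class BookingForm:
    answers: Dict[str, str] = field(default_factory=dict)
    confirmed: bool = False

    def next_field(self) -> Optional[str]:
        """Name of the first field not yet collected, or None when all are filled."""
        for name in FIELDS:
            if name not in self.answers:
                return name
        return None

    def progress(self) -> str:
        return f"{len(self.answers)}/{len(FIELDS)} fields collected"

    def submit(self, value: str) -> str:
        """Store the answer for the current step and return the next prompt."""
        value = value.strip()
        current = self.next_field()
        if current is None:
            # Every field is filled, so this message answers the confirmation prompt.
            if value.lower() in ("yes", "y", "confirm"):
                self.confirmed = True
                return "Thanks, your appointment request is confirmed."
            self.answers.clear()
            return "No problem, let's start over. What is your name?"
        if not value:
            return f"Please provide your {current}."
        self.answers[current] = value
        upcoming = self.next_field()
        if upcoming is None:
            summary = ", ".join(f"{k}: {v}" for k, v in self.answers.items())
            return f"Please confirm: {summary} (yes/no)"
        return f"Thanks. What is your {upcoming}?"
```

In a Streamlit app an object like this would typically live in session state, so the form survives reruns but disappears when the session ends.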
The system understands various date formats:
- "tomorrow", "today", "yesterday"
- "next Monday", "next Friday"
- "next week", "next month"
- "2024-01-15", "01/15/2024"
- "January 15th", "Jan 15"
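A standard-library sketch of how these inputs might be normalized to YYYY-MM-DD is shown below. It is not the project's parser, and several choices are assumptions: "next Monday" is read as the first Monday strictly after today, "next week" as seven days ahead, month-and-day inputs without a year use the current year, and the `today` parameter exists only to make the function testable.

```python
import calendar
import re
from datetime import date, datetime, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]


def extract_date(text: str, today: Optional[date] = None) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if the input is not understood."""
    today = today or date.today()
    s = text.strip().lower()

    # Fixed offsets relative to today.
    relative = {"today": 0, "tomorrow": 1, "yesterday": -1, "next week": 7}
    if s in relative:
        return (today + timedelta(days=relative[s])).isoformat()

    if s == "next month":
        year, month = (today.year + 1, 1) if today.month == 12 else (today.year, today.month + 1)
        day = min(today.day, calendar.monthrange(year, month)[1])  # clamp e.g. Jan 31
        return date(year, month, day).isoformat()

    match = re.fullmatch(r"next (\w+)", s)
    if match and match.group(1) in WEEKDAYS:
        # Days until that weekday; 0 would mean today, so push it a full week.
        delta = (WEEKDAYS.index(match.group(1)) - today.weekday()) % 7 or 7
        return (today + timedelta(days=delta)).isoformat()

    # Numeric formats: 2024-01-15 and 01/15/2024.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(s, fmt).date().isoformat()
        except ValueError:
            pass

    # "January 15th" or "Jan 15"; the current year is assumed.
    match = re.fullmatch(r"([a-z]+)\.? (\d{1,2})(?:st|nd|rd|th)?", s)
    if match:
        word, day = match.group(1), int(match.group(2))
        for index, full in enumerate(MONTH_NAMES, start=1):
            if len(word) >= 3 and full.startswith(word):
                try:
                    return date(today.year, index, day).isoformat()
                except ValueError:
                    return None  # e.g. "February 31"
    return None
```

For a booking flow it may be worth rejecting results that fall before today, since inputs like "yesterday" parse successfully but make no sense as appointment dates.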
The system includes several specialized tools:
- schedule_appointment: Handles appointment booking workflow
- extract_date: Converts natural language to YYYY-MM-DD format
- get_form_status: Shows current form collection progress
- validate_contact_info: Validates emails, phones, and names
- search_appointments: Searches through completed appointments
- Email: Format check using a regular expression
- Phone: Character validation and length requirements
- Name: Character validation and length requirements
- Date: Comprehensive natural language date parsing
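The email, phone, and name rules above might look roughly like this. The exact patterns and length limits used in validators.py are not stated in this README, so the regular expressions, the 7 to 15 digit range, and the 2 to 100 character name limit are assumptions chosen for illustration.

```python
import re

# Deliberately simple pattern: something@domain.tld (not a full RFC 5322 check).
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
# Optional leading "+", then digits and common separators only.
PHONE_CHARS_RE = re.compile(r"^\+?[0-9\s().-]+$")


def is_valid_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value.strip()))


def is_valid_phone(value: str) -> bool:
    """Character check first, then a length check on the digits alone."""
    value = value.strip()
    if not PHONE_CHARS_RE.match(value):
        return False
    digits = re.sub(r"\D", "", value)
    return 7 <= len(digits) <= 15  # assumed limits


def is_valid_name(value: str) -> bool:
    """Letters (including non-ASCII), spaces, apostrophes, dots, and hyphens."""
    value = value.strip()
    if not 2 <= len(value) <= 100:  # assumed limits
        return False
    if not value[0].isalpha():
        return False
    return all(ch.isalpha() or ch in " '.-" for ch in value)
```

Using `str.isalpha` rather than an ASCII character class keeps names such as "Zoë" or "José" valid.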
- Context Switching: Seamlessly switches between document Q&A and form collection
- Memory Management: Maintains conversation history and form state
- Error Handling: Graceful error recovery with helpful messages
- Progress Tracking: Shows form completion progress
- Clean Design: Intuitive Streamlit interface
- Sidebar Organization: Document upload, form status, and help sections
- Chat History: Persistent conversation display
- Real-time Updates: Dynamic form progress and status updates
- No Persistent Storage: Forms are only stored in session state
- API Key Security: Keys are handled securely through environment variables
- Input Sanitization: All user inputs are validated and sanitized
- Error Boundaries: Comprehensive error handling prevents crashes
- "Agent is not available"
  - Check your Groq API key
  - Ensure internet connectivity
  - Verify model availability
- Document processing fails
  - Check file format (PDF, TXT, DOCX only)
  - Ensure files are not corrupted
  - Check file size limitations
- Weaviate issues
  - Check that your Weaviate Docker container is running
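One quick way to check the Weaviate container from Python is to call its readiness endpoint. The sketch below assumes the instance is reachable on port 8080 as in the docker command from the setup steps; the helper name `weaviate_is_ready` is hypothetical, while `/v1/.well-known/ready` is Weaviate's standard readiness route.

```python
import urllib.request


def weaviate_is_ready(base_url: str = "http://localhost:8080", timeout: float = 3.0) -> bool:
    """Return True if Weaviate answers its readiness probe with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


if __name__ == "__main__":
    print("Weaviate ready:", weaviate_is_ready())
```

If this prints False, `docker ps` should show whether the container is running and whether port 8080 is mapped.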
- Document Size: Keep documents under 10MB for best performance
- Chunk Size: Adjust chunk size based on document complexity
- Memory Usage: Clear chat history periodically for long sessions
- API Limits: Monitor Groq API usage to avoid rate limits