🏆 1st Place at Qualcomm x Microsoft x Northeastern University On-Device AI Builders Hackathon
Introducing GuideSense — sensing obstacles, speaking solutions — your personal navigation companion.
GuideSense is a wheelchair navigation assistant that uses computer vision and voice control to provide real-time guidance and feedback. Our system achieves exceptional performance:
- YOLOv8n object detection: <40ms inference on Snapdragon X Elite CPU
- OpenAI Whisper voice interface: <10ms inference on NPU via Qualcomm AI Engine Direct SDK with ONNX Runtime QNN
- Real-Time Object Detection: Utilizes YOLO to detect objects in the environment and assess potential obstacles
- Audio Feedback: Provides concise audio feedback about surroundings, including object type and distance
- Voice Activation: Allows users to activate the system with voice commands ("Go" to start)
- Responsive Feedback: Interrupts ongoing audio to provide immediate updates about critical obstacles
- On-Device Processing: End-to-end processing with zero cloud dependency for privacy and minimal latency
- Real-Time Depth Estimation: Estimates object distances from YOLO detection results (see the sketch below)
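To illustrate the detection side, here is a minimal sketch (not GuideSense's actual code) of running YOLOv8n with the `ultralytics` package and deriving a rough distance estimate from bounding-box height; the focal length and assumed object heights are placeholder values.

```python
# Minimal sketch: YOLOv8n detection plus a naive pinhole-camera distance estimate.
# The focal length and assumed object heights are illustrative placeholders,
# not values taken from GuideSense.
from ultralytics import YOLO
import cv2

FOCAL_LENGTH_PX = 700.0                            # placeholder: calibrate for the real camera
ASSUMED_HEIGHT_M = {"person": 1.7, "chair": 0.9}   # rough real-world heights

model = YOLO("yolov8n.pt")               # nano model, small enough for CPU inference
frame = cv2.imread("example_frame.jpg")  # placeholder input image

for result in model(frame):
    for box in result.boxes:
        label = model.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        pixel_height = y2 - y1
        # distance ≈ (real-world height * focal length) / pixel height
        real_h = ASSUMED_HEIGHT_M.get(label, 1.0)
        distance_m = real_h * FOCAL_LENGTH_PX / max(pixel_height, 1.0)
        print(f"{label}: ~{distance_m:.1f} m away")
```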
- Python 3.11
- Working webcam and microphone
- Pyenv for managing Python versions (optional but recommended)
- Install Qualcomm AI Engine Direct SDK
- Download Whisper-Base-En ONNX model
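After installing the SDK and downloading the model, a quick probe like the following can confirm that ONNX Runtime sees the QNN execution provider. This is a sketch, assuming an `onnxruntime` build with QNN support; the model path and `QnnHtp.dll` backend name are placeholders for your local setup.

```python
# Minimal sketch: verify the QNN execution provider is available and create a
# session that prefers the NPU, falling back to CPU.
import onnxruntime as ort

print(ort.get_available_providers())     # expect "QNNExecutionProvider" in the list

session = ort.InferenceSession(
    "whisper_base_en_encoder.onnx",       # placeholder: path to the downloaded model
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],
    provider_options=[
        {"backend_path": "QnnHtp.dll"},   # HTP (Hexagon NPU) backend from the SDK
        {},                               # no options for the CPU fallback
    ],
)
print("session providers:", session.get_providers())
```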
- Clone the Repository:
git clone https://github.com/Hackathon-Team-404/GuideSense.git
cd GuideSense
- Set Up Python Environment:
pyenv install 3.11.11
pyenv local 3.11.11
- Create a Virtual Environment:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install Dependencies:
pip install -r requirements.txt
- Set Up API Keys:
Create a `.env` file in the root directory:
OPENAI_API_KEY=your_openai_api_key_here
XAI_API_KEY=your_grok_api_key
Note: For basic functionality without LLM features, you can use an empty key:
OPENAI_API_KEY=""
- Run the Application:
python main.py
- Voice Commands:
  - Say "Go" to activate the system
- Quit the Application:
  - Press 'q' in the video window to quit
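To make this flow concrete, here is a minimal sketch of a "Go"-activated camera loop with a 'q' quit key. The function names (`wait_for_activation`, `process_frame`) are illustrative placeholders, not GuideSense's actual API.

```python
# Minimal sketch of the usage flow: wait for the spoken "Go", then run the
# camera loop until 'q' is pressed in the video window.
import cv2

def wait_for_activation() -> None:
    """Placeholder for the Whisper-based listener that blocks until 'Go' is heard."""
    ...

def process_frame(frame):
    """Placeholder for detection + guidance on a single frame."""
    return frame

wait_for_activation()
cap = cv2.VideoCapture(0)                    # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("GuideSense", process_frame(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):    # 'q' quits, as noted above
        break
cap.release()
cv2.destroyAllWindows()
```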
recognition/
├── main.py # Main script to run the application
├── situation_analyzer.py # Analyzes detected objects and provides guidance
├── audio_feedback.py # Handles audio feedback and text-to-speech
├── voice_control.py # Manages voice commands for activation and stopping
├── object_detector.py # Implements YOLO-based object detection
├── requirements.txt # Lists all Python dependencies
└── .env # Contains API keys (excluded from version control)
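For reference, a minimal sketch of reading those keys at startup, assuming the `python-dotenv` package (whether GuideSense actually loads its keys this way is not confirmed here):

```python
# Minimal sketch: load API keys from .env into the environment at startup.
# Assumes the python-dotenv package; GuideSense may load its keys differently.
import os
from dotenv import load_dotenv

load_dotenv()                                    # reads .env from the working directory
openai_key = os.getenv("OPENAI_API_KEY", "")     # an empty key is fine for basic use
xai_key = os.getenv("XAI_API_KEY", "")

if not openai_key:
    print("No OPENAI_API_KEY set; LLM features will be unavailable.")
```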
| Component | Performance | Hardware |
|---|---|---|
| YOLOv8n Object Detection | < 40ms inference time | Snapdragon X Elite CPU |
| OpenAI Whisper | < 10ms inference | Qualcomm NPU via AI Engine Direct SDK with ONNX Runtime QNN |
| System | End-to-end on-device processing | Zero cloud dependency |
- Integration with distance sensors (ultrasound, IR) for enhanced spatial awareness
- Implementation of SLAM (Simultaneous Localization and Mapping) for improved navigation
| Name |
|---|
| Tianyu Fang |
| Anson He |
| Dingyang Jin |
| Hao Wu |
| Harshil Chudasama |