Robot Voice Control

An LLM-powered robot control system that enables natural language interaction with KUKA industrial robots through voice commands or text input.

System Architecture

The system follows a 5-step pipeline that converts human voice commands into precise robot actions:

  1. Voice Activity Detection (VAD): Captures the audio stream and filters out segments that do not contain human speech.
  2. Speech-to-Text Transcription: Uses Gemini 2.5 Flash for automatic speech recognition, converting the filtered audio to text.
  3. AI Agent Processing: The agent processes text input, maintains conversation memory, and generates appropriate robot tasks.
  4. Robot Control Interface: Python-based control tools execute tasks and provide real-time feedback to the agent.
  5. Text-to-Speech Response: Azure TTS converts the agent's responses back to speech for seamless human interaction.

This architecture enables bidirectional communication between human and robot through natural language, with persistent memory for context-aware conversations.
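
For intuition, here is a minimal end-to-end sketch of that loop. All helper names below are illustrative assumptions for this README, not the repository's actual API (see voice_agent.py and src/ for the real implementation):

    def transcribe(audio: bytes) -> str:
        # Step 2: send VAD-filtered audio to the speech recognizer (stubbed here).
        return "move the gripper to the home position"

    def run_agent(text: str, memory: list[str]) -> str:
        # Step 3: the agent reasons over the text plus conversation memory
        # and decides which robot task to perform (stubbed here).
        memory.append(text)
        return f"executing: {text}"

    def execute_on_robot(task: str) -> str:
        # Step 4: the control tools send commands to the robot and
        # return real-time feedback to the agent (stubbed here).
        return f"done: {task}"

    def speak(reply: str) -> None:
        # Step 5: text-to-speech plays the agent's reply back to the user.
        print(f"[TTS] {reply}")

    def pipeline(filtered_audio: bytes, memory: list[str]) -> None:
        text = transcribe(filtered_audio)   # Steps 1-2 (VAD happens upstream)
        task = run_agent(text, memory)      # Step 3
        feedback = execute_on_robot(task)   # Step 4
        speak(feedback)                     # Step 5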

Citation

@article{KADRI2025106660,
  title    = {LLM-driven agent for speech-enabled control of industrial robots: A case study in snow-crab quality inspection},
  author   = {Ibrahim Kadri and Sid Ahmed Selouani and Mohsen Ghribi and Rayen Ghali and Sabrina Mekhoukh},
  journal  = {Results in Engineering},
  volume   = {27},
  pages    = {106660},
  year     = {2025},
  issn     = {2590-1230},
  doi      = {10.1016/j.rineng.2025.106660},
  url      = {https://www.sciencedirect.com/science/article/pii/S2590123025027276},
  keywords = {Large language models (LLMs), Voice interface, KUKA industrial robot, Human-robot interaction, Autonomous robotic planning, Computer vision},
}

How to Use

Installation

  1. Clone the repository:
    git clone https://github.com/Rayen023/RobotVoiceControl.git
    cd RobotVoiceControl
  • Project Structure:
    RobotVoiceControl/
    ├── voice_agent.py                # Voice-controlled interface
    ├── text_agent.py                 # Text-based interface
    ├── src/
    │   ├── agent_common.py           # Shared agent configuration
    │   ├── agent_tools.py            # Robot control tools and functions
    │   ├── robot_control.py          # Core robot communication
    │   ├── tts.py                    # Text-to-speech implementation
    │   ├── simple_emotion_detector.py  # Emotion recognition
    │   └── tech_doc.md                 # RAG technical documentation
    └── pyproject.toml                # Project dependencies
    
  2. Install dependencies using uv:

    uv sync
  3. Configure environment variables: Create a .env file in the root directory with the following API keys:

    # Azure Speech Services (for Text-to-Speech)
    AZURE_SUBSCRIPTION_KEY=your_azure_subscription_key
    AZURE_REGION=your_azure_region
    
    # Google GenAI (for Speech-to-Text)
    GOOGLE_API_KEY=your_google_api_key
    
    # OpenRouter (for the main AI agent)
    OPENROUTER_API_KEY=your_openrouter_api_key
    OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
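
For reference, a minimal sketch of how these variables might be loaded at startup, assuming the python-dotenv package (the repository's actual loading code may differ):

    import os
    from dotenv import load_dotenv  # assumes python-dotenv is installed

    load_dotenv()  # reads .env from the current working directory

    AZURE_SUBSCRIPTION_KEY = os.environ["AZURE_SUBSCRIPTION_KEY"]
    AZURE_REGION = os.environ["AZURE_REGION"]
    GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]
    OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]
    OPENROUTER_BASE_URL = os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1")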

Running the Application

Voice Control Mode:

python voice_agent.py

Text Control Mode:

python text_agent.py

LLM-Powered Interaction

  • Natural Language Processing: Understands robot commands in conversational language.
  • Context Awareness: Maintains conversation history and robot state.
  • Multi-modal Input: Supports both voice and text commands.
  • Command Translation: Converts natural language to robot instructions.
  • Error Recovery: Robust error handling and automatic retry mechanisms.
  • Modular Architecture: Easily integrates new tools and capabilities (see the sketch after this list).
  • Real-time Feedback: Provides live position monitoring and status updates.
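
To illustrate the modular tool design, here is a minimal sketch of how a robot capability could be exposed to the agent, assuming a LangChain-style @tool decorator; move_to and its body are hypothetical, and the repository's agent_tools.py may register tools differently:

    from langchain_core.tools import tool

    @tool
    def move_to(x: float, y: float, z: float) -> str:
        """Move the robot's tool center point to a Cartesian position (mm)."""
        # A real tool would send a KRL motion command and wait for
        # position feedback; stubbed here for illustration.
        return f"moved to ({x}, {y}, {z})"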

KRL (KUKA Robot Language) Integration

The system establishes communication with the robot using:

  • Primary Communication: Telnet connection for sending KRL commands.
  • Monitoring: py-openshowvar library for reading robot variables and positions.
  • Vision Integration: HTTP/FTP communication with Cognex vision systems.
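
A minimal connection sketch under stated assumptions: the controller runs a KUKAVARPROXY-style server on port 7000 for py-openshowvar, while the telnet address, port, and command format shown are hypothetical placeholders, not the repository's actual protocol:

    import telnetlib  # deprecated in Python 3.11, removed in 3.13
    from py_openshowvar import openshowvar

    ROBOT_IP = "192.168.1.10"  # hypothetical controller address

    # Monitoring channel: read live robot variables via py-openshowvar.
    client = openshowvar(ROBOT_IP, 7000)  # KUKAVARPROXY default port
    if client.can_connect:
        pos = client.read("$POS_ACT", debug=False)  # current Cartesian position
        print(pos)

    # Command channel: send a KRL command over telnet (format is illustrative).
    tn = telnetlib.Telnet(ROBOT_IP, 23, timeout=5)
    tn.write(b"PTP HOME\r\n")
    tn.close()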

The system generates and executes KRL commands for:

  • Position Control: Cartesian and joint space movements (linear and point-to-point).
  • Joint Movements: Precise angular positioning with real-time feedback.
  • Pick and Place Operations: Automated object manipulation with gripper control.
  • Home Position Initialization: Safe startup and reference positioning.
  • Real-time Position Monitoring: Continuous status updates and position tracking.
  • Safety Monitoring: Real-time position feedback and collision avoidance.
  • Vision Integration: Object detection via Cognex cameras.
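
As a small illustration of the command-translation step, here is a sketch of how Cartesian and joint-space moves can be rendered as KRL strings; the coordinate values and templates are hypothetical, and the system's actual templates live in the src/ modules:

    def lin_move(x: float, y: float, z: float,
                 a: float = 0.0, b: float = 90.0, c: float = 0.0) -> str:
        # Linear (Cartesian) motion to a frame in mm / degrees.
        return f"LIN {{X {x}, Y {y}, Z {z}, A {a}, B {b}, C {c}}}"

    def ptp_joint(a1: float, a2: float, a3: float,
                  a4: float, a5: float, a6: float) -> str:
        # Point-to-point motion in joint space (axis angles in degrees).
        return f"PTP {{A1 {a1}, A2 {a2}, A3 {a3}, A4 {a4}, A5 {a5}, A6 {a6}}}"

    print(lin_move(500, 0, 600))
    # -> LIN {X 500, Y 0, Z 600, A 0.0, B 90.0, C 0.0}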
