An LLM-powered robot control system that enables natural language interaction with industrial robots through voice commands or text input.
The system follows a 5-step pipeline architecture that processes human voice commands into precise robot actions:
- Voice Activity Detection (VAD): Captures the audio stream and filters it down to segments containing human speech.
- Speech-to-Text Transcription: Uses Gemini 2.5 Flash for automatic speech recognition, converting the filtered audio to text.
- AI Agent Processing: The agent processes text input, maintains conversation memory, and generates appropriate robot tasks.
- Robot Control Interface: Python-based control tools execute tasks and provide real-time feedback to the agent.
- Text-to-Speech Response: Azure TTS converts the agent's responses back to speech for seamless human interaction.
This architecture enables bidirectional communication between human and robot through natural language, with persistent memory for context-aware conversations.
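
The pseudocode below sketches how these five steps could compose into a single control loop with persistent memory. All function bodies are placeholder stubs (the real implementations live in voice_agent.py and the src/ modules); it only illustrates the data flow, not the project's actual API.

```python
# Illustrative sketch of the 5-step pipeline loop; every function below is a
# placeholder stub, not the project's real implementation.

def capture_speech() -> bytes:
    """Step 1 (VAD): block until speech is detected, return the audio segment."""
    return b""  # stub

def transcribe(audio: bytes) -> str:
    """Step 2 (STT): convert the audio segment to text (e.g. via Gemini 2.5 Flash)."""
    return "move the robot to the home position"  # stub

def run_agent(text: str, memory: list) -> tuple[str, list[str]]:
    """Step 3 (Agent): plan robot tasks from the command and conversation memory."""
    return "Moving to the home position.", ["go_home"]  # stub

def execute_task(task: str) -> str:
    """Step 4 (Robot control): send the task to the robot and return feedback."""
    return f"{task}: done"  # stub

def speak(text: str) -> None:
    """Step 5 (TTS): speak the agent's reply back to the operator (e.g. Azure TTS)."""
    print(f"[TTS] {text}")

def control_loop(turns: int = 1) -> None:
    memory: list = []  # persistent context for multi-turn conversations
    for _ in range(turns):
        audio = capture_speech()
        command = transcribe(audio)
        reply, tasks = run_agent(command, memory)
        feedback = [execute_task(t) for t in tasks]
        memory.append({"command": command, "reply": reply, "feedback": feedback})
        speak(reply)

if __name__ == "__main__":
    control_loop()
```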
```bibtex
@article{KADRI2025106660,
  title = {LLM-driven agent for speech-enabled control of industrial robots: A case study in snow-crab quality inspection},
  journal = {Results in Engineering},
  volume = {27},
  pages = {106660},
  year = {2025},
  issn = {2590-1230},
  doi = {https://doi.org/10.1016/j.rineng.2025.106660},
  url = {https://www.sciencedirect.com/science/article/pii/S2590123025027276},
  author = {Ibrahim Kadri and Sid Ahmed Selouani and Mohsen Ghribi and Rayen Ghali and Sabrina Mekhoukh},
  keywords = {Large language models (LLMs), Voice interface, KUKA industrial robot, Human-robot interaction, Autonomous robotic planning, Computer vision},
}
```
- Clone the repository:
```bash
git clone https://github.com/Rayen023/RobotVoiceControl.git
cd RobotVoiceControl
```
- Project Structure:
```
RobotVoiceControl/
├── voice_agent.py                 # Voice-controlled interface
├── text_agent.py                  # Text-based interface
├── src/
│   ├── agent_common.py            # Shared agent configuration
│   ├── agent_tools.py             # Robot control tools and functions
│   ├── robot_control.py           # Core robot communication
│   ├── tts.py                     # Text-to-speech implementation
│   ├── simple_emotion_detector.py # Emotion recognition
│   └── tech_doc.md                # RAG technical documentation
└── pyproject.toml                 # Project dependencies
```
- Install dependencies using uv:
uv sync
- Configure environment variables: Create a .env file in the root directory with the following API keys:
```
# Azure Speech Services (for Text-to-Speech)
AZURE_SUBSCRIPTION_KEY=your_azure_subscription_key
AZURE_REGION=your_azure_region

# Google GenAI (for Speech-to-Text)
GOOGLE_API_KEY=your_google_api_key

# OpenRouter (for the main AI agent)
OPENROUTER_API_KEY=your_openrouter_api_key
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
```
Voice Control Mode:
python voice_agent.py
Text Control Mode:
python text_agent.py
- Natural Language Processing: Understands robot commands in conversational language.
- Context Awareness: Maintains conversation history and robot state.
- Multi-modal Input: Supports both voice and text commands.
- Command Translation: Converts natural language to robot instructions.
- Error Recovery: Robust error handling and automatic retry mechanisms.
- Modular Architecture: Easily integrates new tools and capabilities (see the sketch after this list).
- Real-time Feedback: Provides live position monitoring and status updates.
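
To illustrate the modular tool design, the sketch below shows the general shape a new robot-control tool might take. The function, its docstring-as-description convention, and the AGENT_TOOLS list are hypothetical placeholders; the actual registration mechanism is defined in src/agent_tools.py and may differ.

```python
# Hypothetical example of a new tool; names and registration style are
# illustrative only, not the repository's real API.

def rotate_joint(joint: int, degrees: float) -> str:
    """Rotate a single robot joint by the given angle in degrees.

    The docstring doubles as the description the LLM agent reads
    when deciding which tool to call.
    """
    # In the real project this would send a KRL command over the robot link.
    return f"Joint A{joint} rotated by {degrees:.1f} degrees."

# A tool typically becomes available to the agent by adding it to the
# collection of callables the agent is configured with.
AGENT_TOOLS = [rotate_joint]
```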
The system establishes communication with the robot using:
- Primary Communication: Telnet connection for sending KRL (KUKA Robot Language) commands.
- Monitoring: py-openshowvar library for reading robot variables and positions.
- Vision Integration: HTTP/FTP communication with Cognex vision systems.
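
As a minimal sketch of the monitoring path, the snippet below reads the robot's live Cartesian pose and axis angles with py-openshowvar, assuming a KUKAVARPROXY server is running on the controller. The IP address and port are placeholders, and the actual read logic in src/robot_control.py may differ.

```python
from py_openshowvar import openshowvar

# Placeholder address; KUKAVARPROXY conventionally listens on port 7000.
ROBOT_IP = "192.168.1.10"
ROBOT_PORT = 7000

client = openshowvar(ROBOT_IP, ROBOT_PORT)
if not client.can_connect:
    raise ConnectionError(f"Cannot reach robot controller at {ROBOT_IP}:{ROBOT_PORT}")

# Read the live Cartesian pose and axis angles from the KRC system variables.
cartesian_pose = client.read("$POS_ACT", debug=False)  # {X ..., Y ..., Z ..., A ..., B ..., C ...}
axis_angles = client.read("$AXIS_ACT", debug=False)    # {A1 ..., A2 ..., ..., A6 ...}

print("Cartesian pose:", cartesian_pose)
print("Axis angles:", axis_angles)
```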
The system generates and executes KRL commands for:
- Position Control: Cartesian and joint space movements (linear and point-to-point).
- Joint Movements: Precise angular positioning with real-time feedback.
- Pick and Place Operations: Automated object manipulation with gripper control.
- Home Position Initialization: Safe startup and reference positioning.
- Real-time Position Monitoring: Continuous status updates and position tracking.
- Safety Monitoring: Real-time position feedback and collision avoidance.
- Vision Integration: Object detection via Cognex cameras.
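
For illustration, the hypothetical helpers below show the shape of the KRL motion commands described above (PTP for joint-space moves, LIN for linear Cartesian moves). The command templates actually generated in src/robot_control.py may differ.

```python
# Hypothetical helpers showing the kind of KRL command strings the control
# tools might generate; not the project's actual formatting code.

def krl_ptp_joints(a1: float, a2: float, a3: float, a4: float, a5: float, a6: float) -> str:
    """Point-to-point move in joint space (axis angles in degrees)."""
    return f"PTP {{A1 {a1}, A2 {a2}, A3 {a3}, A4 {a4}, A5 {a5}, A6 {a6}}}"

def krl_lin_pose(x: float, y: float, z: float, a: float, b: float, c: float) -> str:
    """Linear move to a Cartesian pose (positions in mm, angles in degrees)."""
    return f"LIN {{X {x}, Y {y}, Z {z}, A {a}, B {b}, C {c}}}"

print(krl_ptp_joints(0, -90, 90, 0, 0, 0))  # PTP {A1 0, A2 -90, A3 90, A4 0, A5 0, A6 0}
print(krl_lin_pose(450, 0, 600, 0, 90, 0))  # LIN {X 450, Y 0, Z 600, A 0, B 90, C 0}
```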