ELLMA-T is an Embodied Conversational Agent (ECA) that leverages an LLM (GPT-4) and a situated learning framework to support English language learning in social VR (VRChat).
- Windows is recommended. Mac and Linux might work, but there are no guarantees.
- It is recommended to use conda/venv with Python 3.10.
- Install torch with CUDA support (a sanity-check sketch follows this list).
- Install Visual C++ Build Tools.
- Install Visual C++ Redistributable.
- Install ffmpeg. The `ffmpeg` package in conda (and pip?) ships only the binaries; also install `ffmpeg-python` (it is included in `requirements.txt`).
- Install the requirements with `pip install -r requirements.txt`.
- For the VRChat simulation module, you will need 2 computers (for 2 instances of VRChat).
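After installing, you can sanity-check the environment with a short script. This is a minimal sketch (not part of the repo); it only verifies that CUDA-enabled torch and the ffmpeg binary are visible:

```python
# Hypothetical helper: quick sanity check of the prerequisites above.
import shutil

import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# ffmpeg must be a real binary on PATH; the conda/pip `ffmpeg` package alone is not enough.
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
```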
There are two modules of ELLMA-T in this repo:
- Text-based simulation (legacy): a command-line interface that lets you chat with the agent in a text-based environment, with limited speech-to-text support. It is recommended to use the text mode of the VRChat simulation module instead.
- VRChat simulation: still a command-line interface, but in addition to text-based conversations it can run in sync with VRChat to do TTS and STT.
- Get `.env` from the TA and put it in `./textBasedSimulation`. This file includes the shared OpenAI key, MongoDB, and other API keys. Please keep this file private and DO NOT PUSH IT.
- To run:
  ```
  cd ./textBasedSimulation
  python ./textBasedSimulation.py
  ```
- Each time the program ends, a CSV file about time usage is automatically generated in the folder `textBasedSimulation/evaluations/TestScenarios_CSV`.
- Our agent uses MongoDB as its datastore for memory and observations. You can connect to the DB directly using the connection string in the `.env` file. Contact the Teaching Assistant if you'd like to be added as an admin of the DB.
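If you want to inspect the database yourself, here is a minimal sketch using `pymongo` and `python-dotenv`. The environment variable name `MONGODB_URI` is an assumption; use whatever key the shared `.env` actually defines:

```python
# Hypothetical snippet: connect to the shared MongoDB via the .env connection string.
import os

from dotenv import load_dotenv
from pymongo import MongoClient

load_dotenv()  # reads ./textBasedSimulation/.env when run from that folder
uri = os.environ["MONGODB_URI"]  # assumed variable name -- check your .env

client = MongoClient(uri)
print(client.list_database_names())  # verify the connection works
```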
- Put `.env` in `./SimulationSystem-VRCHAT`.
- To run:
  ```
  cd ./SimulationSystem-VRCHAT
  python ./VRfinalVersion.py
  ```
- Then select `1. Text Mode`.
- Follow the instructions in the terminal to set up the character and conversation.
- Enable Stereo Mix in your system (to verify it shows up as a recording device, see the device-listing sketch after this list).
  - For Windows 10: follow this tutorial.
  - For Windows 11: view this YouTube guide.
  - Check whether the recording device is muted by default (Windows sometimes does this).

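To confirm that Stereo Mix is actually exposed as a recording device, you can list devices from Python. This sketch uses the `sounddevice` package, which may or may not be what the module itself uses:

```python
# List input-capable audio devices; "Stereo Mix" (or CABLE-D later) should appear here.
import sounddevice as sd

for i, dev in enumerate(sd.query_devices()):
    if dev["max_input_channels"] > 0:
        print(i, dev["name"])
```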
- Install VBCable. If you are from Northeastern, contact the TA to get access to it.
- Download VRChat from Steam and log in with your VRChat account (not Steam account).
- In VRChat, press `R` to open the menu, then `Options` -> `OSC` -> `Enable`.
- In VRChat, press `Esc` to open the menu, then `Settings` -> `Microphone` -> `CABLE-C Output`.
- In VRChat, turn off the background music (or maybe even all other sounds except player voices).
- Log in to VRChat
- Run:
  ```
  cd ./SimulationSystem-VRCHAT
  python ./VRfinalVersion.py
  ```
- Then select `2. Audio Mode`.
- Follow the instructions in the terminal to set up the character and conversation.
- ELLMA should say a greeting message in VRChat (you might want to use a mirror inside VRChat to check that the message is also displayed; a sketch of sending chatbox text over OSC follows this list).
- On another PC, log in to VRChat and navigate to the same world as ELLMA-T (headphones are recommended).
- ELLMA-T will pick up sound from VRChat (just like a normal player) and respond with audio.
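For reference, text reaches the VRChat chatbox over OSC. Here is a minimal sketch with the `python-osc` package (port 9000 is VRChat's default OSC input); this is illustrative, not necessarily the exact code in `VRfinalVersion.py`:

```python
# Send a message to the VRChat chatbox via OSC (VRChat listens on localhost:9000).
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)
# "/chatbox/input" takes [text, send_immediately]; True skips the in-game keyboard.
client.send_message("/chatbox/input", ["Hello! I am ELLMA-T.", True])
```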
You can use either Stereo Mix or CABLE-D for capturing VRChat audio. When capturing VRChat audio, the volume must not be muted. In some setups (e.g., when you only have speakers) this can introduce echo and feedback loops. This is where CABLE-D comes in handy: it lets you point your audio output to a null sink (a virtual speaker that does not make any sound).
To use CABLE-D, pass the flag: `python ./VRfinalVersion.py --use_cable_d`
Check the Windows audio device settings; sometimes the device can get muted. A sketch of how device selection and capture could work follows below.
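The sketch below shows how the capture-device choice could work end to end: pick Stereo Mix or CABLE-D via the flag, record a short clip, and transcribe it. The device-name matching and the use of `sounddevice`/Whisper are assumptions for illustration, not the repo's actual implementation:

```python
# Hypothetical sketch: choose a capture device, record a clip, transcribe with Whisper.
import argparse

import sounddevice as sd
import soundfile as sf
from openai import OpenAI

parser = argparse.ArgumentParser()
parser.add_argument("--use_cable_d", action="store_true",
                    help="capture VRChat audio from CABLE-D instead of Stereo Mix")
args = parser.parse_args()

# Assumed name matching; raises StopIteration if no such device is found.
target = "cable-d" if args.use_cable_d else "stereo mix"
device = next(i for i, d in enumerate(sd.query_devices())
              if target in d["name"].lower() and d["max_input_channels"] > 0)

fs = 16000
audio = sd.rec(int(5 * fs), samplerate=fs, channels=1, device=device)  # 5-second clip
sd.wait()
sf.write("clip.wav", audio, fs)

# Requires OPENAI_API_KEY in the environment (e.g., loaded from the shared .env).
with open("clip.wav", "rb") as f:
    text = OpenAI().audio.transcriptions.create(model="whisper-1", file=f).text
print("Heard:", text)
```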

WIP
- In VRChat Sim Audio mode, multiple audio-stream errors will appear in the console, but they do not affect audio playback.
- Fix audio channel errors from OpenAI TTS response
- Audio mode with 1 PC?
- Clean up old code
- v0.2.0: 27/09/2024: Forked
- v0.2.1: 24/10/2024: Automated audio device selection for VRChat simulation module. Removed AWS Polly reference. Readme updated with instructions.
- v0.2.2: 24/10/2024: Fixed OpenAI TTS output stream error. Cable-D/Stereo Mix selection via command line argument.
@misc{https://doi.org/10.48550/arxiv.2410.02406,
doi = {10.48550/ARXIV.2410.02406},
url = {https://arxiv.org/abs/2410.02406},
author = {Pan, Mengxu and Kitson, Alexandra and Wan, Hongyu and Prpa, Mirjana},
keywords = {Human-Computer Interaction (cs.HC), FOS: Computer and information sciences},
title = {ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR},
publisher = {arXiv},
year = {2024},
copyright = {Creative Commons Attribution 4.0 International}
}