Setup environment:

```bash
conda create -n osgym python=3.10
```

Install libGL:

```bash
sudo apt-get update
sudo apt-get install libgl1 libglx-mesa0
```

Install required Linux headers:

```bash
sudo apt-get install linux-headers-$(uname -r)
```

Install essential build tools:

```bash
sudo apt-get install python3-dev build-essential
```

Then install the Python dependencies:

```bash
pip install -r requirements.txt
```

Install Docker
Setup Docker apt repository:
```bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
```

Install Docker:

```bash
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

Verify the installation:
```bash
sudo docker run hello-world
```

Launch server:

```bash
./start_workers.sh
```

Clean up server:

```bash
./clean.sh
```

Launch server locally:

```bash
./start_workers.sh --local
```

Benchmark speed:

```bash
cd examples
python test_osgym.py
```

API Reference

POST `/reset`

- Description: Initializes a new environment with a given task configuration.
- Request Body:

```json
{
  "task_config": { ... },  // Task configuration JSON
  "timeout": 1000          // Timeout in seconds
}
```

- Response:

```json
{
  "screenshot": "<base64-encoded image>",
  "problem": "<task instruction>",
  "vm_id": <int>
}
```
POST `/step`

- Description: Executes an action in the environment.
- Request Body:

```json
{
  "action": "<action string>",
  "vm_id": <int>
}
```

- Response:

```json
{
  "screenshot": "<base64-encoded image>",
  "is_finish": <bool>,
  "reward": <float>
}
```
POST `/shutdown`

- Description: Shuts down and releases a VM.
- Request Body:

```json
{ "vm_id": <int or 'all'> }
```

- Response:

```json
{ "vm_id": <int or 'all'> }
```
GET `/screenshot`

- Description: Gets a screenshot of the current VM state.
- Query Parameters:
  - `vmId`: VM ID (integer)
- Response:

```json
{
  "screenshot": "<base64-encoded image>",
  "vm_id": <int>
}
```
Reset an environment:
```bash
curl -X POST http://localhost:20000/reset \
  -H "Content-Type: application/json" \
  -d '{"task_config": {...}, "timeout": 1000}'
```

Step in the environment:

```bash
curl -X POST http://localhost:20000/step \
  -H "Content-Type: application/json" \
  -d '{"action": "<|think_start|><|think_end|><|action_start|>click(100,100)<|action_end|>", "vm_id": 0}'
```

Shutdown a VM:

```bash
curl -X POST http://localhost:20000/shutdown \
  -H "Content-Type: application/json" \
  -d '{"vm_id": 0}'
```

- Authentication: The API restricts access by IP. Set the `OSGYM_ALLOWED_IPS` environment variable to allow your client IPs.
- VM IDs: VM IDs are managed by the server. Use the `vm_id` returned by `/reset` for subsequent calls.
- Screenshots: All screenshots are returned as base64-encoded PNG images.
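Putting the endpoints together, a minimal rollout loop might look like the sketch below; the placeholder policy and empty task configuration are illustrative only:

```python
import requests

BASE = "http://localhost:20000"  # one worker port; see the distributed setup below

def my_policy(obs):
    # Placeholder policy: always click a fixed point; replace with a real agent.
    return "<|think_start|><|think_end|><|action_start|>click(100,100)<|action_end|>"

# Sketch only: task_config is task-specific and left empty here.
obs = requests.post(f"{BASE}/reset", json={"task_config": {}, "timeout": 1000}).json()
vm_id = obs["vm_id"]

done = False
while not done:
    action = my_policy(obs)  # produce an action string from the screenshot/problem
    obs = requests.post(f"{BASE}/step", json={"action": action, "vm_id": vm_id}).json()
    done = obs["is_finish"]

requests.post(f"{BASE}/shutdown", json={"vm_id": vm_id})
```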
OSGym is designed for distributed, scalable RL/data collection. Each worker runs an environment server (FastAPI) on a different port, as specified in config.yaml.
To start all workers (one per port in config.yaml):
```bash
./start_workers.sh
```

- By default, this launches on all interfaces (`0.0.0.0`).
- For local-only testing, use:

```bash
./start_workers.sh --local
```
Each worker runs a FastAPI server (see API Reference above) and manages its own set of VMs.
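Because each worker is an independent server, a client can spread episodes across them. The sketch below is illustrative only: the port list is an assumption standing in for the ports listed in your config.yaml, and the helper simply round-robins `/reset` calls across workers:

```python
import itertools
import requests

# Assumption: these ports mirror the per-worker ports in your config.yaml.
WORKER_PORTS = [20000, 20001, 20002, 20003]
_port_cycle = itertools.cycle(WORKER_PORTS)

def reset_on_next_worker(task_config, timeout=1000):
    """Send /reset to the next worker in round-robin order (illustrative only)."""
    port = next(_port_cycle)
    obs = requests.post(
        f"http://localhost:{port}/reset",
        json={"task_config": task_config, "timeout": timeout},
    ).json()
    return port, obs
```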
To stop all workers and clean up Docker containers and processes:
```bash
./clean.sh
```

OSGym provides a data server and dataloader for efficient RL training and experience replay.
The data server (see examples/data_server.py) manages:
- ReplayBuffer: Stores and samples experience tuples for RL.
- MultiTurnDataloader: Handles parallel environment rollout, batching, and preprocessing for RL agents.
- `ReplayBuffer`
  - Stores (state, action, reward, next_state, done) tuples.
  - Supports discounted reward calculation and outcome-only storage.
  - Sampling: `ReplayBuffer.sample(batch_size)`
- `MultiTurnDataloader`
  - Launches multiple environment workers in parallel (using multiprocessing).
  - Collects rollouts, preprocesses data (tokenization, multimodal support), and batches for training.
  - Example usage:

```python
from examples.data_server import MultiTurnDataloader

dataloader = MultiTurnDataloader(
    env_class=YourEnvClass,
    env_configs=[...],        # List of env configs (one per worker)
    tokenizer=your_tokenizer,
    processor=your_processor, # Optional, for multimodal
    batch_size=8,
    ...
)

for batch in dataloader:
    # batch is a dict of tensors ready for model input
    ...
```
To sample training batches from the replay buffer and inspect its contents:

```python
batch = dataloader.sample_from_buffer(batch_size=32)
# batch is a dict with keys: input_ids, attention_mask, position_ids, responses, reward, etc.

dataloader.print_stats_in_replay_buffer()
```

Each worker uses a desktop environment server (Flask, see desktop_env/server/main.py) to interact with the OS.
Some useful endpoints (for advanced users):
| Endpoint | Method | Description |
|---|---|---|
| `/screenshot` | GET | Capture current screen (with cursor) |
| `/terminal` | GET | Get terminal output (Linux only) |
| `/accessibility` | GET | Get accessibility tree (A11y) |
| `/execute` | POST | Execute a shell command |
| `/list_directory` | POST | List directory contents |
| `/file` | POST | Download a file |
These are used internally by OSGym, but can be accessed directly for debugging or custom automation.
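For example, a debugging sketch that pulls a raw screen capture straight from a worker's desktop-environment server; the host, port, and the assumption that `/screenshot` returns raw image bytes are illustrative only, so check desktop_env/server/main.py for the actual values:

```python
import requests

# Assumptions for illustration: the Flask server is reachable at this address,
# and GET /screenshot responds with raw image bytes.
DESKTOP_SERVER = "http://localhost:5000"

resp = requests.get(f"{DESKTOP_SERVER}/screenshot")
with open("raw_screen.png", "wb") as f:
    f.write(resp.content)
```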
- Start workers: `./start_workers.sh`
- Interact via API: Use `/reset`, `/step`, `/shutdown` as described above.
- Collect data: Use the `MultiTurnDataloader` and `ReplayBuffer` for RL training.
- Benchmark: Run `python examples/test_osgym.py` to test and benchmark the system.
MIT License. Free for research and commercial use.
Some parts of the code were borrowed from OSWorld. Thanks for their open-source contribution!