RL Meta-Learning with GNN

Introduction

This repository presents a meta-learning approach for reinforcement learning (RL) environments, leveraging Graph Neural Networks (GNNs) to enable dynamic adaptability. The project emphasizes multi-agent setups where agents collaboratively learn optimal policies, focusing on flexibility, shared information, and environment-aware strategies.

Overview

The project aims to equip RL agents with the ability to adapt to varying task difficulties and dynamic interactions using meta-learning techniques and GNNs. Agents interact in a simulated city-like grid, taking on distinct objectives and utilizing shared information for optimized decision-making.

Key Features

  • Meta-Learning: Dynamically adjusts task difficulty so that success and failure rates converge toward a 50/50 outcome.
  • Graph Neural Networks: Models agent relationships, enabling enhanced real-time adaptability.
  • Multi-Agent Policies: Develops specialized strategies for distinct roles.
  • Dynamic Environment: Adjusts parameters like agent count and resources to ensure evolving difficulty.
  • Shared Policemen Policy: Unifies strategies across agents for improved coordination.

Architecture

Environment

The simulation involves a grid-based city environment where:

  • MrX: Operates as the target agent, focusing on evasion.
  • Policemen: Cooperatively work to track and capture MrX.
  • Difficulty Parameter: Modifies agent capabilities and resources to fine-tune task complexity.
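
To make the difficulty parameter concrete, here is a purely illustrative mapping from a scalar difficulty to per-episode resources; the names and formula are invented for this sketch, and the repository's actual mapping may differ.

# Illustrative only: one possible way a scalar difficulty could translate
# into per-episode resources for the pursuit task.
def episode_resources(difficulty: float, base_police: int = 3, base_money: int = 10):
    """Higher difficulty for MrX means more policemen with more resources."""
    n_police = base_police + round(2 * difficulty)
    police_money = round(base_money * (1 + difficulty))
    mrx_money = round(base_money * (1.5 - difficulty))
    return {"policemen": n_police, "police_money": police_money, "mrx_money": mrx_money}

print(episode_resources(0.5))  # {'policemen': 4, 'police_money': 15, 'mrx_money': 10}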

Meta-Learning Framework

The outer loop adjusts task difficulty through:

  1. Collecting and analyzing performance data from multiple episodes.
  2. Balancing success and failure rates to maintain a stable learning environment.
  3. Embedding difficulty adjustments as a learnable parameter directly into the environment.
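
A minimal sketch of this outer loop, assuming a small network (a stand-in for the repository's RewardWeightNet) predicts a difficulty value from some episode context and is nudged until the measured success rate sits near 50%; the real inputs, outputs, and loss in main.py differ.

import torch
import torch.nn as nn

# Stand-in for RewardWeightNet: maps a context vector to a difficulty in (0, 1).
weight_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(weight_net.parameters(), lr=1e-3)

def outer_step(context, success_rate, target=0.5):
    """One meta-update: move the predicted difficulty so wins and losses balance."""
    predicted = weight_net(context)
    # If MrX escapes too often (success_rate > target), push the difficulty up;
    # if he is caught too often, push it down. The squared gap is the meta-loss.
    desired = torch.clamp(predicted.detach() + (success_rate - target), 0.0, 1.0)
    loss = (predicted - desired).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return predicted.item(), loss.item()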

GNN Integration

GNNs enhance the system by:

  • Spatial and Temporal Encoding: Capturing dynamic relationships among agents.
  • State Sharing: Facilitating coordinated strategies across multiple agents.
  • Policy Adaptability: Supporting flexible decision-making through graph-based message passing.
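
As a concrete (but simplified) picture of graph-based message passing, the sketch below uses two PyTorch Geometric GCNConv layers to produce one Q-value per graph node; the repository's GNNModel may use different layer types, feature sizes, and readout.

import torch
from torch_geometric.nn import GCNConv

class TinyGNNQNet(torch.nn.Module):
    """Two rounds of message passing, then a per-node Q-value head."""
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.q_head = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index):
        # x: [num_nodes, in_dim] node features (e.g. agent positions, resources)
        # edge_index: [2, num_edges] connectivity of the city graph
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.q_head(h).squeeze(-1)  # one Q-value per node

Action selection can then mask these Q-values to the nodes reachable from an agent's current position and take an argmax (or an epsilon-greedy choice during training).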

Policies

  1. MrX Policy: Optimized to maximize evasion success.
  2. Policemen Policy: Shared across agents to promote efficient collaboration and coordination.

Code Structure Overview

  • main.py

    • Contains the main entry point (train and evaluate functions) and the central training loop.
    • Sets up the command-line arguments, loads configurations, initializes the logger, environment, and agents.
    • Implements the logic for either training or evaluating the RL agents based on arguments.
  • logger.py

    • Defines the Logger class for handling logging to console, file, TensorBoard, and Weights & Biases.
    • Manages logging metrics, weights, and model artifacts.
  • Enviroment/base_env.py

    • Declares an abstract base class (BaseEnvironment) for custom environments using PettingZoo’s ParallelEnv.
  • Enviroment/graph_layout.py

    • Contains a custom ConnectedGraph class for creating random connected graphs with optional extra edges and weights.
    • Provides graph sampling logic (e.g., Prim’s algorithm to ensure connectivity).
  • Enviroment/yard.py

    • Implements CustomEnvironment, which inherits from BaseEnvironment.
    • Manages environment reset, step logic, agent positions, reward calculations, rendering, and graph observations.
  • RLAgent/base_agent.py

    • Declares an abstract BaseAgent class defining the interface (select_action, update, etc.) for all RL agents.
  • RLAgent/gnn_agent.py

    • Defines GNNAgent, a DQN-like agent using a GNN (GNNModel) to compute Q-values for graph nodes.
    • Handles experience replay, epsilon-greedy action selection, and network updates.
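
For orientation, the agent interface has roughly the shape below; this is a paraphrase of the abstract class described above, not the exact code in RLAgent/base_agent.py.

from abc import ABC, abstractmethod

class BaseAgent(ABC):
    """Minimal interface every RL agent in the project is expected to expose."""

    @abstractmethod
    def select_action(self, observation):
        """Return an action for the current (graph) observation."""

    @abstractmethod
    def update(self, *transition):
        """Consume a transition (or a sampled batch) and improve the policy."""

GNNAgent fills in this interface with a GNN Q-network, epsilon-greedy action selection over node Q-values, and an experience replay buffer.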

Main Training Loop (in main.py, train function)

  1. Initialize logger, network(s), optimizers, and hyperparameters.
  2. For each epoch:
    • Randomly choose environment config (number of agents, money, etc.).
    • Forward pass through the RewardWeightNet to compute reward weights for the environment.
    • Inside loop: for each episode:
      • Reset environment, get initial state.
      • While not done:
        • Build GNN input (create_graph_data), pick actions (MrX and Police) using the GNN agents.
        • env.step(actions), compute rewards/terminations, update agents.
    • Evaluate performance over num_eval_episodes, compute the target difficulty, and backpropagate the loss through RewardWeightNet.
    • Log metrics and proceed to the next epoch.

Installation

Clone the repository and build the Apptainer image, which provides the required dependencies:

git clone https://github.com/elte-collective-intelligence/Mechanism-Design.git
cd Mechanism-Design
./build.sh

This builds the Apptainer image.

Usage

Wandb config

If you want to use wandb (Weights & Biases) to log your experiments, don't forget to set the following environment variables:

  1. WANDB_PROJECT
  2. WANDB_ENTITY
  3. WANDB_API_KEY
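
These can be exported in your shell before launching run.sh; equivalently, they can be set in Python before the logger initializes wandb. The snippet below is a sketch (the project name and entity are example values, not taken from the repository):

import os

# wandb reads these from the environment when the run is initialized.
os.environ["WANDB_PROJECT"] = "mechanism-design"       # example project name
os.environ["WANDB_ENTITY"] = "your-team-or-username"
os.environ["WANDB_API_KEY"] = "paste-your-api-key-here"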

Experiment config

  1. In the experiment folder, create a folder named after your experiment.
  2. Add a config.yml file to it with the required configuration (the existing experiment folders contain examples).

Run one experiment

Start training for a single experiment:

./run.sh name-of-experiment

Run all experiments

Start training for every experiment defined in the experiment folder:

./run_all_experiments.sh

Visualization

If you want to evaluate the policies with visualized graphs, add the following to your experiment configuration:

evaluate=True

Contributing

We welcome contributions! To contribute:

  1. Fork the repository.
  2. Create a feature branch.
  3. Submit a pull request with detailed descriptions of changes.

License

This project is licensed under the MIT License. See the LICENSE file for details.