Multi-Agent Reinforcement Learning (MARL) at a Glance
Over the years, a variety of multi-agent reinforcement learning approaches have emerged to tackle the complexities of agents learning together, often under partial observability, shared rewards, or adversarial dynamics. Below is a condensed summary of MARL methods, each contributing novel ways for agents to communicate, coordinate, and learn effectively.
Algorithm | Communication | Architecture | Differentiability | Setting | Key Environment(s) | Key Novelty / Notes |
---|---|---|---|---|---|---|
RIAL (Reinforced Inter-Agent Learning) | Discrete messages treated as part of the action space | Deep Q-Network (per agent) | Not fully differentiable across agents | Cooperative tasks (partial observability) | Simple grid-based tasks, mixed matrix tasks | Early approach. Each agent sends/receives discrete actions as “messages,” but there is no direct gradient flow between agents. |
DIAL (Differentiable Inter-Agent Learning) | Continuous (during training) but discretized at execution | Deep Q-Network variant with shared parameters | Partially differentiable (DRU mechanism) | Cooperative tasks (partial observability) | Grid-world style tasks (switch riddle, etc.) | Uses a discretise/regularise unit (DRU) so communication is differentiable in training yet discrete at test time. |
CommNet | Continuous broadcast (averaged hidden states) | Single feed-forward or recurrent net (modular) | Fully differentiable end to end | Primarily cooperative, fully or partially observable | Predator-prey, multi-robot simulation, etc. | Symmetrical broadcast channel; each agent’s hidden state is summed/averaged to form the communication vector. Scales well. |
BiCNet (Bidirectionally-Coordinated Net) | Continuous bidirectional RNN communication | Multi-agent actor-critic with parameter sharing | Fully differentiable (actor-critic) | Cooperative or competitive (StarCraft setting) | StarCraft micromanagement, complex environments | Uses a bidirectional RNN for richer communication; can handle heterogeneous units. Emergent “human-like” tactics in StarCraft. |
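To make the RIAL/DIAL row concrete, here is a minimal PyTorch sketch of a parameter-shared, per-agent Q-network whose outputs cover both environment actions and discrete message “actions.” All names and sizes (`SharedQNet`, `obs_dim`, etc.) are illustrative assumptions, not the authors’ code.

```python
import torch
import torch.nn as nn

class SharedQNet(nn.Module):
    def __init__(self, obs_dim, n_agents, n_env_actions, n_msg_actions, hidden=64):
        super().__init__()
        # One network shared by all agents; an agent embedding breaks symmetry.
        self.agent_embed = nn.Embedding(n_agents, hidden)
        self.obs_encode = nn.Linear(obs_dim, hidden)
        self.trunk = nn.Sequential(nn.ReLU(), nn.Linear(hidden, hidden), nn.ReLU())
        self.q_env = nn.Linear(hidden, n_env_actions)  # Q-values over environment actions
        self.q_msg = nn.Linear(hidden, n_msg_actions)  # Q-values over discrete messages

    def forward(self, obs, agent_id):
        x = self.obs_encode(obs) + self.agent_embed(agent_id)
        x = self.trunk(x)
        return self.q_env(x), self.q_msg(x)

# Hypothetical usage: 2 agents, 8-dim observations, 4 env actions, 2 message choices.
net = SharedQNet(obs_dim=8, n_agents=2, n_env_actions=4, n_msg_actions=2)
obs = torch.randn(5, 8)                   # batch of 5 observations
agent_id = torch.tensor([0, 1, 0, 1, 0])  # which agent each row belongs to
q_env, q_msg = net(obs, agent_id)         # shapes: (5, 4) and (5, 2)
```

Because every agent updates the same set of weights (with an agent index to keep behaviors distinguishable), experience from all agents trains a single network, which is the parameter-sharing trick noted in the table.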
Quick Observations
Communication Mechanisms:
- RIAL used discrete messages as if they were extra actions.
- DIAL allowed continuous feedback loops in training but discretized signals at test time.
- CommNet went fully continuous, summing or averaging hidden states for a symmetrical broadcast (see the sketch after this list).
- BiCNet used bidirectional continuous channels via an RNN, allowing more nuanced message flow.
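As a rough illustration of the CommNet-style broadcast, the sketch below averages the other agents' hidden states into a communication vector and combines it with each agent's own state through a module shared by all agents. This is a simplified sketch, not the exact CommNet update; names like `commnet_step` are hypothetical.

```python
import torch
import torch.nn as nn

def commnet_step(h: torch.Tensor, module: nn.Module) -> torch.Tensor:
    """One communication round. h: (batch, n_agents, hidden_dim)."""
    n_agents = h.size(1)
    total = h.sum(dim=1, keepdim=True)          # sum of all agents' hidden states
    c = (total - h) / max(n_agents - 1, 1)      # mean of the *other* agents' states
    # A module shared by every agent combines its own state with the broadcast.
    return torch.tanh(module(torch.cat([h, c], dim=-1)))

hidden_dim = 16
shared = nn.Linear(2 * hidden_dim, hidden_dim)  # shared across agents
h = torch.randn(4, 3, hidden_dim)               # batch=4, agents=3
for _ in range(2):                               # two communication rounds
    h = commnet_step(h, shared)
```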
Architectural Differences:
- RIAL and DIAL built upon Q-learning for each agent, albeit with parameter sharing.
- CommNet used a single large network with repeated modules.
- BiCNet combined actor-critic training with bidirectional RNN communication (see the sketch after this list).
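For the BiCNet-style architecture, a bidirectional RNN run along the agent dimension can serve as the communication channel feeding a per-agent policy head. The sketch below (PyTorch, with assumed names and sizes) only illustrates that idea; BiCNet itself also trains a critic, which is omitted here.

```python
import torch
import torch.nn as nn

class BiCommPolicy(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        # Bidirectional GRU over the agent dimension acts as the communication channel.
        self.comm = nn.GRU(hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.policy_head = nn.Linear(2 * hidden_dim, n_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim)
        h = torch.relu(self.encoder(obs))   # per-agent encoding
        h, _ = self.comm(h)                 # information flows across agents both ways
        return self.policy_head(h)          # per-agent action logits

# Hypothetical usage: 3 agents, 10-dim observations, 5 actions.
policy = BiCommPolicy(obs_dim=10, hidden_dim=32, n_actions=5)
logits = policy(torch.randn(4, 3, 10))      # shape: (batch=4, agents=3, actions=5)
```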
Differentiability:
- RIAL had no direct gradient flow across agents (since messages were discrete).
- DIAL introduced partial differentiability (via the DRU) but discretized signals at inference (see the DRU sketch after this list).
- CommNet and BiCNet offered fully differentiable communication channels, which speeds up coordination learning.
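A minimal sketch of the DRU idea, assuming an illustrative noise scale of `sigma = 2.0`: during training the message stays continuous (and differentiable) through a noisy sigmoid, while at execution it is hard-thresholded into a discrete bit.

```python
import torch

def dru(message: torch.Tensor, training: bool, sigma: float = 2.0) -> torch.Tensor:
    if training:
        # Noisy sigmoid: keeps the channel differentiable while pushing outputs
        # toward 0/1, so agents learn to send robust, near-discrete signals.
        return torch.sigmoid(message + sigma * torch.randn_like(message))
    # At execution time the message is hard-thresholded into a discrete bit.
    return (message > 0).float()

m = torch.randn(4, 1, requires_grad=True)
soft = dru(m, training=True)    # differentiable, used during centralized training
hard = dru(m, training=False)   # discrete 0/1 messages at execution
```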
Emergent Coordination:
- From simple tasks (such as switch-based puzzles) to StarCraft micromanagement, these algorithms show how communication can lead to advanced team strategies like focus fire, hit-and-run, and coordinated multi-robot navigation.
More methods will be added to this page over time.