Multi-Agent Reinforcement Learning (MARL) at a Glance

Over the years, a variety of multi-agent reinforcement learning approaches have emerged to tackle the complexities of agents learning together, often with partial observability, shared rewards, or adversarial dynamics. Below is a condensed summary of communication-centric MARL methods, each contributing novel ways for agents to communicate, coordinate, and learn effectively.

| Algorithm | Communication | Architecture | Differentiability | Setting | Key Environment(s) | Key Novelty / Notes |
| --- | --- | --- | --- | --- | --- | --- |
| RIAL | Discrete messages treated as part of the action space | Deep Q-Network (per agent) | Not fully differentiable across agents | Cooperative tasks (partial observability) | Simple grid-based tasks, mixed matrix tasks | Early approach: each agent sends/receives discrete actions as "messages," but there is no direct gradient flow between agents. |
| DIAL | Continuous during training, discretized at execution | Deep Q-Network variant with shared parameters | Partially differentiable (DRU mechanism) | Cooperative tasks (partial observability) | Grid-world-style tasks (Switch riddle, etc.) | Introduced Differentiable Inter-Agent Learning with a Discretize/Regularize Unit (DRU) that yields discrete signals at test time. |
| CommNet | Continuous broadcast (averaged hidden states) | Single feed-forward or recurrent net with repeated modules | Fully differentiable end to end | Primarily cooperative, fully or partially observable | Predator-prey, multi-robot simulation, etc. | Symmetrical broadcast channel: each agent's hidden state is summed/averaged to form the communication vector. Scales well. |
| BiCNet | Continuous bidirectional RNN communication | Multi-agent actor-critic with parameter sharing | Fully differentiable (actor-critic) | Cooperative or competitive (StarCraft setting) | StarCraft micromanagement, complex environments | Uses a bidirectional RNN for richer communication; can handle heterogeneous units. Emergent "human-like" tactics in StarCraft. |
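To make the DIAL row concrete, here is a minimal sketch of the Discretize/Regularize Unit idea, assuming a PyTorch setup; the `dru` helper and the noise scale `sigma` are illustrative choices, not the authors' exact implementation. During centralized training the message is noised and squashed so gradients can flow from the receiver back into the sender; at execution it collapses to a hard bit.

```python
import torch

def dru(message_logit, sigma=2.0, training=True):
    """Sketch of DIAL's Discretize/Regularize Unit (DRU).

    Training: add Gaussian noise and squash with a sigmoid, keeping the
    channel continuous and differentiable so gradients can flow from the
    receiving agent back into the sender's network.
    Execution: threshold to a hard {0, 1} message.
    (sigma is a hyperparameter chosen here only for illustration.)
    """
    if training:
        return torch.sigmoid(message_logit + sigma * torch.randn_like(message_logit))
    return (message_logit > 0).float()

# Toy usage: a 3-bit message logit produced by some sender network.
logit = torch.tensor([0.7, -1.2, 0.1], requires_grad=True)
soft_msg = dru(logit, training=True)    # continuous, gradient-friendly
hard_msg = dru(logit, training=False)   # discrete, used at execution
soft_msg.sum().backward()               # gradients reach the sender
```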

Quick Observations

  1. Communication Mechanisms (see the code sketches after this list):

    • RIAL used discrete messages as if they were extra actions.
    • DIAL allowed continuous feedback loops in training but discretized signals at test time.
    • CommNet went fully continuous, summing or averaging hidden states for symmetrical broadcast.
    • BiCNet used bidirectional continuous channels via an RNN, allowing more nuanced message flow.
  2. Architectural Differences:

    • RIAL and DIAL built upon Q-learning for each agent, albeit with parameter sharing.
    • CommNet used a single large network with repeated modules.
    • BiCNet combined actor-critic training with bidirectional RNN communication.
  3. Differentiability:

    • RIAL had no direct gradient flow across agents (since messages were discrete).
    • DIAL introduced partial differentiability (via DRU) but discretized signals at inference.
    • CommNet and BiCNet offered fully differentiable communication channels, which can speed up coordination learning.
  4. Emergent Coordination:

    • From simpler tasks (like switch-based puzzles) to StarCraft micromanagement, these algorithms show how learned communication can lead to advanced team behaviors such as focus fire, hit-and-run tactics, and coordinated multi-robot navigation.
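The RIAL setup from point 1 can be sketched as a per-agent Q-network whose output head covers environment actions and message symbols jointly, so the message is chosen by argmax like any other action and no gradient crosses the agent boundary. The class name `RIALAgent`, the layer sizes, and the toy dimensions below are illustrative, not taken from the original paper.

```python
import torch
import torch.nn as nn

class RIALAgent(nn.Module):
    """Illustrative RIAL-style agent: one Q-head over environment
    actions *and* discrete message symbols, so the message is just
    another action (no gradient flows to the other agent)."""

    def __init__(self, input_dim, n_env_actions, n_message_symbols, hidden=64):
        super().__init__()
        self.n_env_actions = n_env_actions
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_env_actions + n_message_symbols),
        )

    def act(self, obs, received_message):
        # The discrete (one-hot) message received last step is part of the input.
        q = self.net(torch.cat([obs, received_message], dim=-1))
        env_q, msg_q = q[: self.n_env_actions], q[self.n_env_actions:]
        return env_q.argmax().item(), msg_q.argmax().item()

agent = RIALAgent(input_dim=8, n_env_actions=4, n_message_symbols=2)
obs = torch.zeros(6)                    # 6 observation features
msg_in = torch.tensor([1.0, 0.0])       # one-hot message from the other agent
action, message_symbol = agent.act(obs, msg_in)
```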
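CommNet's broadcast (also point 1) boils down to one differentiable communication step: each agent's next hidden state mixes its own hidden state with the mean of the other agents' hidden states. The `CommStep` module below is a single-layer sketch of that step, not the full architecture.

```python
import torch
import torch.nn as nn

class CommStep(nn.Module):
    """One CommNet-style communication step (sketch): the 'message'
    each agent receives is the mean of the other agents' hidden states,
    so the whole step stays differentiable end to end."""

    def __init__(self, hidden=32):
        super().__init__()
        self.self_transform = nn.Linear(hidden, hidden, bias=False)
        self.comm_transform = nn.Linear(hidden, hidden, bias=False)

    def forward(self, h):                                        # h: (n_agents, hidden)
        n_agents = h.shape[0]
        comm = (h.sum(dim=0, keepdim=True) - h) / (n_agents - 1)  # mean of the *other* agents
        return torch.tanh(self.self_transform(h) + self.comm_transform(comm))

step = CommStep(hidden=32)
hidden = torch.randn(3, 32)    # hidden states of 3 agents
hidden = step(hidden)          # repeat for as many communication rounds as needed
```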
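BiCNet's bidirectional channel (points 1 and 2) can be approximated by running a bidirectional GRU over the agent dimension, so each agent's action logits depend on every teammate's features from both directions while staying fully differentiable. The module below is a simplified stand-in for the actor network, with illustrative names and layer sizes.

```python
import torch
import torch.nn as nn

class BiCommActor(nn.Module):
    """Sketch of a BiCNet-style actor: agents form the 'sequence'
    dimension of a bidirectional GRU, so information flows across the
    team in both directions and remains fully differentiable."""

    def __init__(self, feat_dim=16, hidden=32, n_actions=4):
        super().__init__()
        self.rnn = nn.GRU(input_size=feat_dim, hidden_size=hidden,
                          bidirectional=True, batch_first=True)
        self.policy_head = nn.Linear(2 * hidden, n_actions)

    def forward(self, agent_feats):              # (batch, n_agents, feat_dim)
        out, _ = self.rnn(agent_feats)           # (batch, n_agents, 2 * hidden)
        return self.policy_head(out)             # per-agent action logits

actor = BiCommActor()
feats = torch.randn(1, 5, 16)    # one team of 5 agents
logits = actor(feats)            # shape (1, 5, 4)
```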

More methods will be added to this page.