Multi-Agent Reinforcement Learning (MARL) at a Glance

Over the years, a variety of multi-agent reinforcement learning approaches have emerged to tackle the complexities of agents learning together, often with partial observability, shared rewards, or adversarial dynamics. Below is a condensed summary of communication-centric MARL methods, each contributing novel ways for agents to communicate, coordinate, and learn effectively.

| Algorithm | Communication | Architecture | Differentiability | Setting | Key Environment(s) | Key Novelty / Notes |
| --- | --- | --- | --- | --- | --- | --- |
| RIAL | Discrete messages treated as part of the action space | Deep Q-Network (per agent) | Not fully differentiable across agents | Cooperative tasks (partial observability) | Simple grid-based tasks, mixed matrix tasks | Early approach: each agent sends/receives discrete actions as "messages," but there is no direct gradient flow between agents. |
| DIAL | Continuous during training, discretized at execution | Deep Q-Network variant with shared parameters | Partially differentiable (DRU mechanism) | Cooperative tasks (partial observability) | Grid-world-style tasks (Switch riddle, etc.) | Introduced Differentiable Inter-Agent Learning with a Discretize/Regularize Unit (DRU) that yields discrete signals at test time. |
| CommNet | Continuous broadcast (averaged hidden states) | Single feed-forward or recurrent net with repeated modules | Fully differentiable end to end | Primarily cooperative, fully or partially observable | Predator-prey, multi-robot simulation, etc. | Symmetrical broadcast channel: each agent's hidden state is summed/averaged to form the communication vector. Scales well. |
| BiCNet | Continuous bidirectional RNN communication | Multi-agent actor-critic with parameter sharing | Fully differentiable (actor-critic) | Cooperative or competitive (StarCraft setting) | StarCraft micromanagement, complex environments | Uses a bidirectional RNN for richer communication; can handle heterogeneous units. Emergent "human-like" tactics in StarCraft. |
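To make the DIAL row concrete, here is a minimal sketch of the Discretize/Regularize Unit idea, assuming a PyTorch setup; the `dru` helper and the noise scale `sigma` are illustrative choices, not the authors' exact implementation. During centralized training the message is noised and squashed so gradients can flow from the receiver back into the sender; at execution it collapses to a hard bit.

```python
import torch

def dru(message_logit, sigma=2.0, training=True):
    """Sketch of DIAL's Discretize/Regularize Unit (DRU).

    Training: add Gaussian noise and squash with a sigmoid, keeping the
    channel continuous and differentiable so gradients can flow from the
    receiving agent back into the sender's network.
    Execution: threshold to a hard {0, 1} message.
    (sigma is a hyperparameter chosen here only for illustration.)
    """
    if training:
        return torch.sigmoid(message_logit + sigma * torch.randn_like(message_logit))
    return (message_logit > 0).float()

# Toy usage: a 3-bit message logit produced by some sender network.
logit = torch.tensor([0.7, -1.2, 0.1], requires_grad=True)
soft_msg = dru(logit, training=True)    # continuous, gradient-friendly
hard_msg = dru(logit, training=False)   # discrete, used at execution
soft_msg.sum().backward()               # gradients reach the sender
```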

Quick Observations

  1. Communication Mechanisms (see the code sketches after this list):

    • RIAL used discrete messages as if they were extra actions.
    • DIAL allowed continuous feedback loops in training but discretized signals at test time.
    • CommNet went fully continuous, summing or averaging hidden states for symmetrical broadcast.
    • BiCNet used bidirectional continuous channels via an RNN, allowing more nuanced message flow.
  2. Architectural Differences:

    • RIAL and DIAL built upon Q-learning for each agent, albeit with parameter sharing.
    • CommNet used a single large network with repeated modules.
    • BiCNet combined actor-critic training with bidirectional RNN communication.
  3. Differentiability:

    • RIAL had no direct gradient flow across agents (since messages were discrete).
    • DIAL introduced partial differentiability (via DRU) but discretized signals at inference.
    • CommNet and BiCNet offered fully differentiable communication channels, which can speed up coordination learning.
  4. Emergent Coordination:

    • From simpler tasks (like switch-based puzzles) to StarCraft micromanagement, these algorithms show how learned communication can lead to advanced team behaviors such as focus fire, hit-and-run tactics, and coordinated multi-robot navigation.
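The RIAL setup from point 1 can be sketched as a per-agent Q-network whose output head covers environment actions and message symbols jointly, so the message is chosen by argmax like any other action and no gradient crosses the agent boundary. The class name `RIALAgent`, the layer sizes, and the toy dimensions below are illustrative, not taken from the original paper.

```python
import torch
import torch.nn as nn

class RIALAgent(nn.Module):
    """Illustrative RIAL-style agent: one Q-head over environment
    actions *and* discrete message symbols, so the message is just
    another action (no gradient flows to the other agent)."""

    def __init__(self, input_dim, n_env_actions, n_message_symbols, hidden=64):
        super().__init__()
        self.n_env_actions = n_env_actions
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_env_actions + n_message_symbols),
        )

    def act(self, obs, received_message):
        # The discrete (one-hot) message received last step is part of the input.
        q = self.net(torch.cat([obs, received_message], dim=-1))
        env_q, msg_q = q[: self.n_env_actions], q[self.n_env_actions:]
        return env_q.argmax().item(), msg_q.argmax().item()

agent = RIALAgent(input_dim=8, n_env_actions=4, n_message_symbols=2)
obs = torch.zeros(6)                    # 6 observation features
msg_in = torch.tensor([1.0, 0.0])       # one-hot message from the other agent
action, message_symbol = agent.act(obs, msg_in)
```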
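CommNet's broadcast (also point 1) boils down to one differentiable communication step: each agent's next hidden state mixes its own hidden state with the mean of the other agents' hidden states. The `CommStep` module below is a single-layer sketch of that step, not the full architecture.

```python
import torch
import torch.nn as nn

class CommStep(nn.Module):
    """One CommNet-style communication step (sketch): the 'message'
    each agent receives is the mean of the other agents' hidden states,
    so the whole step stays differentiable end to end."""

    def __init__(self, hidden=32):
        super().__init__()
        self.self_transform = nn.Linear(hidden, hidden, bias=False)
        self.comm_transform = nn.Linear(hidden, hidden, bias=False)

    def forward(self, h):                                        # h: (n_agents, hidden)
        n_agents = h.shape[0]
        comm = (h.sum(dim=0, keepdim=True) - h) / (n_agents - 1)  # mean of the *other* agents
        return torch.tanh(self.self_transform(h) + self.comm_transform(comm))

step = CommStep(hidden=32)
hidden = torch.randn(3, 32)    # hidden states of 3 agents
hidden = step(hidden)          # repeat for as many communication rounds as needed
```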
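BiCNet's bidirectional channel (points 1 and 2) can be approximated by running a bidirectional GRU over the agent dimension, so each agent's action logits depend on every teammate's features from both directions while staying fully differentiable. The module below is a simplified stand-in for the actor network, with illustrative names and layer sizes.

```python
import torch
import torch.nn as nn

class BiCommActor(nn.Module):
    """Sketch of a BiCNet-style actor: agents form the 'sequence'
    dimension of a bidirectional GRU, so information flows across the
    team in both directions and remains fully differentiable."""

    def __init__(self, feat_dim=16, hidden=32, n_actions=4):
        super().__init__()
        self.rnn = nn.GRU(input_size=feat_dim, hidden_size=hidden,
                          bidirectional=True, batch_first=True)
        self.policy_head = nn.Linear(2 * hidden, n_actions)

    def forward(self, agent_feats):              # (batch, n_agents, feat_dim)
        out, _ = self.rnn(agent_feats)           # (batch, n_agents, 2 * hidden)
        return self.policy_head(out)             # per-agent action logits

actor = BiCommActor()
feats = torch.randn(1, 5, 16)    # one team of 5 agents
logits = actor(feats)            # shape (1, 5, 4)
```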

More methods will be added to this page.