Deep RL: Baselines, Actor-Critic and GAE
A central challenge in deep RL is the high variance of policy-gradient estimates, which leads to unstable training and poor sample efficiency. This blog post explores how baselines, actor-critic methods, and Generalised Advantage Estimation (GAE) tackle this problem.
A surprising result underpins these methods: we can subtract a baseline from the returns without biasing the gradient estimate, yet a well-chosen baseline can dramatically reduce its variance. We’ll progress from simple constant baselines through to state-dependent baselines (actor-critic), culminating in GAE, which offers fine-grained control over the bias-variance trade-off. Along the way, we’ll examine what each method guarantees about bias and what it buys in variance.
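The unbiasedness-plus-variance-reduction claim is easy to verify empirically. Below is a minimal sketch (my own toy example, not from the post) on a two-armed bandit with a softmax policy: we draw Monte Carlo policy-gradient samples with and without a constant baseline, then compare their means and variances. For a softmax policy, the score function is `∇_θ log π(a) = onehot(a) − π`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-armed bandit with a softmax policy over logits theta.
theta = np.array([0.2, -0.1])
probs = np.exp(theta) / np.exp(theta).sum()
rewards = np.array([1.0, 3.0])  # deterministic reward for each arm

def grad_logp(a):
    # Score function for a softmax policy: onehot(a) - probs.
    onehot = np.zeros_like(theta)
    onehot[a] = 1.0
    return onehot - probs

def pg_samples(n, baseline=0.0):
    # One REINFORCE-style gradient sample per sampled action:
    # grad log pi(a) * (R(a) - b).
    actions = rng.choice(len(probs), size=n, p=probs)
    return np.stack([grad_logp(a) * (rewards[a] - baseline) for a in actions])

n = 100_000
plain = pg_samples(n, baseline=0.0)
with_b = pg_samples(n, baseline=rewards.mean())  # constant baseline b = 2

# Both estimators agree in expectation (the baseline adds no bias)...
print("mean (no baseline):  ", plain.mean(axis=0))
print("mean (with baseline):", with_b.mean(axis=0))

# ...but the baseline shrinks the per-sample variance substantially.
print("total variance (no baseline):  ", plain.var(axis=0).sum())
print("total variance (with baseline):", with_b.var(axis=0).sum())
```

Running this, the two empirical means match to within sampling noise, while the total variance with the baseline is markedly smaller: centring the rewards around their mean shrinks the magnitude of `R(a) − b` without changing where the estimator points on average.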