What is reinforcement learning?

Reinforcement learning (RL) is learning through interaction. Instead of being told the right answer, an agent acts in an environment and gets feedback: rewards for good choices, penalties for bad ones. Over time, it learns which actions lead to the highest total reward.
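
To make the act-then-get-feedback loop concrete, here is a minimal sketch in Python. It assumes a toy "slot machine" environment with three levers (a multi-armed bandit) and an epsilon-greedy agent; the function name run_bandit, the win probabilities, and the parameter values are all made up for illustration and are not from any particular library.

```python
import random

def run_bandit(true_win_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit (hypothetical toy environment)."""
    rng = random.Random(seed)
    n_arms = len(true_win_probs)
    estimates = [0.0] * n_arms   # agent's running estimate of each arm's average reward
    counts = [0] * n_arms        # how many times each arm has been tried

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best-looking arm.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])

        # Environment feedback: reward 1 with the arm's hidden probability, else 0.
        reward = 1.0 if rng.random() < true_win_probs[action] else 0.0

        # Incremental mean update: nudge the estimate toward the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

# The agent never sees these probabilities; it only sees the rewards it earns.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("arm the agent prefers:", best_arm)
```

The agent is never told which lever is best. It only keeps a running average of the rewards each lever has paid out, and it occasionally tries a random lever so that a good option is not missed.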

Traditional RL works with “value functions” (estimates of how much reward to expect from a state or action) or “policy gradients” (methods that adjust the agent’s behavior directly toward higher reward). But when the state space gets huge (like every possible arrangement of pixels on a game screen), storing a separate estimate for each state is no longer feasible, so we use neural networks to approximate these functions. That’s why you hear about “deep reinforcement learning”: the neural network learns to estimate the value of states or to output actions directly.
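
As a rough illustration of a value function, the sketch below runs tabular Q-learning on a made-up five-cell corridor where the only reward is at the right-hand end. The environment, the helper names (step, greedy, train), and the hyperparameters are assumptions chosen for this example. With only five states the values fit in a small table q; in deep RL, a neural network would take the place of that table.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right

def step(state, action_idx):
    """Move along the corridor; reward 1.0 only when the goal cell is reached."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # the value function as a lookup table
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Mostly act greedily, but explore a random action now and then.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy(q[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
# Read the learned behavior off the table for every non-goal cell.
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(policy)
```

Each entry q[state][action] estimates the discounted reward the agent can expect after taking that action in that state, so cells closer to the goal end up with higher values.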

Think of training a dog: you don’t explain the task to it, you give treats when it does something right. Eventually it learns behaviors that maximize rewards. AlphaGo Zero learned Go this way, playing millions of games against itself and learning which moves lead to winning. (The original AlphaGo was first trained on human expert games and then improved through self-play.) The neural network is given the rules of the game but no strategy; it discovers strong play through trial and error, guided by the final result of each game.