Glossary · AI & ML
Reinforcement learning
Also known as: RL
In brief
Reinforcement learning trains a policy through trial and error against a reward function. The agent acts, receives a reward, and updates its policy to maximize expected return. In humanoids, RL is most common for locomotion and balance, where rewards are easy to specify (don't fall, walk forward).
RL works best when reward is dense and easy to evaluate — a falling robot is unambiguously a failure. That makes RL a natural fit for locomotion, where simulation can run thousands of episodes per minute and the reward function "stay upright + go forward" is well-defined.
For manipulation and high-level task execution, RL struggles because reward shaping is hard ("did the robot fold the laundry correctly?" is ambiguous). Most humanoid manipulation work in 2025–2026 uses imitation learning or hybrid approaches. RL still dominates the locomotion controllers underneath those higher-level policies.
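The trial-and-error loop described above can be sketched with tabular Q-learning on a toy "walk forward, don't fall" task. Everything here is invented for illustration — the walker MDP, the reward values, and the hyperparameters are not from any real robot controller — but the structure (act, receive reward, update toward expected return) is the core RL loop:

```python
import random

# Toy "walker" MDP (invented for illustration).
# States: positions 0..4; reaching position 4 ends the episode.
# Actions: 0 = cautious step (always advances one cell),
#          1 = risky step (advances two cells, but "falls" 30% of the time).

N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action, rng):
    """One environment transition: returns (next_state, reward, done)."""
    if action == 0:
        nxt = state + 1
    else:
        if rng.random() < 0.3:           # risky step: the robot falls
            return state, -10.0, True    # unambiguous failure, like "don't fall"
        nxt = min(state + 2, N_STATES - 1)
    reward = 1.0                         # dense reward for moving forward
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current policy, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: q[s][i])
            s2, r, done = step(s, a, rng)
            # Temporal-difference update toward the expected return.
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q

q = train()
```

After training, the learned values at the start state favor the cautious step: the occasional -10 for falling outweighs the risky step's faster progress. This mirrors why locomotion suits RL well — falling is a cheap, unambiguous failure signal that simulation can deliver thousands of times per minute.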
Related terms
See it in the wild
Browse robots and brands using these techniques
Glossary entries are upstream. The catalog is where the implementations live.