Glossary · AI & ML
Reinforcement learning
Also known as: RL
In brief
Reinforcement learning trains a policy through trial and error against a reward function. The agent acts, receives a reward, and updates its policy to maximize expected return. In humanoids, RL is most common for locomotion and balance, where rewards are easy to specify (don't fall, walk forward).
RL works best when reward is dense and easy to evaluate — a falling robot is unambiguously a failure. That makes RL a natural fit for locomotion, where simulation can run thousands of episodes per minute and the reward function "stay upright + go forward" is well-defined.
For manipulation and high-level task execution, RL struggles because reward shaping is hard ("did the robot fold the laundry correctly?" is ambiguous). Most humanoid manipulation work in 2025–2026 uses imitation learning or hybrid approaches. RL still dominates the locomotion controllers underneath those higher-level policies.
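The trial-and-error loop described above can be sketched with tabular Q-learning on a toy "walk forward, don't fall" task. Everything here is invented for illustration — the walker MDP, the reward values, and the hyperparameters are not from any real robot controller — but the structure (act, receive reward, update toward expected return) is the core RL loop:

```python
import random

# Toy "walker" MDP (invented for illustration).
# States: positions 0..4; reaching position 4 ends the episode.
# Actions: 0 = cautious step (always advances one cell),
#          1 = risky step (advances two cells, but "falls" 30% of the time).

N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action, rng):
    """One environment transition: returns (next_state, reward, done)."""
    if action == 0:
        nxt = state + 1
    else:
        if rng.random() < 0.3:           # risky step: the robot falls
            return state, -10.0, True    # unambiguous failure, like "don't fall"
        nxt = min(state + 2, N_STATES - 1)
    reward = 1.0                         # dense reward for moving forward
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current policy, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: q[s][i])
            s2, r, done = step(s, a, rng)
            # Temporal-difference update toward the expected return.
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q

q = train()
```

After training, the learned values at the start state favor the cautious step: the occasional -10 for falling outweighs the risky step's faster progress. This mirrors why locomotion suits RL well — falling is a cheap, unambiguous failure signal that simulation can deliver thousands of times per minute.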
Related terms
See it in the wild
Browse robots and brands using these techniques
Glossary entries are upstream. The catalog is where the implementations live.