Glossary · AI & ML
Reward shaping
In brief
Reward shaping is the practice of designing the reward function that a reinforcement-learning (RL) agent optimizes — typically by adding dense auxiliary reward terms that guide the agent toward a sparse final goal. Bad shaping invites the agent to discover unintended exploits; good shaping is a tedious, project-specific craft. It is one of the main reasons RL is hard to apply outside well-bounded problems like locomotion.
In a well-shaped RL problem, every meaningful step of progress earns a small reward and every failure costs reward, so the agent learns from a smooth gradient of feedback. In a badly shaped one, the agent learns to exploit the reward function — gaming the simulation in ways the designer didn't anticipate. The famous example is OpenAI's CoastRunners boat, which discovered it could rack up points by circling a lagoon of power-ups instead of finishing the race.
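The sparse-versus-shaped contrast can be made concrete with a toy example. The sketch below uses a hypothetical 1-D "reach the goal" task (the names `GOAL`, `sparse_reward`, and `shaped_reward` and the 0.1 weight are all illustrative, not from any specific library): the sparse reward is zero everywhere except at success, giving the agent nothing to climb, while the shaped version adds a small progress term so every step toward the goal is rewarded and every step away is penalized.

```python
# Hypothetical 1-D "reach the goal" task. All names and weights here are
# illustrative assumptions, not a real environment's API.

GOAL = 10  # target position on the number line

def sparse_reward(pos: int) -> float:
    """Reward only on success: flat zero everywhere else, so the agent
    gets no learning signal until it stumbles onto the goal."""
    return 1.0 if pos == GOAL else 0.0

def shaped_reward(prev_pos: int, pos: int) -> float:
    """Sparse reward plus a dense progress term: each step toward the
    goal earns a small bonus, each step away costs the same amount."""
    progress = abs(GOAL - prev_pos) - abs(GOAL - pos)
    return sparse_reward(pos) + 0.1 * progress
```

Note the shaping term is the change in distance-to-goal between steps — a potential-based form, which is the standard way to add dense feedback without changing which policy is optimal.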
For humanoid locomotion, reward shaping is well-trodden: stay upright, move forward, minimize energy. For manipulation and higher-level tasks, shaping is much harder, which is why imitation learning has largely displaced RL in those domains.
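The standard locomotion recipe above can be sketched as a single weighted sum. This is a minimal illustration, not any benchmark's actual reward: the function name, the 0.8 uprightness threshold, and the weights are assumptions chosen for readability.

```python
def locomotion_reward(upright: float, forward_vel: float,
                      torques: list[float]) -> float:
    """Illustrative humanoid-locomotion shaping: stay upright, go
    forward, minimize energy. All weights are made up for this sketch."""
    alive = 1.0 if upright > 0.8 else 0.0      # bonus while the torso is up
    forward = 0.5 * forward_vel                # reward forward progress
    energy = 0.01 * sum(t * t for t in torques)  # quadratic torque penalty
    return alive + forward - energy
```

Benchmark environments such as Gymnasium's Humanoid use the same three-term structure (healthy reward + forward reward − control cost), differing mainly in the weights.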