HumanoidHub · Glossary
Humanoid robotics, defined
39 terms across AI & ML, control, hardware, platforms, and business, written in plain language so you can scan fast, cite, and link from anywhere.
AI & ML
Models, learning techniques, and the AI side of the stack.
Diffusion policy
A diffusion policy is a robot control policy that uses a diffusion model — the same technique behind image generators like Stable Diffusion — to generate action sequences. The model learns to denoise random action noise into purposeful trajectories conditioned on observations.
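A toy sketch of the inference loop, with a stand-in denoiser in place of a trained network (the function names, step count, and targets below are illustrative, not a real diffusion-model API):

```python
import random

def sample_actions(denoiser, observation, action_dim=2, steps=10, seed=0):
    """Toy diffusion-policy inference: start from pure Gaussian noise and
    iteratively refine it with a denoiser conditioned on the observation.
    `denoiser` stands in for the learned network."""
    rng = random.Random(seed)
    actions = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(steps)):
        actions = denoiser(observation, actions, t)  # one denoising step
    return actions

# Stand-in denoiser: pulls the actions halfway toward a fixed target
# trajectory each step, mimicking how a trained model removes noise.
target = [0.3, -0.2]
toy_denoiser = lambda obs, a, t: [x + 0.5 * (target[i] - x) for i, x in enumerate(a)]
refined = sample_actions(toy_denoiser, observation=None)
```

After ten refinement steps the random starting noise has collapsed onto the target trajectory, which is the core mechanic: noise in, purposeful actions out.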
Domain randomization
Domain randomization is the technique of training a policy across many randomized variants of the simulator — different physics parameters, lighting, textures, sensor noise — so the policy learns to be robust to the variation. It is the standard tool for closing the sim-to-real gap.
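A minimal sketch of the idea; the parameter names and ranges are made up, and a real pipeline would feed these values into a physics simulator rather than a dictionary:

```python
import random

def sample_sim_params(rng):
    """Draw one randomized simulator configuration per training episode.
    Knobs and ranges are illustrative, not a real simulator API."""
    return {
        "floor_friction": rng.uniform(0.4, 1.2),      # slippery to grippy
        "link_mass_scale": rng.uniform(0.8, 1.2),     # +/-20% model error
        "motor_strength_scale": rng.uniform(0.9, 1.1),
        "encoder_noise_std": rng.uniform(0.0, 0.05),  # rad
        "control_latency_steps": rng.randint(0, 3),   # delayed commands
    }

rng = random.Random(0)
variants = [sample_sim_params(rng) for _ in range(1000)]
# A policy trained across all 1000 variants cannot overfit to any single
# simulator's exact physics, which is what lets it transfer to hardware.
```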
Embodied AI
Embodied AI is artificial intelligence that operates through and learns from a physical or simulated body. The category covers robots, virtual agents in physics simulators, and any AI whose inputs and outputs include sensorimotor experience rather than pure text.
Foundation model for robotics
A foundation model for robotics is a large, pre-trained neural network designed to be the base layer for many downstream robot tasks. Like a foundation language model, it is trained at scale and then fine-tuned per task or per platform.
Imitation learning
Imitation learning is training a robot policy by showing it human demonstrations rather than by reward shaping. The robot watches a teleoperator (or a human directly), then learns to reproduce the trajectory or behavior conditioned on observations.
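At its simplest this is supervised regression from observations to demonstrated actions. A toy behavioral-cloning fit of a one-parameter linear policy (demonstrations and learning rate are made up):

```python
def behavioral_cloning(demos, lr=0.1, epochs=200):
    """Fit a 1-D linear policy action = w * obs to demonstration pairs
    by gradient descent on squared error (toy behavioral cloning)."""
    w = 0.0
    for _ in range(epochs):
        for obs, act in demos:
            w -= lr * (w * obs - act) * obs  # gradient of 0.5*(w*obs - act)**2
    return w

# The "expert" demonstrated action = 2 * observation; the fit recovers w near 2.
demos = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]
w = behavioral_cloning(demos)
```

A real pipeline does the same thing with a neural network, camera images as observations, and teleoperated joint trajectories as actions.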
Large behavior model
A large behavior model is a foundation-scale neural network trained to predict actions from observations across many tasks and embodiments. The robotics analogue of a large language model: pretrain on a wide corpus of robot data, then fine-tune for specific tasks.
Reinforcement learning
Reinforcement learning trains a policy through trial and error against a reward function. The agent acts, receives reward, and updates its policy to maximize expected return. In humanoids, RL is most common for locomotion and balance where rewards are easy to specify (don't fall, walk forward).
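The act/reward/update loop can be shown on the simplest possible RL problem, a multi-armed bandit; everything here (actions, payoffs, hyperparameters) is illustrative:

```python
import random

def train_bandit(reward_fn, n_actions=3, episodes=2000, eps=0.1, lr=0.1, seed=0):
    """Minimal RL loop: act (epsilon-greedy), receive a reward, and nudge
    the value estimate for the chosen action toward the observed reward."""
    rng = random.Random(seed)
    q = [0.0] * n_actions  # value estimate per action
    for _ in range(episodes):
        explore = rng.random() < eps
        a = rng.randrange(n_actions) if explore else max(range(n_actions), key=q.__getitem__)
        r = reward_fn(a)
        q[a] += lr * (r - q[a])  # move the estimate toward what we observed
    return q

# Toy task: action 2 pays the most, and the learned values reflect that.
values = train_bandit(lambda a: [0.1, 0.5, 1.0][a])
```

Locomotion RL replaces the three actions with continuous joint torques and the payoff table with a physics simulator, but the loop is the same.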
Reward shaping
Reward shaping is the practice of designing the reward function that an RL agent optimizes. Bad shaping causes the agent to discover unintended exploits; good shaping is a tedious, project-specific craft. It is one of the main reasons RL is hard to apply outside well-bounded problems like locomotion.
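A hand-shaped walking reward might look like this; the terms and weights are illustrative, and tuning them is exactly the craft described above:

```python
def walking_reward(forward_vel, torso_height, joint_torques,
                   target_vel=1.0, min_height=0.6):
    """Illustrative shaped reward for bipedal walking."""
    velocity_term = -abs(forward_vel - target_vel)            # track target speed
    alive_term = 0.0 if torso_height > min_height else -10.0  # penalize falling
    effort_term = -1e-3 * sum(t * t for t in joint_torques)   # discourage thrashing
    return velocity_term + alive_term + effort_term
```

Every coefficient is an exploit surface: weight effort too heavily and the agent learns to stand still; too lightly and it learns a violent, hardware-destroying gait.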
Sim-to-real transfer
Sim-to-real transfer is the technique of training a policy in simulation, then deploying it on a physical robot with minimal additional fine-tuning. Domain randomization — varying the simulator's physics, visuals, and sensor noise during training — is the standard tool for closing the sim-to-real gap.
VLA model
A vision-language-action (VLA) model is a single neural network that takes camera images plus a natural-language instruction and outputs motor commands. VLAs replace the older split between separate perception, planning, and control stacks with one end-to-end policy.
World model
A world model is a learned simulator of the environment used to plan or predict consequences of actions. Instead of (or in addition to) a real-world rollout, the agent imagines forward in its world model, picks the best plan, then acts.
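A random-shooting planner over a learned model illustrates the imagine/score/act pattern; the model interface below is a stand-in, not any particular system's API:

```python
def plan(state, model, candidate_seqs, horizon=5):
    """Score each candidate action sequence by rolling it out inside the
    learned model, then return the first action of the best sequence.
    `model(state, action)` is assumed to return (next_state, reward)."""
    best_score, best_action = float("-inf"), None
    for seq in candidate_seqs:
        s, score = state, 0.0
        for a in seq[:horizon]:
            s, r = model(s, a)  # imagined step, no real-world rollout
            score += r
        if score > best_score:
            best_score, best_action = score, seq[0]
    return best_action

# Toy model: 1-D position, reward for being near the origin.
toy_model = lambda s, a: (s + a, -abs(s + a))
action = plan(2.0, toy_model, [[1, 0], [-1, 0], [0, 0]])  # picks -1
```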
Control
Locomotion, balance, manipulation, and the math that drives motion.
Compliance
Compliance is the ability of a robot to "give" under external force rather than rigidly resist it. A compliant robot is safer to work alongside, less likely to damage objects on contact, and tolerant of minor positioning errors during manipulation.
Degrees of freedom
Degrees of freedom (DOF) count the number of independent ways a robot can move. Each actuated joint contributes one DOF. A typical humanoid hand has 11–22 DOF; a full humanoid platform commonly has 25–40 DOF excluding the hands.
Impedance control
Impedance control regulates the dynamic relationship between force and motion at a joint or end-effector. Instead of "go to position X" or "apply torque T", the controller is told "behave like a spring with stiffness K and damping D". The robot then responds compliantly to external forces.
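The one-joint version of the law is a virtual spring-damper; the gains below are illustrative:

```python
def impedance_torque(q, qd, q_des, K=40.0, D=2.0):
    """Joint torque from a virtual spring-damper:
    tau = K * (q_des - q) - D * qd
    K: stiffness (Nm/rad), D: damping (Nm*s/rad); values illustrative."""
    return K * (q_des - q) - D * qd

# Pushed 0.1 rad away from the goal while stationary, the joint pushes
# back like a spring; the damping term resists motion, not displacement.
tau = impedance_torque(q=0.1, qd=0.0, q_des=0.0)  # -> -4.0 Nm
```

Lowering K makes the same controller softer, which is how one controller spans stiff position-like behavior and gentle contact.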
Loco-manipulation
Loco-manipulation is the simultaneous coordination of locomotion and manipulation — walking while carrying a load, opening a door while stepping through it, manipulating an object that is itself moving. It is the practical capability that distinguishes a humanoid from a mobile arm or a stationary manipulator.
Torque control
Torque control commands joints by the torque they should apply rather than by the position they should reach. Torque-controlled robots can be back-drivable, comply with external forces, and execute physically reasonable motions without fighting their environment.
Whole-body control
Whole-body control is a control approach that treats every joint of the robot as part of one optimization problem, balancing many objectives at once: track a hand goal, keep the feet planted, stay balanced, respect torque limits. The output is a coordinated set of joint commands across the entire body.
ZMP balance
The Zero Moment Point (ZMP) is the point on the ground where the net horizontal moment from the robot's weight and inertia equals zero. Keeping the ZMP strictly inside the robot's support polygon (the region under and between its feet) is the classical balance criterion: it guarantees the stance foot does not tip, though it is a necessary rather than sufficient condition for not falling over.
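Under the common linear-inverted-pendulum approximation the ZMP has a closed form; a 1-D sketch with made-up numbers:

```python
def zmp_x(com_x, com_z, com_acc_x, g=9.81):
    """ZMP along one axis under the linear-inverted-pendulum model:
    x_zmp = x_com - (z_com / g) * xdd_com
    where x_com is CoM position, z_com its height, xdd_com its acceleration."""
    return com_x - (com_z / g) * com_acc_x

def balanced(zmp, foot_min, foot_max):
    """1-D support-polygon check: the feet cover [foot_min, foot_max]."""
    return foot_min <= zmp <= foot_max

# Standing still (zero CoM acceleration), the ZMP sits directly under the CoM.
ok = balanced(zmp_x(0.02, 0.9, 0.0), foot_min=-0.05, foot_max=0.15)  # True
```

Accelerating the CoM shifts the ZMP away from it, which is why a robot must lean into a push or a fast start.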
Hardware
Actuators, sensors, end-effectors, and the physical platform.
Actuator
An actuator converts energy (electrical, hydraulic, pneumatic) into mechanical motion. In a humanoid, every joint has at least one actuator. Modern bipedal humanoids overwhelmingly use electric actuators — typically a brushless motor plus some form of gearing.
Battery hot-swap
Battery hot-swap is the ability to replace a depleted battery with a charged one without powering the robot down. For humanoids running multi-shift operations, hot-swap matters — it determines whether duty cycle is bounded by battery life or by mechanical wear.
Depth camera
A depth camera produces an image where each pixel encodes distance instead of (or in addition to) color. Common technologies include structured light, time-of-flight, and stereo; Intel's RealSense line spans several of these. Used for object recognition, manipulation, and close-range obstacle detection.
Dexterous hand
A dexterous hand is an end-effector designed to approximate human hand capability — typically 5 fingers, 11+ degrees of freedom, opposable thumb, and the ability to perform precision grasps as well as power grasps. Dexterous hands are the most expensive single subsystem on most humanoids.
End-effector
An end-effector is whatever sits at the end of the robot's arm and interacts with the world — a parallel gripper, suction cup, dexterous hand, or task-specific tool. End-effector choice is a major capability and cost lever on any humanoid platform.
Harmonic drive
A harmonic drive is a high-ratio, zero-backlash gear reduction commonly used in robot joints. It uses a flexible gear that meshes against a rigid one as it deforms, achieving 50:1 to 200:1 ratios in a compact package. The trade-offs are poor back-drivability (the large reduction reflects high friction and inertia to the output) and high cost.
IMU
An IMU (Inertial Measurement Unit) measures linear acceleration and angular velocity along three axes, often with a magnetometer for heading. Every humanoid has at least one IMU, usually in the torso, providing the high-rate motion data the balance controller depends on.
LIDAR
LIDAR (Light Detection and Ranging) is a sensor that measures distance by timing reflected laser pulses. On a humanoid, a small spinning or solid-state LIDAR provides a 3D point cloud of the surroundings, used for mapping, obstacle avoidance, and SLAM.
Planetary roller screw
A planetary roller screw converts rotary motion to linear motion using threaded rollers around a central screw. It carries far more load than a ball screw of similar size, runs at high duty cycles, and is the transmission Tesla uses in Optimus's linear actuators.
Quasi-direct-drive
A quasi-direct-drive (QDD) actuator pairs a high-torque motor with a single-stage low-ratio gearbox (typically 6:1 to 10:1). The result is a joint that's naturally back-drivable, torque-transparent, and compliant — at the cost of more motor mass and higher current draw.
Series elastic actuator
A series elastic actuator places a calibrated spring between the motor output and the joint. The spring deflection is measured to estimate output torque, and the spring decouples motor inertia from output, providing inherent shock absorption and improved force control.
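The torque estimate is just Hooke's law applied to the measured deflection; the spring constant below is illustrative:

```python
def sea_output_torque(theta_motor, theta_joint, k_spring=300.0):
    """Series-elastic torque estimate: tau = k * (theta_motor - theta_joint).
    Both angles come from encoders on either side of the spring;
    k_spring (Nm/rad) is an illustrative value."""
    return k_spring * (theta_motor - theta_joint)

# A 0.01 rad deflection across a 300 Nm/rad spring reads as ~3 Nm of torque.
tau = sea_output_torque(theta_motor=0.51, theta_joint=0.50)
```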
Tendon drive
A tendon drive transmits actuator force through cables (tendons) routed from a remote motor to the joint. This concentrates motor mass at the base while leaving the moving structure light and compact — a common design for dexterous hands.
Platforms
Software / model platforms and frameworks built for humanoids.
Figure Helix
Figure Helix is Figure's in-house vision-language-action model, the AI brain shipped on Figure 03. It takes the robot's camera feeds plus a natural-language instruction and outputs whole-body motor commands, pairing a slower vision-language reasoning model (System 2) with a fast visuomotor control policy (System 1).
NVIDIA Cosmos
NVIDIA Cosmos is a foundation-model platform for world models — neural networks trained to simulate physical scenes for robotics. Robotics teams use Cosmos to generate synthetic training data, predict environment dynamics, and pretrain perception components.
NVIDIA GR00T
NVIDIA GR00T (Generalist Robot 00 Technology) is NVIDIA's foundation-model platform for humanoid robotics. It provides a pretrained base model plus a synthetic-data + simulation pipeline (Isaac Sim) so manufacturers can fine-tune control policies without training from scratch.
Tesla Optimus AI
Tesla Optimus AI is the in-house neural network stack that runs on Tesla's Optimus humanoid. It shares architecture and training infrastructure with Tesla's Full Self-Driving stack, applied to humanoid embodiment.
Business
Commercial models, deployment readiness, and procurement vocabulary.
Cobot
A cobot is a robot designed to work safely alongside humans without a fence. Most cobots are arm-only (Universal Robots, Franka Emika); humanoid robots are an extension of the cobot paradigm to bipedal full-body platforms.
Deployment readiness level
Deployment readiness level is an analogy to NASA's Technology Readiness Level (TRL), adapted for humanoid robotics. It rates how close a platform is to reliable, unsupervised commercial operation: prototype demos at the low end, mass-deployed unsupervised work at the high end.
Human-robot collaboration
Human-robot collaboration is the deployment pattern where a robot and a human worker share a workspace and a task, rather than the robot operating in a fenced-off cell. For humanoids, HRC is the dominant deployment model — the form factor exists precisely to fit human-shaped workspaces.
Robot-as-a-Service
Robot-as-a-Service (RaaS) is a commercial model where the customer pays a recurring fee — monthly or per-operating-hour — for the robot, software, and service bundle, instead of buying the unit outright. Common humanoid RaaS rates land at $2–$5 per operating hour.
Total cost of ownership
Total cost of ownership for a humanoid robot is the sum of purchase price, financing, integration, operating cost (energy + software + network), maintenance, parts, insurance, and decommissioning, amortized over the unit's working life. A common simplification: TCO/hour ≈ (price ÷ life-hours) + operating cost + maintenance reserve.
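Plugging illustrative numbers (not vendor quotes) into that simplification:

```python
def tco_per_hour(price, life_hours, operating_per_hour, maintenance_per_hour):
    """TCO/hour ~= (price / life-hours) + operating cost + maintenance reserve."""
    return price / life_hours + operating_per_hour + maintenance_per_hour

# Illustrative inputs: $80,000 unit, 20,000 working hours over its life,
# $1.50/h operating (energy + software + network), $1.00/h maintenance reserve.
rate = tco_per_hour(80_000, 20_000, 1.50, 1.00)  # -> 6.5 $/h
```

That per-hour figure is what a buyer compares against RaaS pricing or loaded labor cost.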