Foundation model for robotics

In brief

A foundation model for robotics is a large, pre-trained neural network designed to be the base layer for many downstream robot tasks. Like a foundation language model, it is trained at scale and then fine-tuned per task or per platform.

The hope is that a single pre-trained model can absorb general visual, physical, and procedural priors (what objects look like, how they typically move, common task structures), so that downstream task fine-tuning needs orders of magnitude less data than training from scratch. NVIDIA's GR00T project is among the most prominent publicly announced foundation-model efforts aimed at humanoids.

Foundation models for robotics are upstream of vision-language-action (VLA) models: once you have the base, you fine-tune it on instruction-following or specific manipulation tasks to get a deployable policy.
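
The pre-train-then-fine-tune workflow can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "backbone" is a fixed random projection standing in for pre-trained weights (a real system would load a checkpoint), the demonstrations are synthetic, and names such as `backbone`, `fine_tune`, and `make_demos` are hypothetical. It only shows the division of labour: the base stays frozen while a small task-specific head is trained on a handful of demonstrations.

```python
import math
import random

random.seed(0)

OBS_DIM, FEAT_DIM, ACT_DIM = 4, 8, 2

# Stand-in for pre-trained weights (assumption: a real model would load these).
BACKBONE_W = [[random.gauss(0, 1) for _ in range(OBS_DIM)] for _ in range(FEAT_DIM)]


def backbone(obs):
    """Frozen feature extractor; its weights are never updated below."""
    return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in BACKBONE_W]


def head(weights, feats):
    """Small linear task head mapping features to an action vector."""
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]


def mse(weights, data):
    """Mean squared error of the head over (features, action) pairs."""
    total = 0.0
    for feats, action in data:
        pred = head(weights, feats)
        total += sum((p - a) ** 2 for p, a in zip(pred, action))
    return total / len(data)


def fine_tune(data, steps=200, lr=0.05):
    """Train only the head with plain gradient descent on the MSE."""
    weights = [[0.0] * FEAT_DIM for _ in range(ACT_DIM)]
    for _ in range(steps):
        grads = [[0.0] * FEAT_DIM for _ in range(ACT_DIM)]
        for feats, action in data:
            pred = head(weights, feats)
            for i in range(ACT_DIM):
                err = pred[i] - action[i]
                for j in range(FEAT_DIM):
                    grads[i][j] += 2 * err * feats[j] / len(data)
        for i in range(ACT_DIM):
            for j in range(FEAT_DIM):
                weights[i][j] -= lr * grads[i][j]
    return weights


def make_demos(n=32):
    """Synthetic demonstrations: observations paired with made-up target actions."""
    demos = []
    for _ in range(n):
        obs = [random.uniform(-1, 1) for _ in range(OBS_DIM)]
        action = [obs[0] - obs[1], 0.5 * obs[2]]
        # Features are computed once because the backbone is frozen.
        demos.append((backbone(obs), action))
    return demos


demos = make_demos()
initial_loss = mse([[0.0] * FEAT_DIM for _ in range(ACT_DIM)], demos)
tuned_head = fine_tune(demos)
final_loss = mse(tuned_head, demos)
print(f"loss before fine-tuning: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In practice the base is a large network and fine-tuning may also update some or all of its weights; freezing it here just keeps the example small.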
