Metadata schemas for robot learning datasets

A practical schema turns raw video into inspectable training data. These are the fields, conventions, and versioning rules robotics teams should define first.

Embodied AI Data Labs 9 min read
Start with stable clip and capture fields

Every clip needs a stable identifier plus environment, task, participant reference, camera view, frame rate, duration, and capture timestamp. These fields make datasets searchable and auditable before annotation begins.
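As a minimal sketch of what such a record could look like, the fields above can be written as a small Python dataclass. The class name, field names, and sample values here are illustrative assumptions, not a published standard.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClipRecord:
    """Clip-level metadata. All field names are illustrative assumptions."""
    clip_id: str            # stable identifier, never reused
    environment: str        # e.g. "kitchen"
    task: str               # e.g. "pour"
    participant_ref: str    # pseudonymous reference, not a name
    camera_view: str        # e.g. "egocentric"
    frame_rate_fps: float
    duration_s: float
    captured_at: datetime   # timezone-aware capture timestamp

    def __post_init__(self) -> None:
        # Reject records that would be hard to search or audit later.
        if not self.clip_id:
            raise ValueError("clip_id must be non-empty")
        if self.frame_rate_fps <= 0 or self.duration_s <= 0:
            raise ValueError("frame_rate_fps and duration_s must be positive")
        if self.captured_at.tzinfo is None:
            raise ValueError("captured_at must be timezone-aware")


# Sample values are made up for illustration.
clip = ClipRecord(
    clip_id="clip-000123",
    environment="kitchen",
    task="pour",
    participant_ref="p-0042",
    camera_view="egocentric",
    frame_rate_fps=30.0,
    duration_s=12.5,
    captured_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)

# Searching before annotation: filter records by any field.
kitchen_clips = [c for c in [clip] if c.environment == "kitchen"]
```

Freezing the dataclass is one way to keep identifiers stable once a clip has been registered; `asdict(clip)` gives a plain dictionary for export.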

Consent identifiers, anonymization status, and license scope should be stored at the same level as the clip record, so compliance can be checked without opening separate systems.
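A compliance check along these lines might look like the sketch below. The status values, scope names, and the `compliance_issues` helper are hypothetical; a real program would take them from its own consent and licensing terms.

```python
from dataclasses import dataclass

# Hypothetical set of anonymization states; define your own.
ALLOWED_ANONYMIZATION = {"none", "faces_blurred", "fully_anonymized"}


@dataclass(frozen=True)
class ComplianceFields:
    """Compliance metadata carried next to the clip fields (names assumed)."""
    consent_id: str
    anonymization_status: str
    license_scope: tuple[str, ...]  # uses the clip may be licensed for


def compliance_issues(fields: ComplianceFields, intended_use: str) -> list[str]:
    """Return human-readable problems; an empty list means the clip passes."""
    issues = []
    if not fields.consent_id:
        issues.append("missing consent_id")
    if fields.anonymization_status not in ALLOWED_ANONYMIZATION:
        issues.append(
            f"unknown anonymization_status: {fields.anonymization_status!r}"
        )
    if intended_use not in fields.license_scope:
        issues.append(f"license_scope does not cover {intended_use!r}")
    return issues


fields = ComplianceFields(
    consent_id="consent-0042",
    anonymization_status="faces_blurred",
    license_scope=("internal_research",),
)
ok = compliance_issues(fields, "internal_research")
blocked = compliance_issues(fields, "commercial_training")
```

Because the check reads only fields stored with the clip, it can run as part of any dataset query or export step.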

Model actions and objects with controlled vocabularies

Labels such as pick up, pour, wipe, fold, and place should have consistent definitions. Object names, states, hand usage, contact events, and task phases should follow the same convention.

Controlled terms reduce ambiguity across collectors and annotators while making downstream filtering and benchmark construction easier.
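One way to enforce a controlled vocabulary is an enumeration plus a normalization step, sketched here for the action labels mentioned above. The alias table and the `normalize_action` helper are assumptions for illustration; object names, states, and task phases could follow the same pattern.

```python
from enum import Enum


class Action(str, Enum):
    """Canonical action labels; each should have a written definition."""
    PICK_UP = "pick_up"
    POUR = "pour"
    WIPE = "wipe"
    FOLD = "fold"
    PLACE = "place"


# Hypothetical free-text variants mapped to canonical terms.
ALIASES = {
    "pickup": Action.PICK_UP,
    "grasp and lift": Action.PICK_UP,
    "put down": Action.PLACE,
}


def normalize_action(raw: str) -> Action:
    """Map an annotator's label to a canonical term, or raise ValueError."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    # Unknown terms raise ValueError instead of silently entering the dataset.
    return Action(key.replace(" ", "_"))


labels = [normalize_action(s) for s in ["Pick up", "pickup", " pour "]]
```

Rejecting unknown terms at ingestion keeps later filtering simple, since every stored label is guaranteed to be one of the canonical values.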

Design for versioning and buyer-defined extensions

Annotation requirements change as a robotics program matures. A schema should allow new fields without breaking older deliveries, and every export should identify the schema version used.
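The sketch below shows one possible way to read older deliveries after new fields have been added. It assumes semantic-version-style strings, a `hand_usage` field added in a later minor version, and an `extensions` dictionary for buyer-defined keys; all of these names are hypothetical.

```python
SCHEMA_VERSION = "1.1.0"  # assumed current version of the schema

# Fields added after 1.0.0, with the defaults used for older records.
DEFAULTS_FOR_ADDED_FIELDS = {"hand_usage": None}


def upgrade_record(record: dict) -> dict:
    """Return a copy of the record that conforms to SCHEMA_VERSION."""
    version = record.get("schema_version")
    if version is None:
        raise ValueError("record has no schema_version")
    # Only additive changes are handled; a major bump needs a real migration.
    if version.split(".")[0] != SCHEMA_VERSION.split(".")[0]:
        raise ValueError(f"cannot upgrade from major version {version}")
    upgraded = {**DEFAULTS_FOR_ADDED_FIELDS, **record}
    # Buyer-defined fields live in their own namespace so they never
    # collide with core fields added in later versions.
    upgraded["extensions"] = dict(record.get("extensions", {}))
    upgraded["schema_version"] = SCHEMA_VERSION
    return upgraded


old = {"schema_version": "1.0.0", "clip_id": "clip-000123"}
new = upgrade_record(old)
```

Keeping extensions in a separate dictionary is a design choice, not a requirement: it lets a buyer add program-specific fields while the core schema evolves independently.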

MP4 plus JSON or CSV works well for many evaluation programs. HDF5 or internal formats can be generated when teams need tighter integration with an existing training stack.
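For the JSON and CSV side of a delivery, a sidecar export might be sketched with the standard library alone, as below. The function names and record keys are assumptions, and the video files themselves are left out; the point is that every export states its schema version.

```python
import csv
import io
import json


def to_json_sidecar(records: list[dict], schema_version: str) -> str:
    """Serialize clip records with the schema version at the top level."""
    payload = {"schema_version": schema_version, "clips": records}
    return json.dumps(payload, indent=2, sort_keys=True)


def to_csv(records: list[dict], schema_version: str) -> str:
    """Serialize clip records as CSV with a schema_version column."""
    keys = {k for r in records for k in r} - {"schema_version"}
    fieldnames = ["schema_version"] + sorted(keys)
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for r in records:
        # Missing keys are written as empty cells (DictWriter's default).
        writer.writerow({**r, "schema_version": schema_version})
    return buf.getvalue()


# Sample records are made up for illustration.
records = [
    {"clip_id": "clip-000123", "task": "pour"},
    {"clip_id": "clip-000124", "task": "wipe"},
]
json_text = to_json_sidecar(records, "1.1.0")
csv_text = to_csv(records, "1.1.0")
```

Both functions return strings, so the caller decides where the files are written and how they are named next to each MP4.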
