
A practical metadata schema for robot learning datasets

The right metadata makes human demonstration data filterable, auditable, and easier to connect to training and evaluation workflows.

Embodied AI Data Labs · 9 min read

Define a stable clip identity

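Every asset needs a stable clip_id that links it to its source capture, task, environment, and timestamps.

Stable identities make deduplication, issue tracking, and model lineage easier, because every downstream artifact (a bug report, a training manifest, an evaluation result) can point back to the same clip.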
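A minimal Python sketch of one way to make clip_id stable. It assumes the ID is derived from the physical segment only (source capture plus UTC start and end times), while task and environment are stored as linked fields. That way, correcting a task label does not change the ID. The class name, field names, and the `clip_` prefix are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ClipRecord:
    """One clip and the entities it links to (field names are hypothetical)."""
    source_capture_id: str  # raw recording session the clip was cut from
    start_ts: datetime      # clip start; must be timezone-aware
    end_ts: datetime        # clip end; must be timezone-aware
    task_id: str            # linked, but NOT part of the ID (labels can be corrected)
    environment_id: str     # linked, but NOT part of the ID

    def __post_init__(self) -> None:
        for ts in (self.start_ts, self.end_ts):
            if ts.tzinfo is None:
                raise ValueError("timestamps must be timezone-aware")
        if self.end_ts <= self.start_ts:
            raise ValueError("end_ts must be after start_ts")

    @property
    def clip_id(self) -> str:
        # Hash only the fields that pin down the physical segment, normalized
        # to UTC, so re-ingesting the same segment always yields the same ID.
        key = "|".join([
            self.source_capture_id,
            self.start_ts.astimezone(timezone.utc).isoformat(),
            self.end_ts.astimezone(timezone.utc).isoformat(),
        ])
        return "clip_" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def deduplicate(clips: Iterable[ClipRecord]) -> List[ClipRecord]:
    """Keep the first record seen for each clip_id."""
    by_id: Dict[str, ClipRecord] = {}
    for clip in clips:
        by_id.setdefault(clip.clip_id, clip)
    return list(by_id.values())
```

Because the hash input is normalized to UTC, the same segment ingested twice with different local offsets, or with a corrected task label, collapses to one clip_id during deduplication.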

Describe the task at useful resolution

A broad task label is only a starting point. Adding the action, object, hand used, task phase, and success status makes the dataset far easier to filter.

Teams should request only the fields that connect to real training or evaluation decisions.
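A minimal sketch of task fields at that resolution, assuming closed vocabularies for hand and outcome so filters match reliably. The enum values, field names, and the `select` helper are hypothetical; a real schema would use whatever vocabularies the training team actually filters on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Hand(Enum):
    LEFT = "left"
    RIGHT = "right"
    BOTH = "both"


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


@dataclass(frozen=True)
class TaskLabel:
    """Task annotation for one clip; vocabularies here are illustrative."""
    clip_id: str
    task: str           # broad label, e.g. "make_tea"
    action: str         # verb, e.g. "pour"
    target_object: str  # e.g. "kettle"
    hand: Hand
    phase: str          # e.g. "reach", "grasp", "transport", "release"
    outcome: Outcome


def select(
    labels: Iterable[TaskLabel],
    *,
    action: Optional[str] = None,
    hand: Optional[Hand] = None,
    outcome: Optional[Outcome] = None,
) -> List[TaskLabel]:
    """Return labels that match every filter that is not None."""
    return [
        label for label in labels
        if (action is None or label.action == action)
        and (hand is None or label.hand == hand)
        and (outcome is None or label.outcome == outcome)
    ]
```

When raw annotation strings are parsed with `Hand(value)`, a typo such as "rigth" raises `ValueError` at load time instead of silently dropping the clip out of every filter.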

Expose capture and quality fields

Camera view, frame rate, resolution, occlusion, lighting, and quality score let buyers see how much of the dataset is actually usable for their setup.

Each quality flag should come with a written definition and examples so that reviewers apply it consistently.
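A minimal sketch, assuming quality flags live in a shared vocabulary where each flag carries its definition and examples, and that records using an undefined flag are rejected. The flag names, the 0 to 1 quality scale, and the default thresholds in `usable` are hypothetical stand-ins for one buyer's requirements.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical flag vocabulary. Each flag ships with a definition and concrete
# examples so that different reviewers apply it the same way.
QUALITY_FLAGS: Dict[str, Dict[str, object]] = {
    "hand_occluded": {
        "definition": "The acting hand is hidden during the grasp or release.",
        "examples": ["Hand passes behind a cabinet door while grasping a cup."],
    },
    "motion_blur": {
        "definition": "Fast movement smears the hand or object across frames.",
        "examples": ["Quick wrist flick while tossing a sponge."],
    },
    "object_out_of_frame": {
        "definition": "The manipulated object leaves the frame mid-action.",
        "examples": ["Pan is lifted above the top edge of an egocentric view."],
    },
}


@dataclass(frozen=True)
class CaptureInfo:
    clip_id: str
    camera_view: str        # e.g. "egocentric", "third_person"
    fps: float
    width: int
    height: int
    lighting: str           # e.g. "bright", "dim", "mixed"
    quality_score: float    # assumed 0.0-1.0 scale set during review
    flags: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # Reject flags that are not in the shared, documented vocabulary.
        unknown = set(self.flags) - set(QUALITY_FLAGS)
        if unknown:
            raise ValueError(f"undefined quality flags: {sorted(unknown)}")
        if not 0.0 <= self.quality_score <= 1.0:
            raise ValueError("quality_score must be in [0, 1]")


def usable(
    clips: Iterable[CaptureInfo],
    *,
    min_fps: float = 30.0,
    min_height: int = 720,
    min_score: float = 0.7,
    blocking: FrozenSet[str] = frozenset({"hand_occluded"}),
) -> List[CaptureInfo]:
    """Clips that meet one buyer's (hypothetical) capture requirements."""
    return [
        c for c in clips
        if c.fps >= min_fps
        and c.height >= min_height
        and c.quality_score >= min_score
        and not (c.flags & blocking)
    ]
```

Usable coverage can then be reported as `len(usable(clips)) / len(clips)`, broken down by camera view or environment, with thresholds set per buyer rather than hard-coded.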

Track privacy and delivery status

Consent reference, anonymization status, annotation status, and delivery version belong alongside the technical metadata.

Keeping them in the same record makes a delivery package easier to inspect across engineering, legal, and procurement workflows.
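A minimal sketch of a combined privacy and delivery record with a release check. It assumes a three-state status enum and that a clip may ship only when consent is referenced, anonymization and annotation are complete, and a delivery version is assigned. These rules and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETE = "complete"


@dataclass(frozen=True)
class ComplianceRecord:
    clip_id: str
    consent_ref: Optional[str]       # ID of the consent record, not the form itself
    anonymization: Status            # e.g. face and screen blurring
    annotation: Status
    delivery_version: Optional[str]  # e.g. "v1.2"; None until packaged


def delivery_blockers(rec: ComplianceRecord) -> List[str]:
    """List every reason this clip cannot ship yet (empty list = ready)."""
    blockers: List[str] = []
    if not rec.consent_ref:
        blockers.append("missing consent reference")
    if rec.anonymization is not Status.COMPLETE:
        blockers.append(f"anonymization {rec.anonymization.value}")
    if rec.annotation is not Status.COMPLETE:
        blockers.append(f"annotation {rec.annotation.value}")
    if rec.delivery_version is None:
        blockers.append("no delivery version assigned")
    return blockers
```

Returning readable reasons rather than a single boolean lets legal and procurement reviewers see why a clip is held back without reading the pipeline code.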

Need human task data your robots can learn from?

Share the task, environment, capture setup, and target volume. We will map out the fastest path to a sample or pilot.
