Start with clip-level traceability
Every video clip should have a stable clip_id, environment, task label, timestamps, camera view, frame rate, and quality score. That makes it possible to filter data before training and audit where model behavior came from later.
For manipulation tasks, fields such as action, object, hands, occlusion, and tool use are often more useful than broad activity labels. They describe what the robot actually needs to learn.
Separate capture specs from annotations
Capture specs describe how the data was filmed: camera angle, lighting, fps, environment, and device. Annotations describe what happened in the video. Keeping those layers separate makes schema changes cleaner.
This separation also helps teams compare egocentric and exocentric views without mixing collection quality with task semantics.
Make compliance visible
Consent status, anonymization notes, and license terms should be attached to deliveries in a way engineering and legal teams can both review.
A rights-cleared dataset is not just a checkbox. It is a repeatable process that connects participant consent, collection scope, delivery use, and retention policy.
