Bimanual tasks are coordination problems
Folding fabric, tying knots, opening packaging, assembling parts, and preparing food all require the two hands to take different roles over time. One hand may stabilize an object while the other manipulates it, and the roles may then switch.
Datasets should preserve these transitions instead of reducing the task to one broad activity label.
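As a minimal sketch of what preserving transitions could look like, the example below stores an episode as time-ordered role segments and recovers the moments where the hands change roles. The `RoleSegment` type, the `role_switches` helper, and the role names ("stabilize", "manipulate") are hypothetical and chosen only for illustration; a real dataset would define its own vocabulary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RoleSegment:
    """One time span with a fixed role per hand (hypothetical schema)."""
    start_s: float
    end_s: float
    left_role: str
    right_role: str


def role_switches(segments: List[RoleSegment]) -> List[float]:
    """Return the start times at which either hand's role changes."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    switches = []
    for prev, curr in zip(ordered, ordered[1:]):
        if (prev.left_role, prev.right_role) != (curr.left_role, curr.right_role):
            switches.append(curr.start_s)
    return switches


# Illustrative episode: the hands swap roles at 4.0 seconds.
episode = [
    RoleSegment(0.0, 2.5, "stabilize", "manipulate"),
    RoleSegment(2.5, 4.0, "stabilize", "manipulate"),
    RoleSegment(4.0, 6.0, "manipulate", "stabilize"),
]

print(role_switches(episode))  # [4.0]
```

A single activity label such as "folding" would collapse all three segments into one, and the switch at 4.0 seconds would be lost.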
Capture detail without losing context
A first-person view can reveal finger placement and contact, while a fixed view preserves the complete workspace and object geometry. Multi-view capture is especially useful when one hand frequently occludes the other.
Task scripts should define the acceptable range of variation while still allowing natural execution, so that models see realistic coordination.
Annotate the signals a policy can use
Useful labels may include left and right hand roles, contact start and end, object state, task phase, success status, and quality flags. These fields support both retrieval and targeted evaluation.
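The fields listed above could be gathered into a per-clip record along the following lines. This is a sketch under assumptions, not a reference schema: the class names `ContactEvent` and `ClipAnnotation`, the `select_clips` helper, and all field names and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ContactEvent:
    """A contact interval between one hand and one object (hypothetical)."""
    hand: str          # "left" or "right"
    object_id: str
    start_s: float
    end_s: float


@dataclass
class ClipAnnotation:
    """Per-clip labels mirroring the fields named in the text (hypothetical)."""
    clip_id: str
    task_phase: str
    left_role: str
    right_role: str
    object_state: str
    success: bool
    contacts: List[ContactEvent] = field(default_factory=list)
    quality_flags: List[str] = field(default_factory=list)


def select_clips(
    annotations: Iterable[ClipAnnotation],
    task_phase: Optional[str] = None,
    success: Optional[bool] = None,
    exclude_flags: Iterable[str] = (),
) -> List[ClipAnnotation]:
    """Filter clips by phase and success, dropping any with excluded flags."""
    excluded = set(exclude_flags)
    selected = []
    for ann in annotations:
        if task_phase is not None and ann.task_phase != task_phase:
            continue
        if success is not None and ann.success != success:
            continue
        if excluded.intersection(ann.quality_flags):
            continue
        selected.append(ann)
    return selected


# Illustrative records; the values are made up for this example.
clips = [
    ClipAnnotation("clip_001", "fold", "stabilize", "manipulate", "half_folded", True,
                   contacts=[ContactEvent("right", "towel", 0.4, 2.1)]),
    ClipAnnotation("clip_002", "fold", "manipulate", "stabilize", "unfolded", True,
                   quality_flags=["occluded"]),
    ClipAnnotation("clip_003", "fold", "stabilize", "manipulate", "unfolded", False),
]

clean = select_clips(clips, task_phase="fold", success=True, exclude_flags=["occluded"])
print([c.clip_id for c in clean])  # ['clip_001']
```

The same filter serves both uses mentioned above: retrieval (pull every clip in a given phase) and targeted evaluation (score a policy only on successful, unflagged clips).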
Before scaling, teams should check whether the labels remain consistent across difficult tasks and between annotators.
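One common way to quantify agreement between two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The text does not prescribe a metric, so this is offered only as one possible check; the `cohens_kappa` function and the example labels are illustrative.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty item set.")

    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up left-hand role labels from two annotators on four frames.
annotator_1 = ["stabilize", "manipulate", "stabilize", "idle"]
annotator_2 = ["stabilize", "manipulate", "manipulate", "idle"]

print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.636
```

Computing the score separately per task or per label field can show where agreement breaks down, for example on contact boundaries in heavily occluded clips.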
