Use simulation for breadth and real data for calibration
Synthetic environments can generate large numbers of trajectories, object layouts, and camera conditions quickly. That breadth is useful, but it does not automatically reproduce real clutter, wear, lighting, human variation, or sensor artifacts.
A focused real-world dataset helps teams measure which simulated assumptions transfer and where performance drops.
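One way to make "which assumptions transfer" concrete is to compare success rates per condition between simulated and real episodes. The sketch below is a minimal illustration under assumed inputs: the `(condition, succeeded)` episode format and the function names `success_rate_by_condition` and `transfer_gap` are hypothetical and do not come from any particular toolkit.

```python
from collections import defaultdict


def success_rate_by_condition(episodes):
    """Return {condition: success rate} for (condition, succeeded) pairs."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for condition, succeeded in episodes:
        totals[condition] += 1
        wins[condition] += int(succeeded)
    return {c: wins[c] / totals[c] for c in totals}


def transfer_gap(sim_episodes, real_episodes):
    """Return {condition: sim rate - real rate} for conditions seen in both.

    A large positive gap marks a condition where the simulated
    assumption does not appear to transfer to the real environment.
    """
    sim = success_rate_by_condition(sim_episodes)
    real = success_rate_by_condition(real_episodes)
    shared = sim.keys() & real.keys()
    return {c: sim[c] - real[c] for c in shared}


def worst_conditions(gaps):
    """Sort conditions from largest to smallest performance drop."""
    return sorted(gaps, key=gaps.get, reverse=True)
```

Sorting by gap gives a rough list of the conditions where real capture is likely to be most informative. With only a handful of real episodes per condition the rates are noisy, so the ranking is a prompt for investigation rather than a verdict.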
Capture the edge cases that matter in deployment
Liquids, deformable materials, reflective surfaces, partial occlusion, and inconsistent human motion are difficult to model completely. These cases often determine whether a system works outside a controlled demo.
Real capture should prioritize deployment-critical tasks and known model failures rather than simply maximizing hours of recorded data.
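As a rough illustration of prioritizing capture by criticality and known failures instead of raw hours, the sketch below splits a fixed capture budget in proportion to a simple score. The scoring rule (criticality times observed failure rate), the function name `allocate_capture_hours`, and the task names are all assumptions made for this example. A real program would likely weigh other factors as well.

```python
def allocate_capture_hours(tasks, total_hours):
    """Split a capture budget across tasks by priority score.

    tasks: {name: (criticality, failure_rate)}, both values in [0, 1].
    The score criticality * failure_rate is an assumed heuristic.
    """
    if not tasks:
        return {}
    scores = {name: crit * fail for name, (crit, fail) in tasks.items()}
    total = sum(scores.values())
    if total == 0:
        # No known failures anywhere: fall back to an even split.
        return {name: total_hours / len(tasks) for name in tasks}
    return {name: total_hours * s / total for name, s in scores.items()}


# Hypothetical tasks: (deployment criticality, current failure rate).
plan = allocate_capture_hours(
    {
        "pour_liquid": (1.0, 0.5),
        "pick_box": (1.0, 0.0),
        "fold_cloth": (0.5, 1.0),
    },
    total_hours=100,
)
```

Under this heuristic a task the model already handles reliably receives no additional capture time, however critical it is, which may or may not be the right call for a given deployment.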
Keep evaluation grounded in reality
Teams can co-train on simulated and real data, but validation should include a held-out real-world set collected in the intended operating environment. The evaluation schema should preserve task difficulty, environment conditions, and failure labels so that results can be broken down by those dimensions.
That approach turns sim-to-real from a general aspiration into a measurable data program.
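An evaluation schema of the kind described above might be sketched as a small record type plus two helpers: one that selects the held-out real episodes from the target environment, and one that tallies failures by difficulty and label. Every field name, the `"sim"`/`"real"` source values, and the helper names are assumptions chosen for illustration, not a standard format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EvalRecord:
    episode_id: str
    source: str                        # assumed values: "sim" or "real"
    environment: str                   # e.g. a site or lab identifier
    task: str
    difficulty: str                    # e.g. "easy", "hard"
    conditions: Tuple[str, ...] = ()   # e.g. ("low_light", "occlusion")
    succeeded: bool = False
    failure_label: Optional[str] = None


def held_out_real(records, train_ids, target_environment):
    """Keep real episodes from the target environment not used in training."""
    return [
        r for r in records
        if r.source == "real"
        and r.environment == target_environment
        and r.episode_id not in train_ids
    ]


def failure_counts(records):
    """Count failed episodes by (difficulty, failure label)."""
    counts = {}
    for r in records:
        if not r.succeeded:
            key = (r.difficulty, r.failure_label or "unlabeled")
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Keeping difficulty, conditions, and failure labels on each record means a single aggregate score can always be split back into the cases that produced it.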
