DreamVu launches PRISM, a 270,000-sample multi-view retail video dataset for embodied AI. Fine-tuning reduces error rates by 66.6% on spatial, physical, and action reasoning tasks across real supermarket environments.
DreamVu has released PRISM, a comprehensive 270,000-sample multi-view video dataset designed specifically for training and evaluating vision-language models on embodied AI tasks in retail environments. Captured across five real supermarkets using both worker-worn egocentric cameras and wide-angle 360° overhead cameras, PRISM addresses critical gaps in existing datasets by integrating spatial, physical, and action reasoning within a single deployment domain.
Models fine-tuned on PRISM significantly outperform general-purpose baselines. Fine-tuning cuts the average error rate across 20 capability probes by 66.6% and reduces embodied reasoning errors by a factor of five. According to DreamVu, combining spatial, physical, and action reasoning in one domain-specific corpus produces gains that broad general-corpus scaling does not.
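The two headline figures use different units: a percentage reduction and a division factor. The short sketch below converts between them so they can be compared directly. It is plain arithmetic on the numbers quoted above; the function names are illustrative and not part of any DreamVu tooling.

```python
def reduction_to_factor(reduction_pct: float) -> float:
    """Convert a relative error reduction (in percent) to a division factor."""
    return 1.0 / (1.0 - reduction_pct / 100.0)


def factor_to_reduction(factor: float) -> float:
    """Convert a division factor to a relative error reduction (in percent)."""
    return (1.0 - 1.0 / factor) * 100.0


# 66.6% fewer errors on average means errors fall to about one third.
average_factor = reduction_to_factor(66.6)

# Errors reduced "by a factor of five" means about 80% fewer errors.
embodied_reduction_pct = factor_to_reduction(5.0)

print(f"Average probes: errors divided by ~{average_factor:.2f}")
print(f"Embodied reasoning: ~{embodied_reduction_pct:.0f}% fewer errors")
```

Read this way, the embodied reasoning improvement (roughly 80% fewer errors) is larger than the 66.6% average across all probes.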
PRISM stands out by capturing complementary perspectives from egocentric and exocentric cameras. Its annotations use LLM-generated chain-of-thought reasoning, which DreamVu reports is more effective than template-based labeling, especially for spatial and causal tasks. Notably, 14 of the 20 capability probes are entirely new and not available in any prior public AI training corpus.
A data-scaling analysis reveals strong results can be achieved efficiently: 60% of the corpus (162,000 samples) reaches 87.7% average accuracy, only 1.2 percentage points below the full-dataset performance. Mixing egocentric and exocentric data enhances cross-view performance without degrading accuracy on egocentric tasks, demonstrating that the two perspectives are complementary.
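The scaling figures above can be checked with simple arithmetic. The sketch below recomputes the 60% sample count and derives the full-dataset accuracy implied by the stated 1.2-point gap. The full-dataset figure is an inference from the numbers in this release, not a value it quotes directly, and the variable names are illustrative.

```python
FULL_CORPUS_SAMPLES = 270_000   # full PRISM corpus size
SUBSET_FRACTION = 0.60          # share of the corpus used in the scaling run
SUBSET_ACCURACY_PCT = 87.7      # average accuracy reported at 60% of the data
GAP_POINTS = 1.2                # reported gap to full-dataset performance

# Number of samples in the 60% subset.
subset_samples = round(FULL_CORPUS_SAMPLES * SUBSET_FRACTION)

# Full-dataset accuracy implied by the reported gap (an inference).
implied_full_accuracy_pct = round(SUBSET_ACCURACY_PCT + GAP_POINTS, 1)

print(f"60% subset: {subset_samples:,} samples")
print(f"Implied full-dataset accuracy: {implied_full_accuracy_pct}%")
```

On these numbers, the final 40% of the corpus (108,000 samples) adds only 1.2 percentage points of average accuracy.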
“The core finding is that domain-specific fine-tuning on data covering spatial, physical, and action reasoning together produces gains that general-corpus scaling does not. We’re releasing the dataset and model weights so the research community can build on it.” — Rajat Aggarwal, Co-Founder and CEO, DreamVu
The 100,000-sample open subset and fine-tuned model weights (Cosmos-Reason2-2B-Retail-Grocery-EgoExo) are freely available on Hugging Face. The full 270,000-sample corpus is available under a commercial license.
DreamVu’s release of PRISM provides the research and developer community with a valuable new resource to advance embodied AI capabilities in real-world retail settings, where integrated spatial, physical, and action reasoning is essential for practical applications.
About DreamVu
DreamVu is a physical AI data infrastructure company. Its proprietary ALIA 360° omnidirectional camera system and multi-view capture infrastructure are used to build training datasets for embodied AI systems in retail, logistics, healthcare, and industrial environments. DreamVu is headquartered in Philadelphia, PA, with R&D in Hyderabad, India, and is a member of the NVIDIA Inception program.