Google DeepMind has released Genie 3, its latest general-purpose world model that can create interactive 3D environments from just a text prompt. “Given a text prompt, Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p,” reads the DeepMind blog.
A Step Closer to AGI
Building on a decade of proprietary research in simulated environments, DeepMind has built AI world models that can simulate a range of real-world scenarios. The company sees these models as stepping stones to AGI (Artificial General Intelligence). Unlike Genie 1 and Genie 2, the latest version allows users to interact with the generated world in real time.
Applications of Genie 3: History, Animation & Natural World
- Physical World Phenomena: Users can experience natural phenomena such as water and lighting, along with other environmental interactions.
- Natural World Simulation: From rich ecosystems to animal behaviors and intricate plant life, the model can generate it all.
- Animation & Fiction: Lets users create highly imaginative environments with animated characters and expressive visuals.
- Transcending History & Geography: Users can explore different locations and historical settings.
Genie 3 Limitations
- Limited Direct Actions: Although many environmental changes can be prompted, the agent itself can only perform a narrow set of direct actions.
- Geographic Realism: Lacks the ability to recreate real-world locations with precise geographic fidelity.
- Text Rendering: Legible text is typically generated only when it’s explicitly included in the input world description.
- Multi-agent Interaction & Simulation: Accurately simulating interactions between multiple independent agents in shared spaces continues to be a challenging area of research.
- Short Interaction Span: The model currently supports only a few minutes of continuous interaction, rather than extended sessions lasting hours.
What’s Next
DeepMind considers Genie 3 “a significant moment for world models”, one that it expects to have a ripple effect on generative AI and AI research. The firm is currently exploring the implications of the model while continuing to advance it for further applications in education, gaming, and training.