Digital Event Horizon
NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets
Summary:
At GTC 2025, NVIDIA announced a suite of open-source releases for physical AI development: Cosmos Transfer, a new family of world foundation models with multicontrols; a highly curated Physical AI Dataset; and NVIDIA Isaac GR00T N1, the first open model for general humanoid reasoning. Cosmos Transfer generates photorealistic video sequences with controlled layout, object placement, and motion. The model can be guided by various input types, including structured visual or geometric data, allowing developers to achieve precise spatial alignment and scene composition, and it is adaptable through post-training for specific embodiments, tasks, and environments.
NVIDIA's annual GTC conference has once again proven to be a pivotal event for the field of artificial intelligence, as the company unveiled a trio of groundbreaking open-source releases aimed at accelerating physical AI development: Cosmos Transfer, a new suite of world foundation models (WFMs) with multicontrols; a highly curated Physical AI Dataset; and NVIDIA Isaac GR00T N1, the first open model for general humanoid reasoning. Together, these releases represent a significant leap forward in physical AI technology.
The release of Cosmos Transfer marks an exciting milestone in the development of physically intelligent systems. This new world foundation model, available in a 7-billion-parameter version, introduces a new level of control and accuracy in generating virtual world scenes. The model uses multicontrols to guide the generation of high-fidelity world scenes from structural inputs, ensuring precise spatial alignment and scene composition.
The creation of Cosmos Transfer was made possible by training individual ControlNets separately for each sensor modality used to capture the simulated world. At inference time, developers can use various input types, including structured visual or geometric data such as segmentation maps, depth maps, edge maps, human motion keypoints, LiDAR scans, trajectories, HD maps, and 3D bounding boxes to guide the output. The control signals from each control branch are multiplied by their corresponding adaptive spatiotemporal control maps and then summed before being added to the transformer blocks of the base model.
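The fusion of control branches described above can be sketched in a few lines. The array shapes, the two example modalities, and the normalized weight maps below are illustrative assumptions, not the model's actual implementation:

```python
import numpy as np

def fuse_control_branches(branch_signals, control_maps):
    """Multiply each control branch's signal by its adaptive
    spatiotemporal control map, then sum the weighted signals.
    The fused result would be added to the base model's
    transformer block activations (not shown here)."""
    fused = np.zeros_like(branch_signals[0])
    for signal, weight_map in zip(branch_signals, control_maps):
        fused += signal * weight_map
    return fused

# Toy example: two control branches (say, depth and segmentation)
# over a tiny 4-frame, 8x8 latent grid.
rng = np.random.default_rng(0)
shape = (4, 8, 8)  # (time, height, width)
depth_signal = rng.standard_normal(shape)
seg_signal = rng.standard_normal(shape)

# Adaptive maps weight each branch per location and per frame;
# here they are normalized so the weights sum to 1 everywhere.
w = rng.uniform(size=shape)
maps = [w, 1.0 - w]

fused = fuse_control_branches([depth_signal, seg_signal], maps)
print(fused.shape)  # (4, 8, 8)
```

Because the weighting is per-location and per-frame, one modality (e.g., depth) can dominate in one region of the scene while another (e.g., segmentation) dominates elsewhere, which is what makes the spatial alignment controllable.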
The generated output is photorealistic video sequences with controlled layout, object placement, and motion. Developers can control the output in multiple ways, such as preserving both structure and appearance, or allowing appearance variations while maintaining structure. Outputs from Cosmos Transfer can also be varied across different environments and weather conditions.
Cosmos Transfer coupled with the NVIDIA Omniverse platform is driving controllable synthetic data generation for robotics and autonomous vehicle development at scale. Sample datasets are available on GitHub, providing developers with high-quality data to enhance their AI models.
In addition to Cosmos Transfer, NVIDIA has released an open-source Physical AI Dataset consisting of 15 terabytes of data representing more than 320,000 trajectories for robotics training, plus up to 1,000 Universal Scene Description (OpenUSD) assets. This commercial-grade dataset is designed for post-training foundation models such as the Cosmos Predict world foundation models.
Another exciting announcement from NVIDIA is the release of NVIDIA Isaac GR00T N1, the world's first open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments. The NVIDIA Isaac GR00T-N1-2B model is available on Hugging Face.
Isaac GR00T N1 was trained on an expansive humanoid dataset consisting of real captured data, synthetic data generated using components of the NVIDIA Isaac GR00T Blueprint, and internet-scale video data. It is adaptable through post-training for specific embodiments, tasks, and environments. The model features a dual-system architecture inspired by human cognition, consisting of the Vision-Language Model (System 2) and the Diffusion Transformer (System 1).
The Vision-Language Model interprets the environment through vision and language instructions, enabling robots to reason about their surroundings and plan the right actions. The Diffusion Transformer then translates the plan produced by System 2 into precise, continuous robot movements.
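The dual-system split can be illustrated with a minimal stub. The class names, the planning interface, and the 7-DoF action dimension below are hypothetical stand-ins for illustration, not the GR00T N1 API:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    image: list       # stand-in for camera pixels
    instruction: str  # natural-language command

class System2VLM:
    """System 2: a vision-language model that reads the scene and the
    instruction and produces a high-level action plan (hypothetical)."""
    def plan(self, obs: Observation) -> list:
        # A real VLM would reason jointly over pixels and text;
        # this stub just turns the instruction into plan steps.
        return [f"step: {obs.instruction}"]

class System1DiffusionTransformer:
    """System 1: a diffusion transformer that turns each plan step into
    a chunk of continuous joint-space actions (hypothetical)."""
    def act(self, plan: list, horizon: int = 4) -> list:
        # A real model would denoise an action trajectory; this stub
        # emits zero actions of a plausible shape (7-DoF arm assumed).
        return [[0.0] * 7 for _ in range(horizon * len(plan))]

obs = Observation(image=[], instruction="pick up the red cube")
plan = System2VLM().plan(obs)                       # slow, deliberate reasoning
actions = System1DiffusionTransformer().act(plan)   # fast, continuous control
print(len(actions), len(actions[0]))  # 4 7
```

The design mirrors the division of labor in the article: System 2 runs slowly on multimodal input to decide *what* to do, while System 1 runs at control rate to decide *how* to move.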
Post-training is the path forward for advancing autonomous systems, adapting these general-purpose models into specialized models for downstream physical AI tasks. Sample datasets and PyTorch scripts for post-training on custom user datasets are available on GitHub.
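As a rough sketch of what post-training looks like, the toy loop below freezes a stand-in "backbone" (playing the role of a pretrained foundation model) and fits only a small task head by gradient descent. The data, shapes, and loss are invented for illustration and are unrelated to NVIDIA's actual scripts:

```python
import numpy as np

rng = np.random.default_rng(1)

def backbone(x):
    """Frozen 'foundation model' features: weights are fixed and
    never updated during post-training."""
    W = np.array([[0.5, -0.2], [0.1, 0.9], [0.3, 0.4]])
    return np.tanh(x @ W)

# Small trainable head adapted to the downstream task.
head = rng.standard_normal((2, 1)) * 0.1

# Toy demonstration data: inputs and target actions (realizable by
# construction, so the head can fit them exactly).
X = rng.standard_normal((64, 3))
y = backbone(X) @ np.array([[1.0], [-1.0]])

lr = 0.1
for _ in range(500):
    feats = backbone(X)                    # frozen forward pass
    pred = feats @ head
    grad = feats.T @ (pred - y) / len(X)   # MSE gradient w.r.t. head
    head -= lr * grad                      # update only the head

mse = float(np.mean((backbone(X) @ head - y) ** 2))
print(mse)
```

The key property post-training relies on is visible even in this toy: the expensive general-purpose representation is reused as-is, and only a small number of task-specific parameters are fitted on the user's own data.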
The release of these open-source resources marks an exciting milestone in the development of physically intelligent systems, offering developers powerful tools to advance robotics and enhance autonomous vehicle technology.
Related Information:
https://www.digitaleventhorizon.com/articles/NVIDIA-Unveils-Groundbreaking-Open-Source-Releases-for-Physical-AI-Development-deh.shtml
https://huggingface.co/blog/nvidia-physical-ai
Published: Tue Mar 18 17:33:11 2025 by llama3.2 3B Q4_K_M