Digital Event Horizon
A new technique has been developed to train general-purpose robots using diverse data from varied domains and modalities, promising significant improvements in robot performance. Drawing inspiration from large language models, researchers aim to create a universal "robot brain" that could be used for various tasks without additional training.
MIT researchers have developed a groundbreaking training technique for general-purpose robots called Heterogeneous Pre-trained Transformer (HPT). HPT enables robots to learn from diverse data across various domains and modalities, significantly enhancing their performance. The breakthrough involves pooling data from simulation and real-world scenarios, as well as different modalities like vision sensors and robotic arm position encoders. The technique employs a transformer architecture similar to large language models to process pooled data. The development of HPT overcomes the challenge of creating massive datasets for pretraining the transformer. The architecture efficiently converts raw proprioception signals into data that can be processed by the transformer. Testing results show significant improvements in robot performance across simulation and real-world tasks. HPT enables training across diverse datasets, scaling up dataset sizes and allowing for model adaptation to new robot embodiments.
Researchers at the Massachusetts Institute of Technology have developed a training technique for general-purpose robots that could change how robot policies are built. The approach, inspired by large language models, lets researchers train robots on diverse data from varied domains and modalities, significantly improving their performance.
According to a report from MIT News, the technique is called Heterogeneous Pre-trained Transformer (HPT). It pools diverse data from various sources, including simulation and real-world scenarios, and from different modalities such as vision sensors and robotic arm position encoders. The goal is to align these disparate inputs into a shared "language" that can be processed by a generative AI model.
The researchers employed a transformer architecture, similar to those used in large language models, to process the pooled data. Each input is represented with the same fixed number of tokens, and the transformer maps all inputs into one shared space. The model grows as it processes and learns from more data, and the larger the transformer becomes, the better it performs.
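The fixed-token idea can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the random linear projection and the average pooling are simple stand-ins for whatever learned components the real model uses, and the names `make_weights`, `project`, `to_fixed_tokens`, `SHARED_DIM` and `NUM_TOKENS`, along with all sizes, are hypothetical choices made for this example. It shows only that inputs of very different shapes can each end up as the same number of tokens of the same width.

```python
import random


def make_weights(in_dim, out_dim, rng):
    """Random projection matrix (a stand-in for learned weights)."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(vec, weights):
    """Linear map: one output value per row of weights."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def to_fixed_tokens(features, weights, num_tokens):
    """Turn a non-empty list of feature vectors into exactly num_tokens tokens.

    Each vector is projected to the shared width, then the sequence is
    average-pooled over contiguous chunks (assumed pooling scheme).
    """
    projected = [project(f, weights) for f in features]
    n = len(projected)
    dim = len(projected[0])
    tokens = []
    for t in range(num_tokens):
        start = t * n // num_tokens
        # Guarantee at least one element per chunk when n < num_tokens.
        end = max(start + 1, (t + 1) * n // num_tokens)
        chunk = projected[start:end]
        tokens.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return tokens


rng = random.Random(0)
SHARED_DIM, NUM_TOKENS = 8, 4  # hypothetical sizes

# Ten image-patch features of width 16 (made-up vision input).
vision = [[rng.random() for _ in range(16)] for _ in range(10)]
# A single 7-value joint-position reading (made-up proprioception input).
proprio = [[rng.random() for _ in range(7)]]

vision_tokens = to_fixed_tokens(vision, make_weights(16, SHARED_DIM, rng), NUM_TOKENS)
proprio_tokens = to_fixed_tokens(proprio, make_weights(7, SHARED_DIM, rng), NUM_TOKENS)

# Both modalities contribute the same number of tokens to one sequence.
sequence = vision_tokens + proprio_tokens
print(len(sequence), len(sequence[0]))  # prints: 8 8
```

Because every modality is reduced to the same token count, a single camera image and a short vector of joint readings occupy equal space in the sequence handed to the transformer.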
Developing HPT meant overcoming several obstacles. One of the biggest was assembling a dataset large enough to pretrain the transformer. To do so, the researchers pooled 52 datasets containing over 200,000 robot trajectories across four categories, including human demonstration videos and simulation.
Another critical aspect of HPT is its ability to efficiently convert raw proprioception signals from various sensors into data that can be processed by the transformer. The researchers emphasized the importance of proprioception in enabling dexterous motions, ensuring that their architecture placed equal emphasis on both vision and proprioception.
In testing, HPT improved robot performance by more than 20 percent on both simulation and real-world tasks, compared with training from scratch each time. Even when the task differed significantly from the pretraining data, HPT still delivered substantial gains.
David Held, associate professor at the Carnegie Mellon University Robotics Institute, noted that this novel approach enables training across diverse datasets, thereby scaling up dataset sizes. This also allows for model adaptation to new robot embodiments, a critical consideration as new designs are continually being produced.
The researchers plan to study how data diversity could further improve HPT's performance, and to extend HPT so that it can process unlabeled data, as GPT-4 and other large language models do.
Their ultimate goal is to create a universal "robot brain" that can be downloaded and used on a robot without any training, a step they hope will lead to a breakthrough in robotic policies.
This research was made possible through the support of the Amazon Greater Boston Tech Initiative and the Toyota Research Institute, underscoring the importance of collaborations between industry partners and academia in driving innovation.
Related Information:
https://news.mit.edu/2024/training-general-purpose-robots-faster-better-1028
https://techxplore.com/news/2024-10-faster-general-purpose-robots-technique.html
Published: Mon Oct 28 16:29:02 2024 by llama3.2 3B Q4_K_M