Apple Research Investigates Robot Learning by Observing First-Person Videos of Humans

# Training Humanoid Robots: Learnings from Apple’s Pioneering Strategy

In a paper titled “*Humanoid Policy ∼ Human Policy*,” Apple researchers, working with collaborators at MIT, Carnegie Mellon, the University of Washington, and UC San Diego, introduce a technique for training humanoid robots on first-person video of people. The data is collected with the Apple Vision Pro, which puts a mixed-reality headset to work as a robotics data-collection tool.

## Robot See, Robot Do

The research uses first-person videos of people interacting with objects as training data for general-purpose robot models. The team collected human and robot demonstrations into a dataset called **PH2D**, which contains over 25,000 human demonstrations alongside 1,500 robot demonstrations. They then used this dataset to train a single AI policy that controls a real humanoid robot in real-world environments.

As the authors explain, training manipulation policies for humanoid robots on varied data sources substantially improves robustness and generalization across tasks and platforms. Conventional methods depend heavily on robot demonstrations, which are slow and expensive to collect and therefore hard to scale. The researchers instead investigate egocentric human demonstrations as a scalable source of training data.

## Cheaper, Faster Training

To gather training data, the team built an application for the Apple Vision Pro that records video from the device’s bottom-left camera while using Apple’s ARKit to track 3D head and hand motion. As a cheaper alternative, they also designed a 3D-printed mount that attaches a ZED Mini stereo camera to other headsets, such as the Meta Quest 3, providing similar 3D motion tracking at a lower cost.

This setup lets the researchers record a high-quality demonstration in seconds, a major improvement over traditional teleoperation, which is slower, more costly, and harder to scale. Because humans typically move faster than robots, the researchers slowed the human demonstrations by a factor of four during training so that the robot could keep pace without further adjustments.
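The article does not say how the slowdown is implemented. One plausible way to stretch a recorded trajectory to four times its duration is linear interpolation between consecutive samples, sketched below. The function name `slow_down` and the list-of-positions trajectory format are assumptions made for illustration, not details from the paper.

```python
def slow_down(trajectory, factor=4):
    """Stretch a trajectory to `factor` times its duration.

    `trajectory` is a list of equally spaced samples, each a list of
    floats (for example a 3D hand position). This is a hypothetical
    format chosen for the sketch.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if len(trajectory) < 2:
        return [list(point) for point in trajectory]

    slowed = []
    for start, end in zip(trajectory, trajectory[1:]):
        # Emit `factor` evenly spaced samples per original interval,
        # beginning at `start` and stopping just short of `end`.
        for step in range(factor):
            t = step / factor
            slowed.append([a + t * (b - a) for a, b in zip(start, end)])
    # Close the trajectory with the final original sample.
    slowed.append(list(trajectory[-1]))
    return slowed


if __name__ == "__main__":
    hand_path = [[0.0, 0.0], [4.0, 8.0]]
    for point in slow_down(hand_path):
        print(point)
```

Linear interpolation is only appropriate for positions. Orientations would need a rotation-aware scheme such as spherical interpolation, which this sketch leaves out.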

## The Human Action Transformer (HAT)

Central to the work is the **Human Action Transformer (HAT)**, a model designed to learn from both human and robot demonstrations within one framework. Instead of separating the data by source (humans vs. robots), HAT combines it to train a single policy that applies to both embodiments. This approach makes the system more flexible and more data-efficient.
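To make the idea of pooling data concrete, the sketch below stores human and robot demonstrations in one shared record type and draws training batches from the merged pool. It illustrates the data-mixing idea only and is not HAT itself: the `Demonstration` fields, the `embodiment` tag, and both helper functions are hypothetical names introduced for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Demonstration:
    """One demonstration in a shared format (hypothetical fields)."""
    embodiment: str   # "human" or "robot"
    observations: list  # e.g. per-frame egocentric image features
    actions: list       # e.g. per-frame head and hand targets


def build_training_pool(human_demos, robot_demos, seed=0):
    """Merge both sources into one shuffled pool instead of two datasets."""
    pool = list(human_demos) + list(robot_demos)
    random.Random(seed).shuffle(pool)
    return pool


def sample_batch(pool, batch_size, rng):
    """Draw a batch with replacement; sources mix freely within a batch."""
    return [rng.choice(pool) for _ in range(batch_size)]


if __name__ == "__main__":
    humans = [Demonstration("human", [i], [i]) for i in range(8)]
    robots = [Demonstration("robot", [i], [i]) for i in range(2)]
    pool = build_training_pool(humans, robots)
    batch = sample_batch(pool, batch_size=4, rng=random.Random(1))
    print([demo.embodiment for demo in batch])
```

Because every record has the same fields, a single model can consume the whole pool. With uniform sampling the much larger human set dominates each batch, so a real pipeline might reweight the two sources. That weighting is a design choice this sketch does not address.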

In the reported trials, this combined training enabled the robot to handle more difficult tasks, including ones it had not encountered before, and it outperformed traditional training approaches. The findings suggest that human demonstrations can meaningfully extend what humanoid robots are able to do.

## Conclusion

The research detailed in “*Humanoid Policy ∼ Human Policy*” opens up new opportunities in robotics. By drawing on human demonstrations recorded with tools like the Apple Vision Pro, the researchers have laid a foundation for cheaper and more effective ways to train humanoid robots.

The study shows how human actions can inform and improve robotic capabilities, making it a notable contribution to the field. As we move towards a future in which humanoid robots may become widespread, the implications of this research inspire both anticipation and reflection.

What are your perspectives on the evolution of humanoid robots? Do they excite you, raise concerns, or do you view them as superfluous? Share your thoughts in the comments!