New startup Physical Intelligence isn’t interested in building robots. Instead, the team has something arguably better in mind: giving existing hardware a continuously learning, generalist AI “brain” so that machines can autonomously take on a growing range of tasks that demand precise movement and dexterity, housework included. That could make such robots genuinely practical.
Over the past year, we’ve seen dancing robot dogs, even robot dogs fitted with flamethrowers, along with increasingly sophisticated humanoids and machines built to fill specialized roles on assembly lines. But we’re still waiting for Rosie the robot from The Jetsons.
But we might get there soon. San Francisco-based Physical Intelligence (Pi) has unveiled a generalist AI model for robotics that lets existing machines perform a variety of tasks: taking laundry out of a dryer and folding it, carefully packing eggs into cartons, grinding coffee beans and bussing tables. It’s not hard to imagine a system like this letting a metal helper move around the house, vacuuming, loading and unloading the dishwasher, making beds, checking and cataloguing the contents of the fridge and pantry, and planning dinner. And why not cook it, too?
With this vision in mind, Pi has released what it calls a “general-purpose robot foundation model,” known as π0 (pi-zero).
Physical Intelligence (π)’s mission is to bring general-purpose AI to the physical world.
We are excited to take the first step towards this mission and present our first generalist model π₀ 🧠 🤖.
Paper, blog, uncut videos: https://t.co/XZ4Luk8Dci pic.twitter.com/XHCu1xZJdq
— Physical Intelligence (@physical_int) October 31, 2024
“We believe this is a first step toward our long-term goal of developing artificial physical intelligence, so that users can simply ask robots to perform any task they want, just as they can ask large language models (LLMs) and chatbot assistants,” the company explains. “Like LLMs, our model is trained on broad and diverse data and can follow various text instructions. Unlike LLMs, it spans images, text and actions, and acquires physical intelligence by training on embodied experience from robots, learning to directly output low-level motor commands via a novel architecture. It can control a variety of different robots, and can either be prompted to carry out the desired task or fine-tuned to specialize it for challenging application scenarios.”
In Pi’s research, pi-zero showed how hardware driven by the AI model can handle tasks requiring varying levels of dexterity and movement. The foundation model was put through a total of 20 tasks, each requiring different skills and manipulations.
“Our goal in selecting these tasks was not to solve any particular application, but to begin giving our model a general understanding of physical interactions, an initial foundation for physical intelligence,” the team said.
π₀ is a VLA generalist.
– Performs dexterous tasks (folding laundry, bussing tables, etc.).
– Transformer + flow matching combines the benefits of VLM pre-training with continuous 50 Hz action chunks.
– Pre-trained on large π datasets across many form factors pic.twitter.com/zX9hvVdQuH— Physical Intelligence (@physical_int) October 31, 2024
Now, I’m the last person at New Atlas to get excited about robotics. The main reason is that most of what we’ve seen so far has been specialized machines, and to be honest, I’ve seen plenty of humanoids moving boxes from point A to point B. In biology, specialists are very good at exploiting a single niche, be they bees, butterflies or koalas, and they do it extremely well, right up until external forces such as habitat loss or disease expose their limits.
Generalists such as raccoons and grizzly bears, on the other hand, may not be as good at exploiting a specific niche, but they are far more adaptable to a wide range of habitats and food sources. That ultimately makes them better suited to dynamic changes in their environment.
Similarly, general-purpose robots will be able to do more than expertly lay brick walls, and the ability to learn will let them develop an ever-growing set of skills to meet the challenges of the physical world.
Pi-zero combines internet-scale vision-language model (VLM) pre-training with flow matching to translate what the AI has learned into the robot’s movements. Its pre-training mix included 10,000 hours of “dexterous manipulation data” covering 68 tasks on seven different robot configurations, on top of the existing OXE (Open X-Embodiment), DROID and Bridge robot manipulation datasets.
We compare π₀ and π₀-small (the non-VLM version) with some previous models:
– Octo and OpenVLA for 0-shot VLA
– Single-task ACT and Diffusion Policy
π₀ performs better on seen tasks, on fine-tuning to new tasks, and on zero-shot language following. pic.twitter.com/TUDsFjitDr
— Physical Intelligence (@physical_int) October 31, 2024
“In order to control robots dexterously, pi-zero needs to output motor commands at a high frequency, up to 50 times per second,” the research team points out. “To provide this level of dexterity, we developed a novel method to augment pre-trained VLMs with continuous action outputs via flow matching, a variant of diffusion models. Starting from a VLM pre-trained on internet-scale data, we train a vision-language-action flow matching model, which can then be post-trained on high-quality robot data to solve a range of downstream tasks.”
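For readers curious how flow matching produces a burst of continuous actions, here is a minimal Python sketch of the general technique, not Pi’s code. It assumes the common linear-interpolation convention (Gaussian noise at t = 0, the action chunk at t = 1, with “action minus noise” as the velocity to learn) and swaps the trained network for a hypothetical “oracle” velocity function that already knows the target chunk. The chunk length, action size and number of integration steps are illustrative assumptions; only the 50 Hz control rate comes from the team.

```python
import random

CONTROL_HZ = 50    # control rate quoted by the Pi team
CHUNK_LEN = 50     # hypothetical: number of future timesteps per chunk
ACTION_DIM = 7     # hypothetical: e.g. six joint targets plus a gripper command
NUM_STEPS = 10     # hypothetical: Euler steps from noise to actions


def training_pair(action, noise, t):
    """Build one flow-matching training example at time t in [0, 1].

    Returns the noisy interpolant x_t and the velocity a network should predict.
    """
    x_t = [t * a + (1.0 - t) * e for a, e in zip(action, noise)]
    target_velocity = [a - e for a, e in zip(action, noise)]
    return x_t, target_velocity


def oracle_velocity(x, t, action):
    """Hypothetical stand-in for the learned network.

    On the linear path, (action - x_t) / (1 - t) equals action - noise,
    i.e. exactly the training target from training_pair.
    """
    return [(a - xi) / (1.0 - t) for a, xi in zip(action, x)]


def sample_chunk(velocity_fn, dim, num_steps, rng):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
# A made-up "ground truth" chunk, flattened to CHUNK_LEN * ACTION_DIM values.
target_chunk = [rng.uniform(-1.0, 1.0) for _ in range(CHUNK_LEN * ACTION_DIM)]
chunk = sample_chunk(
    lambda x, t: oracle_velocity(x, t, target_chunk),
    len(target_chunk),
    NUM_STEPS,
    rng,
)
max_err = max(abs(c - a) for c, a in zip(chunk, target_chunk))
print(f"one chunk covers {CHUNK_LEN / CONTROL_HZ:.1f} s of motion; max error {max_err:.1e}")
```

In a real model the velocity function would be a neural network conditioned on camera images, the text instruction and the robot’s state, trained so its output matches the target from `training_pair`. Generating a whole chunk of future actions per inference call is one way a large model can keep pace with a 50 Hz control loop.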
“To our knowledge, this represents the largest pre-training mixture ever used for a robot manipulation model,” the researchers said in their study.
Although the company is still in the early stages of research and development, Pi co-founder and CEO Karol Hausman, a scientist who previously worked on robotics at Google, believes foundation models like this one will help overcome the field’s existing hurdles to generalization: the time and cost it takes to train hardware on real-world data so it can learn new tasks. The Pi team also includes co-founder Sergey Levine, a robot learning pioneer at UC Berkeley, and former Google researcher Brian Ichter.
In 2023, satirist and architect Karl Sharro tweeted, “Humans doing hard labor for minimum wage while robots write poems and paint pictures is not the future I wanted,” and the post quickly went viral. That same year, members of the Writers Guild of America went on strike, bringing Hollywood to a standstill, as creators saw a bleak future for their work in the face of a new era of technology.
And while AI may or may not already be coming for many of our jobs (journalists need no reminding of that), Pi’s vision feels more in line with the one mid-20th-century futurists imagined: a world where machines make our lives easier. Call me naive, but if a robot turned up to do my housework, I’d take it.
You can see more videos of robots running pi-zero in the Pi blog post, but here’s one that shows off its impressively delicate work.
Sorting processed eggs
A research paper on pi-zero development and training can be found here.
Source: Physical Intelligence