Original Article written by Devorah Fischler
Penn Engineering Today spoke with Michael Posa about robotics in the age of artificial intelligence, the ambulatory genius of toddlers, navigating the unfamiliar and the elegance of not learning everything.
Posa is an Assistant Professor in the Department of Mechanical Engineering and Applied Mechanics and the recipient of an April 2023 grant renewal from the Toyota Research Institute (TRI). His work with TRI untangles the complexities of legged locomotion — refining the still-limited ability of robots to walk and run — and streamlines manipulation, producing simulations that simplify the way robots grasp unknown contexts and objects.
Let’s start with the basics. Why is it so difficult to get a robot to walk or hold things?
Right now, robots are burdened by a mismatch between the complex computerized instructions we give them and the level of simplicity required to be effective. Humans have an intuition for touching the world that doesn’t mesh with the type of algorithms designed to get robots to do the same. If you were to look at the physics of a problem — say, the dexterous manipulation of an object — and you were trying to simulate it on your computer, you would have some complicated geometries of the object, some complicated geometries of the hand and the interaction between these two geometries. This is where the bulk of computation would be done, and it would be inexact, energy-intensive and time-consuming. But, if you think about how a human might pick up and manipulate an object, that level of complexity seems unnecessary. If I pick up a mug, I’m using very complicated movements, yes, but I’m not reasoning about every possible spot I can put my fingers.
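The combinatorial blow-up this answer alludes to can be made concrete. In contact-rich planning, each potential contact between hand and object can be in one of a few modes (commonly separated, sticking or sliding), and a planner that enumerates every combination faces exponential growth. The numbers below are purely illustrative, not taken from any particular planner:

```python
# Each potential contact between hand and object can be in one of several
# modes (e.g. separated, sticking, sliding).  A planner that reasons over
# every combination faces exponential growth in contact-mode assignments.
modes_per_contact = 3  # illustrative choice
for contacts in (5, 10, 20):
    combos = modes_per_contact ** contacts
    print(f"{contacts:2d} contact points -> {combos:,} mode combinations")
```

Even before considering sequences of contacts over time, the count at a single instant grows into the billions, which is why "reasoning about every possible spot I can put my fingers" is computationally untenable.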
If humans had to compute every level of complexity available to us, wouldn’t we also be too overloaded to function?
Pretty much! You have around 20 different axes of motion in your hand. But if I ask you to hold something, there are more like three independent movements that you’ll use. The human hand is complex, but in practice, it doesn’t often use its full complexity. Humans have found a way of simplifying the problem of planning, control and manipulation that we haven’t found the right equivalent for in computation. Same for walking. Toddlers have an intuitive understanding of getting around and balancing that outstrips what most robots can achieve.
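The observation that a roughly 20-joint hand effectively uses about three independent movements is known in the grasping literature as postural synergies. A minimal sketch of how one checks this, using synthetic numbers in place of real motion-capture data: generate hand poses driven by three hidden coordination patterns, then ask via principal component analysis how few dimensions explain them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "grasp" data: 500 hand poses for a 20-joint hand whose motion
# is driven almost entirely by 3 underlying synergies (coordination patterns).
n_poses, n_joints, n_synergies = 500, 20, 3
synergies = rng.normal(size=(n_synergies, n_joints))   # fixed joint-coupling patterns
activations = rng.normal(size=(n_poses, n_synergies))  # per-pose use of each pattern
poses = activations @ synergies + 0.05 * rng.normal(size=(n_poses, n_joints))

# PCA via SVD: cumulative fraction of variance explained by the top components.
centered = poses - poses.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
explained = np.cumsum(s**2) / np.sum(s**2)

print(f"variance explained by 3 of 20 components: {explained[2]:.1%}")
```

With data like this, three components capture nearly all the variance; studies of human grasping report a similar concentration, which is the kind of structure Posa’s work tries to exploit computationally.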
Why is it important to have robots that can touch the world the way humans do?
Some tasks are naturally suited to robots. It really comes down to work that is unsafe or undesirable for humans to do. In robotics, we talk about the three Ds: dirty, dangerous and dull. These are tasks that humans do with some risk that robots could alleviate. But it’s important to realize that robots can do more than just take over the dirty work; they can also provide and enhance a social function. For example, we could imagine robots that help people maintain their autonomy at home as they age. Some people might prefer the comfort of a human helper. Others may favor the assistance of a reliable machine so they can keep a sense of independence.
Will robots need anything else besides the ability to handle objects and walk in order to reliably fulfill these roles?
Yes. These robots will also need to navigate the unknown and unexpected in their environments. Right now, there are a lot of reliable robots on the manufacturing floor. They are fast and accurate, but only in their preprogrammed environments. Once these robots leave those environments, they lose speed and precision. With TRI, we are confronting this roadblock by creating algorithms that give robots simplified instructions, reducing the data necessary to learn and act. We need robots that can not only move quickly and deftly, but also negotiate novelty and uncertainty.
Could you give another example of how we might benefit from this future generation of data-efficient robotics?
Disaster recovery is a big one. When the Fukushima disaster occurred in 2011, it became clear to the world, and the robotics community specifically, how unready robots were for emergency response. It inspired, in part, the DARPA Robotics Challenge, which I was a part of during my Ph.D. That year became a stake in the ground for robotics, forcing us to be realistic about how far along we were and how much farther we needed to go. In 2011, robots could spend half an hour opening a door and crossing a handful of steps, and that was about it.
How far have we come since then?
Very far. We’ve seen more and more capable hardware platforms. The Agility Robotics Cassie, which we use in our lab, is something that didn’t exist in 2011. It came about a few years after that. We’ve also seen the rise of commercial robotics, which wasn’t a thing at all back then and is now flourishing. With advances in hardware, software and machine learning, robots are far more capable than they were in 2011. However, if Fukushima happened again today, there are still no robots that could go in there and make a real difference beyond survey or search. None could clear rubble, turn valves, fix wiring or press the buttons that need to be pressed. But we are a lot closer.
All your work with TRI seems driven by an ethos of simplification. Can you tell us what you’ve been able to achieve?
In some ways, we are re-simplifying robotics for the age of machine learning. There is already a simplified model of walking that has been active in robotics for decades: the inverted pendulum. This model boiled down the complexity of walking to a minimum and got robots impressively far. But inevitably, if you take all of the natural complexity of walking and reduce it back down to a pendulum, you’ve given away a bit too much. You’ve restricted your robot to do things that only pendulums can do, which isn’t that many things. My research asks: How do we get the benefits of the simplicity while also bringing back some of the performance we gave away? In the legged locomotion work, we’ve kept the simplicity of the pendulum model, but we’ve expanded the set of tasks — walking and turning faster, getting up steeper slopes, for example — and significantly lowered energy consumption. In the manipulation work, we are doing simulation, creating simplicity from the ground up. We have robotic hands interacting with an object, collecting data and then coming up with a plan that the algorithm forces to be as simple as possible. The robot interacts, fumbles, learns and corrects itself until it gets it right. It can do this in four to five minutes, which is an achievement.
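The pendulum model mentioned above has a compact closed form. The sketch below is the textbook linear inverted pendulum with illustrative numbers, not the extended model from Posa’s lab; it shows the model’s core insight, that foot placement (stepping to the so-called capture point) is what arrests the fall of the center of mass.

```python
import numpy as np

# Linear inverted pendulum (LIP): the classic simplified walking model.
# The center of mass (CoM) stays at constant height z0 and pivots about
# the stance foot at position p:   x_ddot = (g / z0) * (x - p)
g, z0 = 9.81, 0.9                # gravity (m/s^2), CoM height (m)
omega = np.sqrt(g / z0)          # pendulum rate constant (1/s)

x0, v0 = 0.0, 0.5                # CoM over the foot, moving at 0.5 m/s

def com_speed(p, t):
    """Exact LIP solution: CoM velocity at time t with the foot held at p."""
    return (x0 - p) * omega * np.sinh(omega * t) + v0 * np.cosh(omega * t)

# Foot left in place: the pendulum topples and velocity blows up.
# Foot stepped to the capture point x0 + v0/omega: velocity dies out.
capture = x0 + v0 / omega
print(f"speed after 2 s, foot in place:         {com_speed(0.0, 2.0):8.3f} m/s")
print(f"speed after 2 s, foot at capture point: {com_speed(capture, 2.0):8.3f} m/s")
```

Because the model assumes a point mass at constant height, it also illustrates the restriction described in the answer above: any behavior the pendulum cannot represent, like vaulting over an obstacle, is out of reach for a controller built purely on it.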
Four to five minutes as opposed to what?
If you don’t have any structure and you use reinforcement learning, it can take hours or days. Is that a fair comparison? Sort of. We enforce some minimal structure. But people do write papers about robots learning to manipulate and navigate unfamiliar environments where it takes hours or days. It’s all trial and error, but it depends on how much trial and error you are willing to accept. These other papers aren’t interested in the physical system; they treat it as a big black box. But what we’ve shown is that learning everything is very data inefficient.
So, the incredible progress we’re seeing in artificial intelligence doesn’t translate as neatly into robotics as some people seem to think?
Exactly. At this point, people have used ChatGPT and have seen robots learning. And they have become enamored with the idea that machine learning is going to solve all problems. The key in our lab is to contribute our domain expertise — our understanding of physics and dynamics — and mesh that with algorithms because there are overlaps and efficiencies to exploit. I think there’s a lot of value in deep learning and automation. Robots are going to have to learn things from their environment. It’s not all going to be models and physics. But we are also insisting on the value of techniques people have been thinking about for hundreds of years — physics, control, optimization — and showing that they are not going to go the way of the dinosaur with artificial intelligence taking over.