How’s this for a dystopian future: You finally receive your personal robot assistant, delivered to your door by Amazon drone. You unpack the shiny new machine, dust off the Styrofoam peanuts, and charge up the batteries. Then you switch it on and lead it to the kitchen so it can cook you dinner. The robot points its camera at you, waiting. Suddenly you realize in horror that your assistant doesn’t know how to cook, either—you’re supposed to teach it.
To prevent this nightmare dinnertime scenario, computer scientists are working on a robot that can teach itself to cook. It learns by watching YouTube videos.
This is much harder for a robot than it is for you, no matter how inept a cook you are. Imagine a mind that’s stumped by CAPTCHAs (“Letters with a squiggle through them? I’m out!”) trying to follow a video host who’s chatting and chopping at the same time. To tackle the task, University of Maryland graduate student Yezhou Yang and his coauthors broke it down into a few simpler pieces.
First, their robot would look at the person’s hands. For each hand, it would decide what type of grip the person was using. Was it a powerful grasp, as when holding a knife or a jar lid? Or was it a more delicate, precise grasp, maybe to lift a slice of bread from the counter? How wide was the object? The scientists taught the robot to recognize six grasps in all.
Next, the robot would try to identify the objects in the video. The researchers taught it 48 objects, including tools (such as spatula, bowl, and brush) and foods (meat, lettuce, yogurt, and so on).
Once the robot had worked out what the video host was holding in each hand came the crucial step: figuring out what to do with those objects.
“Due to the huge variation in human actions,” Yang says, it’s not yet possible for the robot to deduce what someone’s doing just by watching. So the researchers taught their robot to guess instead. Given the objects in the host’s hands, the robot picked the most likely verb from 10 options: cut, pour, transfer, spread, grip, stir, sprinkle, chop, peel, or mix.
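In spirit, that guessing step works like a frequency lookup: for each pair of held objects, pick the verb that co-occurred with them most often in the training videos. Here’s a minimal sketch of that idea (the object names, counts, and the `most_likely_verb` helper are hypothetical illustrations, not the authors’ actual model):

```python
from collections import Counter, defaultdict

VERBS = ["cut", "pour", "transfer", "spread", "grip",
         "stir", "sprinkle", "chop", "peel", "mix"]

# Hypothetical training observations: (objects in each hand, action verb).
training_data = [
    (("knife", "tomato"), "cut"),
    (("knife", "tomato"), "chop"),
    (("knife", "tomato"), "cut"),
    (("spoon", "bowl"), "stir"),
    (("spoon", "bowl"), "mix"),
    (("spoon", "bowl"), "stir"),
]

# Count how often each verb appears with each pair of objects.
verb_counts = defaultdict(Counter)
for objects, verb in training_data:
    verb_counts[objects][verb] += 1

def most_likely_verb(objects):
    """Guess the verb seen most often with this object pair in training."""
    counts = verb_counts.get(objects)
    if counts is None:
        return None  # object pair never seen during training
    return counts.most_common(1)[0][0]

print(most_likely_verb(("knife", "tomato")))  # cut
print(most_likely_verb(("spoon", "bowl")))    # stir
```

An unfamiliar pair of objects returns nothing here, which mirrors the robot’s trouble with items it wasn’t trained on: it has to fall back on whatever it does recognize.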
The authors chose 88 cooking videos from YouTube and used most of them to train their robot. The last dozen video clips—each showing just one cooking action—were the robot’s final exam.
The aspiring robot chef performed pretty well. After watching the test videos, it chose the right kind of grasp about 90% of the time. It correctly identified the objects about 80% of the time, and did equally well at guessing the action. Some of its mistakes happened when the videos included objects it hadn’t been trained on. When it saw a person using a knife to slice tofu, for example, the robot guessed that it was supposed to slice up a bowl.
If you do get that robot assistant someday, make sure it stays focused on the right videos. Otherwise you might end up with some funky results.