At a real restaurant you get a stack of napkins to fold one by one. Early on, the robot would pull many napkins out of the stack at once, making a big mess on the table when it was only supposed to fold one. Later, once the model got really good, another failure was pulling the napkin too fast, so it slipped off the table.
Jason recounts that a real restaurant will give you a stack of napkins that you have to fold one by one. Initially, the robot would make the mistake of pulling many napkins out of the stack at once, "and then now you have a whole big mess on the table where tons of napkins are at the center of the table" when you're only supposed to fold one. They trained the robot to handle those scenarios: whenever it grabbed multiple napkins, it would put the extra ones back onto the stack, though that created a messy stack on the side that it then had to deal with. His broader point is that real-world physical AI, or embodied AI, is actually very complex: if you handle one scenario well, there might be other scenarios you didn't expect. As the model got really good, another failure case emerged: it pulled the napkin too fast, so the napkin just slipped off the table. Besides those scenarios, the robot was "pretty much like 100% in folding a napkin that's present in front of it."