This is an auto-generated transcript, lightly edited for readability. Timestamps reference the audio version. If you spot an error,
let us know.
imagine this you walk into a laundromat
and a robot has been folding laundry non stop for 24 hours
that's what Jason and his team are working towards
fully autonomous single task commercial grade robots
and they just raised $120 million in order to get there
our goal is to power the future of the physical economy
getting these robots to be very robust and uh
can sustain a long duration of like
actually doing a task is a technical barrier
that hasn't been really solved before our work
for what are the right problems that you should have your robots solve
the bottleneck for useful robots is AI and software
these humanoids right now
they're not actually very useful
the cost and the hardware readiness is a big factor
the best way to succeed is to build research and
product at the same time
quick thing before we get started
we have a lofty goal this year of hitting 1,000 subscribers
in order to help more people build really great companies
so if you enjoy the content
learn something new the best way to support us is by subscribing
okay let's get into it
today's guest is Jason Ma
one of the brightest minds in robotics
he's the lead author of multiple award winning papers
recognized internationally for his research
he had offers to return to Google DeepMind
Nvidia and Meta
but turned them all down to start Dyna Robotics
and now instead of chasing sci-fi humanoids
robots that look and move like humans
he's building robots that actually work
folding towels stacking
packing in the real world
so Jason thank you for joining me at Founders in Motion today
uh thank you Thea
thanks for inviting me
really happy to be here and nice catching up with you after so long
yeah this is an interesting change of scenery for us
Jason in very plain English
what is Dyna Robotics building
at Dyna we're building general purpose robots
that power the future of the physical economy
so the way I think about it is that we're developing AI
powered robots that can do any task in any business or home scenario
and to start out we have deployed our robots
uh using AI in many different scenarios such as restaurants
gyms fitness centers
etcetera and our mission is to make these robots and our AI
model as general as possible
so it can basically do any task that you want it to do
so when it comes to robotics
there are typically two approaches that people usually go for
so the general purpose
everyday kind of companion
supportive robot or the very fancy humanoid
that moves and feels like a human
so why did you decide to start with everyday tasks like
folding towels
instead of going after kind of the overarching humanoid dream
yeah so our eventual goal is to develop a robot that can do any task
right so to that end
perhaps at some point we'll venture into humanoid
robots but it is our belief from my past experience and also research
that humanoid
robots as hardware are not currently mature and are way too expensive
but I think at the present moment
the bottleneck for useful robots is AI and software
so we decided as a company to first focus on off the shelf hardware
so basically these are robots that you can buy off the shelf
couple thousand dollars
and then we develop the AI on top of it so it can be really useful
so uh
you might have seen some of our demos
where robots can already fold napkins
fold cloth uh
do packaging at a very very high success rate and robustness
and uh
I think that's a stepping stone towards developing like the more human
like you know
form factor if you will because
you know if you look at these humanoids right now
they are not actually very useful
and you could only see them behind a screen
instead of actually seeing the robots in action in front of you
and the cost and the hardware readiness is a big factor
yeah yeah
that's super fair
and so folding laundry seems like a very simple task for humans
um why is it actually so hard for a robot to do
first of all I think folding laundry is also uh
fairly challenging for humans
for one it's very mundane and tedious right
like I don't you know
I don't like doing my laundry and
I think for cloth garments
they come in all different shapes types
you know shirts
jeans long sleeves
jackets and what's really hard about robots
why is this hard for robotics is for a couple of reasons
so one uh
just taking a step back right
traditionally you have robots as automation tools
you see robots in factories
and what happens is you preprogram the robots with an exact sequence of motions
right so for instance
you want the robots to package into like a box of
you know things you want to deliver
it's a preprogrammed sequence of motions
but that's actually
very difficult to do for something like folding laundry because
you know your clothes are like deformable
right the shapes are always different
they're not in like perfect
you know rigid shapes and states
so it's really hard to preprogram a sequence of motion to fold cloth
for you
and what that means is you really need to develop
you know using generative AI tools
just like how a language model like
ChatGPT can really do anything based on your language command
you want to develop AI
models that can control robots to do very fluid and dynamic motions
just like how human arms are very fluid and dynamic
and do many different motions to fold clothes
so that is why this task traditionally has been very difficult for the
uh preceding paradigm of robotics where you pre program the motions
and now it's actually more amenable for the new wave of robotics
which is learning robot actions through like data right
you learn from data yeah
yeah and if we take a moment to dive a little bit deeper into
um into the technology behind all of this
I guess at a high level
how do you think about training these generative models to help
robots recognize different
outcomes in the real world
yeah so the way we train it is very much like how
you know humans interact with the physical world right
so you know
like for us you know
we have our eyes that perceive the world through vision
through perception and then we use our brain to turn these sensory
you know image inputs to our eyes into like
you know actions that our arms and our legs perform right
so very much you know
for training these robots
it's like
we have cameras on the robots that perceive the physical world
and then we train a neural network
um to output actions like actions to command the robot's
you know joints and uh
you know different motors on the robots
so the way to do this is you know
you collect input output pairs of images in and uh
robot joints out right
so you can basically uh
control the robot to fold clothes many
many times to collect data
and then that data gets fed into a model
which then can control the robot autonomously
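
The recipe described here, collecting image-in and joint-command-out pairs from human-controlled runs and then fitting a model to them, is commonly called behavior cloning. Below is a minimal sketch of that loop under loud assumptions, not Dyna's actual pipeline: random feature vectors stand in for camera images, a synthetic linear rule stands in for the human operator, and a least-squares fit stands in for the neural network. All function names and dimensions are hypothetical.

```python
import numpy as np


def collect_demonstrations(num_steps, obs_dim, act_dim, seed=0):
    """Hypothetical stand-in for teleoperated data collection.

    Returns (observations, actions): one row per time step. In a real
    system the observations would be camera images and the actions would
    be recorded joint commands from a human controlling the robot.
    """
    rng = np.random.default_rng(seed)
    # Assumption: the "operator" follows an unknown linear rule plus noise.
    operator_rule = rng.normal(size=(obs_dim, act_dim))
    observations = rng.normal(size=(num_steps, obs_dim))
    noise = 0.01 * rng.normal(size=(num_steps, act_dim))
    actions = observations @ operator_rule + noise
    return observations, actions


def fit_policy(observations, actions):
    """Fit a linear policy by least squares (a neural network in practice)."""
    weights, *_ = np.linalg.lstsq(observations, actions, rcond=None)
    return weights


def act(weights, observation):
    """Map one observation to one joint command using the fitted policy."""
    return observation @ weights


obs, acts = collect_demonstrations(num_steps=500, obs_dim=8, act_dim=6)
policy = fit_policy(obs, acts)
train_error = float(np.mean((obs @ policy - acts) ** 2))
print(f"mean squared imitation error on the demonstrations: {train_error:.6f}")
```

Once fitted, `act(policy, new_observation)` plays the role of the autonomous controller: the model proposes the joint command that the human demonstrations suggest for that observation.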
so you're essentially kind of building your own general purpose data
models and datasets in order to train these um
folding laundry robots yeah
basically yeah
but you see the paradigm here is fairly general right
yeah
it's not really specific to laundry folding
so if you have data that you collected for
let's say packaging
for cooking a meal for cleaning your bedroom
then the robot using the same algorithm
can train models that can do these tasks
and what's really interesting is that you can actually just combine
all the different data sets for different tasks into one model
and that model becomes very powerful
right just much like language models today
it's not just one model that does one particular language task
but one model that can chat with you
write code for you and do many other things
so that's like the vision that we're also trying to do for robotics
but there's of course
a lot of nuance you know
which we can get into later as well
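
The idea of combining data sets from different tasks into one model can be sketched as follows. This is an illustrative toy, not Dyna's method: the task names, dimensions, and the choice of a one-hot task label appended to each observation are all assumptions, and a least-squares fit again stands in for a neural network.

```python
import numpy as np


def make_task_dataset(task_id, num_tasks, shared_weights, num_steps, rng):
    """Hypothetical per-task demonstrations.

    Every task shares the same underlying skill (shared_weights) plus a
    task-specific offset, mimicking tasks that have a lot in common.
    """
    obs_dim, act_dim = shared_weights.shape
    observations = rng.normal(size=(num_steps, obs_dim))
    task_offset = rng.normal(size=act_dim)
    actions = observations @ shared_weights + task_offset
    # Tag every sample with a one-hot task label so one model can tell tasks apart.
    one_hot = np.zeros((num_steps, num_tasks))
    one_hot[:, task_id] = 1.0
    inputs = np.concatenate([observations, one_hot], axis=1)
    return inputs, actions


rng = np.random.default_rng(0)
task_names = ["fold_napkins", "fold_laundry", "packaging"]  # illustrative only
shared = rng.normal(size=(8, 6))
datasets = [
    make_task_dataset(i, len(task_names), shared, 200, rng)
    for i in range(len(task_names))
]

# Pool all tasks into a single training set and fit a single model on it.
pooled_inputs = np.concatenate([d[0] for d in datasets], axis=0)
pooled_actions = np.concatenate([d[1] for d in datasets], axis=0)
weights, *_ = np.linalg.lstsq(pooled_inputs, pooled_actions, rcond=None)
pooled_error = float(np.mean((pooled_inputs @ weights - pooled_actions) ** 2))
print(f"pooled model error across {len(task_names)} tasks: {pooled_error:.2e}")
```

The design choice worth noticing is that nothing in the fitting step is task-specific: adding a new task only means adding its rows to the pooled data.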
yeah yeah
super cool
I think I read in your research somewhere that there's high
adaptability from one use case to another
so the next use case can be trained
at a lighter weight and so on going forward
so you ran this super cool 24 hour napkin test
nearly 800 folded with a 99% success rate
for someone who's not technical
why was that such a breakthrough
the reason why that result was very difficult and very impressive
in the uh
research community robotics community
is that you know
we have seen you probably have seen a lot of demos recently
of robots doing cool stuff
but typically what happens is that those demos are very brittle
it took many many shots to get even one video that works very well
so basically robots are now at a place you can shoot fancy videos
but actually getting these robots to be very robust and uh
can sustain a long duration of like
actually doing a task is a technical barrier
that hasn't been really solved
before our work
especially for these highly dextrous
you know complex manipulation tasks
such as folding laundry folding napkins
right
and so that is why this result was very different in that
it's a result where across a span of a week
we shot many
24 hour videos of the robot just continuously folding napkins
non stop for 24 hours
hundreds of napkins folded without much failure at all
and that was very different than prior works where yes
you can get a robot to do a laundry folding demo
but the success rate is often like about 70% or 80% success rate
so that means if you try to fold 10 t shirts
you might only succeed eight times
so that's good enough for a demo
but it's not good enough for actual real world deployment
right cause
you know if you imagine your robot only succeeds 8 out of 10 times
that would be a very frustratingly bad robot
right so at Dyna
now we are very much focused on not just putting on fancy demos
but actually developing the AI technology to power
like what actually can work in the physical world
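
The gap between a demo-grade and a deployment-grade success rate is easy to underestimate, so here is a small worked calculation using the figures mentioned in the conversation (roughly 80% for prior demos, 99% over nearly 800 napkins). It is a sketch that assumes each attempt succeeds or fails independently, which real robots will not exactly satisfy; the function names are made up for illustration.

```python
def expected_failures(success_rate, attempts):
    """Average number of failed attempts, assuming independent attempts."""
    return (1.0 - success_rate) * attempts


def prob_no_failures(success_rate, attempts):
    """Chance of finishing every attempt without a single failure."""
    return success_rate ** attempts


for rate in (0.80, 0.99):
    print(
        f"success rate {rate:.0%}: "
        f"about {expected_failures(rate, 800):.0f} failures in 800 napkins, "
        f"chance of 10 clean folds in a row {prob_no_failures(rate, 10):.1%}"
    )
```

Under that independence assumption, an 80% robot averages 2 failures per 10 items and around 160 over an 800-napkin day, while a 99% robot averages around 8, which is the difference between needing constant supervision and needing an occasional check.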
yeah yeah
and for anyone that is a little bit confused
it's okay
cause I printed all these terms in case Jason ever mentioned them
but dexterous manipulation is basically
just teaching robots to use their hands
as flexibly as we do
so folding gripping
managing weight twisting without breaking things
mm hmm and okay
while you were training
the robot to get to this very high level of success rate
what's like
one funny or surprising failure that the robot had
before it finally nailed the test
for the napkin example in particular
what was very very difficult is like actually
so what happens at a real restaurant is like
they'll ship you a stack of
they'll give you a stack of napkins
for which you have to fold the napkins one by one
so the robot initially would make the mistake of like pulling out many
many napkins from the stack
right and then now you have a whole big mess on the table
where tons of napkins are at the center of the table
and then where you're only supposed to fold one
so that was a very tricky scenario that the robot got into early on
but uh
you know
we were able to basically train the robot to handle those scenarios
so whenever it grabs multiple napkins
then it would put the extra ones back onto the stack
but then that creates a pretty messy stack on the side
so the robot then had to figure out how to deal with that situation
but eventually our robot was able to
get around all these tricky scenarios and become very
very good but just from this example
you probably realize you know
real world physical AI like embodied AI
you know
basically teaching robots to do things is actually very complex
if you handle one scenario well
there might be other scenarios that you didn't expect
that the robot has to handle
that comes up so it's kind of like
you know whack-a-mole
you know like
just cause you handle one scenario
doesn't really mean other scenarios are handled very well
which is why physical AI is often very
very challenging and as our model got really
really good another failure case is it pulls the napkin too fast
so the napkin just slips off the table
but besides those scenarios
yeah the robot was pretty much like 100% in folding a napkin
that's present in front of it
there's so many different edge cases in the human world to deal with
and yeah embodied AI here is just AI inside a physical body
so like a robot arm that can see
decide and act in the world
I wouldn't say I'm technical
but like I studied a little bit of technical stuff
so like yeah
this is really fun for like the nerd in me
um so
I also wanted to talk about that approach a little bit
so Dyna 1 is both the arm
so the body
the actual physical arm and the AI
the brain of it
so why is it so important to kind of build them in conjunction almost
rather than just like building
like a general purpose AI brain
that can be applied for any different type of physical embodiment
the way we think about this is that physical AI is extremely hard
especially if you're interested in pushing real world performance
right so if you think about like pushing real world performance
then it's a matter of both the software and hardware
like if your hardware like always breaks
your robots are just not good enough
then you can't actually run a model
an AI model on the robot for 24 hours right
so previously when I was doing research in the lab
we often ran into the issue that the robots would break
after a couple hours or like you have to maintain it
or the robot would start overheating
hahaha after like five or six hours
so it's just even physically not possible to run for 24 hours
so from that simple thought exercise
it's probably uh
more clear that in order to get to any real world performance
you have to do a software hardware
uh co design
co iteration but that said though
you know in Dyna 1
the arms themselves were something that we bought off the shelf
but we had to do a lot of things on top of them to make them more durable
or
you know just have higher endurance
so we could even try running a model for 24 hours
and during the early days of the research
it's certainly the case that the model would do something like
more violent like maybe it would hit the table very hard
so that makes it even more important that the hardware is good enough
but what I find interesting is that traditionally
robotics would have a lot of safety features or safety layer
like you program the robot to be safer
but in the AI age what we found is that the robots are
the models are more intelligent
it actually just does more
you know dexterous
smooth behavior so
it's
much less likely to even do the unsafe behavior in the first place
so uh
yeah that made hardware
reliability also just a much simpler problem than before
one point that you mentioned
I thought was super interesting is that like
even though you bought the hardware
um off the shelf
it still went through a lot of iteration to ensure
that it could even operate for 24 hours
which kind of doubles down on the point of like
maybe hardware is the bottleneck for humanoids
and that kind of more advanced application of robotics
why did the team choose kind of an application like folding laundry
folding napkins as the first test case
use case to play with
one is that like if you want to use a more AI
like data driven approach to train robots
then it's very important that you have a lot of data right yeah
and uh
folding clothes
folding napkin is a scenario where you couldn't really like
break the object that you are like teaching a robot with right
like you know
clothes and napkins are soft
so like you couldn't really like mess it up
but yeah there were other applications we looked into
like for instance
like loading dishes right
but there
the safety risk is a lot higher if the robot messes up a single time
like it drops a dish then the dish breaks
to advance model capability
we start out with some task that has the feature where like
you can just have the robots practice try many
many times
and like folding laundry was like the perfect scenario
because once you fold it very nicely
you can just like disturb it
so the cloth gets like crumpled again
so you can have the robot practice again
if you will right
so that was from a technology perspective
very appealing but from the business side
it's also the case that there's a huge demand actually
like we don't even think about folding napkins in restaurants
but if you go to like I don't know
like the Cheesecake Factory or Applebee's of the world
there are just so many like
napkins that need to be folded in the back
office all the time so it's almost like a full time employee's job
like their whole job is to do that
so we thought there's a huge need to like
do this kind of task to get started
yeah yeah
I mean I think folding napkins is a universal experience
across all restaurants yeah
um and also even folding laundry is such a pain too
I am also not a big fan of it
you've moved from controlled tests to doing some kind of
pilots in the real world
so when you move from like a lab setting to say a local laundromat
what new challenges pop up
what we didn't realize is that in the office
first of all like we have air conditioning
so the room was like kind of cool
but like in a lot of these real world scenarios
you know you do not have control of the temperature
so like overheating becomes more severe
you also don't have good control of the network
laundromats don't necessarily have the best Wi-Fi right
so if you think about like running models over cloud
then Wi-Fi becomes a bottleneck
or if you're doing some data collection on site
then uploading data also becomes a lot slower
and there is the operational challenge
like how can we trust putting a robot in a real customer site
and not worry about something going catastrophically wrong
like for instance like the robot's folding napkins in the back office
but you don't want it to like catch on fire by accident
that's the hard part of robotics
it's not just like getting AI to work
you also have to like make sure the deployment flywheel goes smoothly
for a business owner like how does the math work
how are you kind of thinking of pricing these robots
and when do they actually
pay for themselves
uh a lot of these businesses are quite price sensitive
or like they're operating
you know it's a spectrum
but let's say like you know
restaurants are perhaps low margin businesses
so we realized that in order for the economics to make sense at all
like your robots have to be somewhat cheap right
so this is also why like when you were mentioning humanoids
we don't think they're ready because any humanoids you can buy
or first of all you can't buy that many
but the ones you could buy are also on the order of like
tens of thousands 20,000 if not more right
but our robots are like a couple grand each
we do like a robotics as a service
business model so we don't like actually sell the hardware
we just rent the hardware out to different customers
several grand a month to rent a robot
so it's actually like on par
if not cheaper than like typical labor cost in the United States
when you think about the next application for Dyna um
what is the team kind of actively testing right now
our goal is to power the future of the physical economy
so basically any task that we think is extremely mundane dull
dirty dangerous for humans to do
we're looking into it looking into many different markets
so you know of course hospitality right like hotel
restaurants where we're discovering use cases
then you know the laundromats of the world
the uh logistics warehouses of the world
the most important thing is just to develop a general recipe
so we could actually just
deploy our models to any scenario that we want
moving into a bit more about your founder journey
so you could have gone back to DeepMind or Nvidia or Meta
and if you only knew the eye popping number that they gave Jason
but I don't think I can say um
but instead you chose the riskier founder path
what made you take that leap
the best way to make an impact in robotics is at a startup
robotics is an extremely complex problem
it's not just a software problem
so you wanted to have the freedom
the velocity to do all the things required to get robots to work
and I firmly believe that getting robots to work requires
you to actually deploy the robots in realistic
real scenarios right
and I think that's just not possible to do at a big corporate lab
where you know
robotics is like a research problem to them
but not necessarily something they want to solve right away right
like all the companies you mentioned
their core businesses are not robotics
and their core AI is not in robotics at the moment right
you know they're developing big language models
and whatnot to you know
power their platforms right
competing with you know
let's say the OpenAIs and Anthropics of the world
so robotics is more like a research problem for them
but I felt that
it's possible to actually get robots to work in the real world
with the right team the right mindset
and you know
there's enough funding that's willing to fund this kind of startup now
so I thought it would make a lot of sense to start Dyna
so you've always been kind of on the more research side
so moving into start up and more of the business realm of things
what's one lesson that you've learned that really surprised you
for robotics in particular
it's very important to have good taste
for what are the right problems
that you should have your robots solve
there are so many robotics companies in the past that perhaps picked
a problem that's too hard or like
too costly so it's actually very
very hard to penetrate the market and to succeed as a business
and the hardware is also just innately expensive
so there's been a history of
you know like
venture capitalists not really liking to fund hardware companies
for that reason
in research
the most important thing in my opinion is also research taste
now how do you pick a good research problem to work on
what happened in the past is
a lot of robotics companies went really deep in a vertical
so all their solutions
their software hardware stack became very specialized in that domain
so you cannot just go from one domain to the other so that's
the kind of thing we don't want to do at Dyna
what is like
your two sentence
pitch for why Dyna's different from other robotics companies
at Dyna we do both the research and also the product
and I think
the best way to succeed is to build research and
product at the same time
right
so you can think of let's say like
you know ChatGPT
right
it's a product that you can use right
but there's also a lot of research that goes behind it to make it work well
and I think the feedback loop from product to research
and from research to product is what makes uh
you know the existing AI products on the market like so good and uh
so sticky so if you look into robotics
there are companies just purely trying to take some
existing technologies and turn them into a product
but I think right now existing technology just as it is
is not ready to get to general purpose like manipulation
right like using arms
hands to do things but at the same time
if you're only focused on research
then I think sometimes you are focusing on problems
or like tasks that
may not really be representative of the real world tasks
right
because there's always a gap between research and actual deployment
so as a researcher you know
certainly in my research papers and all that
all the tasks I was solving are like
toy versions of the real world tasks
but at Dyna we go straight to like the real version of the task
so we know that if our research is good enough to solve those tasks
we can immediately
turn them into product
that feedback loop makes our technology and business much more solid
research is always theoretical
like the name of it is about being very theoretical
so yeah um
iterating it within deployment
and within testing is very important to make it actionable
throughout this journey
what has been the most valuable lesson that you've learned
something that you really wish someone had told you earlier
building a startup is actually quite hard
yeah yeah
really I think
I think it's been a year
I think it's definitely harder than like when I did my uh
you know research for PhD
cause in PhD I was really just minding my own business
my own research so like
there's not that much else I have to worry about
besides the existing research
for a company like Dyna we're
really trying to develop like Full Stack Robotics
right going from hardware software
AI everything
then there's always many things moving at the same time
I have to not only just excel at what I already do
which is AI research but also coordinate
collaborate with many different teams and make sure we're always
able to prioritize the most important things in the company
and allocate resources accordingly
and I think that's definitely like a shift in gears
compared to previously when I was just doing my research
shifting gears from like an independent contributor to
like a leader is always yeah
very hard um
okay so Jason
the way that we like to end these things is
we like to play a quick game
so sure since you're on the frontier of robotics
I want your predictions so in 10 years
who will be doing these jobs
robots or humans
flipping fast food burgers
robots delivering packages to your door
robots or humans
I think you get to choose or you should be able to choose
teaching kids in a classroom
uh humans walking your dog
uh humans or robots
I have seen videos of robot dogs walking like real dogs
performing surgery in a hospital
humans surgery is where like safety is so important
and I think the precision required
you know there's a lot of research papers on surgical robots
but I feel that this is one area I would be like very careful
so yeah I would still trust the human surgeon over robots or maybe
you know maybe Dyna will solve it one day
we don't know we'll see
yeah we don't know
I mean I'd also think it's like depending on the application right
thank you so much for coming on the show
I really enjoyed the conversation
learned a lot more about robotics
yeah thank you for hosting and having me
that's a wrap if you like this episode
please hit the subscribe button
it helps us bring on more awesome guests
level up production and drop new series you wanna watch
see you in 2 weeks