MALE SPEAKER: Welcome, everybody,
to one more Authors at Google Talk.
Today, our guest speaker is Pedro Domingos,
whose new book is called "The Master Algorithm."
We have it here and you can buy copies outside.
So one definition of machine learning
is "the automation of discovery."
Our guest, Pedro Domingos, is at the very forefront
of the search for the master algorithm, a universal learner
capable of deriving all knowledge, past, present
and future, from data.
Pedro Domingos is a professor of Computer Science
and Engineering at the University of Washington.
He's the co-founder of the International Machine Learning
Society.
Pedro received his MS in Electrical Engineering
and Computer Science from IST in Lisbon,
and his Master of Science and PhD in Information
and Computer Science from the University of California,
Irvine.
He spent two years as an assistant professor at IST
before joining the faculty of the University of Washington
in 1999.
Pedro is the author or co-author of over 200
technical publications in machine learning, data mining,
and other areas.
He is the winner of the SIGKDD Innovation Award, the highest
honor in data science.
He's an AAAI Fellow and has received a Sloan Fellowship,
an NSF CAREER Award, a Fulbright Scholarship, an IBM
Faculty Award, several best paper awards,
and other distinctions.
He's a member of the editorial board of the journal
"Machine Learning."
Please join me in welcoming Pedro to Google today.
[APPLAUSE]
PEDRO DOMINGOS: Thank you.
Let me start with a very simple question--
where does knowledge come from?
Until very recently, it came from just three sources:
number one, evolution-- that's the knowledge that's
encoded in your DNA; number two,
experience-- that's the knowledge that's
encoded in your neurons; and number three, culture,
which is the knowledge you acquire
by talking with other people, reading books, and so on.
And everything that we do,
everything that we are, basically comes from these three
sources of knowledge.
Now what's quite extraordinary is that, just recently,
a fourth source of knowledge has appeared on the planet.
And that's computers.
There's more and more knowledge now that comes from computers,
that is discovered by computers.
And this is as big a change as the emergence
of each of the previous three was.
Take evolution: all life on Earth
is the product of evolution.
Experience is what distinguishes us mammals from insects.
And culture is what makes humans what we are
and as successful as we are.
Notice, also, that each of these forms of knowledge discovery
is orders of magnitude faster than the previous one
and discovers orders of magnitude more knowledge.
And indeed, the same thing is true of computers.
Computers can discover knowledge orders of magnitude
faster than any of the sources that came before
and still coexist with them, and orders of magnitude more
knowledge in the same amount of time.
In fact, Yann LeCun says that "most
of the knowledge in the world in the future
is going to be extracted by machines
and will reside in machines."
So this is a major change that, I think, is not just for us
computer scientists to know about and deal with;
it's something that everybody
needs to understand.
So how do computers discover new knowledge?
This is, of course, the province of machine learning.
And in a way, what I'm going to try to do in this talk
is give you a sense of what machine learning is
and what it does.
If you're already familiar with machine learning,
this will hopefully give you a different perspective on it.
If you're not familiar with machine learning already,
this should be quite fascinating and interesting.
So there are five main paradigms in machine learning.
And I will talk about each one of them in turn
and then try to step back and see, what is the big picture
and what is this idea of the master algorithm.
The first way computers discover knowledge
is by filling gaps in existing knowledge.
Pretty much the same way that scientists work, right?
You make observations, you hypothesize
theories to explain them, and then
you see where they fall short.
And then you adapt them, or throw them away
and try new ones, and so on.
So this is one.
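(The speaker names this approach inverse deduction later in the talk. As a toy sketch of the gap-filling idea-- the facts, rule format, and helper function here are illustrative, not from the talk-- given a background fact and an observation that the current knowledge cannot yet explain, the learner hypothesizes the general rule that would fill the gap:

```python
# A toy sketch of "filling gaps in existing knowledge" (the facts, rule
# format, and helper below are illustrative, not from the talk). Given
# background facts and an observation the current knowledge cannot yet
# explain, hypothesize the general rule that would fill the gap.

facts = {("human", "Socrates")}        # background knowledge
observation = ("mortal", "Socrates")   # observed, but not yet explained

def hypothesize_rule(facts, observation):
    """Propose a rule 'P(x) -> Q(x)' that lets the facts deduce the observation."""
    obs_pred, obs_entity = observation
    for fact_pred, fact_entity in facts:
        if fact_entity == obs_entity:
            return f"for all x: {fact_pred}(x) -> {obs_pred}(x)"
    return None

print(hypothesize_rule(facts, observation))
# for all x: human(x) -> mortal(x)
```

From the fact that Socrates is human and the observation that Socrates is mortal, the induced rule "all humans are mortal" fills the gap.)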
Another one is to emulate the brain.
Right?
The greatest learning machine on earth
is the one inside your skull, so let's reverse engineer it.
The third one is to simulate evolution.
Evolution, by some standards, is actually an even greater
learning algorithm than your brain
is, because, first of all, it made your brain.
It also made your body.
And it also made every other life form on Earth.
So maybe that's something worth figuring out how it works
and doing it with computers.
Here's another one.
And this is to realize that all the knowledge that you learn
is necessarily uncertain.
Right?
When something is induced from data,
you're never quite sure about it.
So the way to learn is to quantify that uncertainty using
probability.
And then as you see more evidence,
the probability of different hypotheses evolves.
Right?
And there's an optimal way to do this using Bayes' theorem.
And that's what this approach is.
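(As a minimal sketch of that updating process-- the coin-flip hypotheses and probabilities here are invented for illustration-- Bayes' theorem, P(h | e) = P(e | h) P(h) / P(e), is applied after each new observation, shifting the probabilities of the competing hypotheses:

```python
# A minimal sketch of Bayesian updating (the coin-flip hypotheses and
# probabilities are invented for illustration). Each observed flip
# reweights the hypotheses via Bayes' theorem:
#     P(h | e) = P(e | h) * P(h) / P(e)

priors = {"fair coin": 0.5, "biased coin": 0.5}
heads_prob = {"fair coin": 0.5, "biased coin": 0.8}  # P(heads | h)

def update(beliefs, flip):
    """One Bayesian update for a single flip, 'H' or 'T'."""
    unnormalized = {
        h: (heads_prob[h] if flip == "H" else 1 - heads_prob[h]) * p
        for h, p in beliefs.items()
    }
    evidence = sum(unnormalized.values())  # P(e)
    return {h: v / evidence for h, v in unnormalized.items()}

beliefs = dict(priors)
for flip in "HHHHT":  # as more evidence arrives, beliefs shift
    beliefs = update(beliefs, flip)
    print(flip, beliefs)
```

After a run of heads, the probability of the biased-coin hypothesis rises; a tail pulls it back down. That evolving distribution over hypotheses is exactly the "quantified uncertainty" being described.)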
Finally, the last approach, in some ways,
is actually the simplest and maybe even the most intuitive.
It's actually to just reason by analogy.
There's a lot of evidence in psychology
that humans do this all the time.
You're faced with a new situation,
you try to find a matching situation in your experience,
and then you transfer the solution
from the situation that you already
know to the new situation that you're faced with.
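(The talk doesn't name an algorithm at this point, but nearest neighbor is the classic embodiment of this idea. Here is a minimal sketch-- the feature vectors and labels are invented for illustration-- that finds the most similar remembered situation and transfers its solution:

```python
# A minimal sketch of learning by analogy via nearest neighbor (the
# feature vectors and labels are invented for illustration): find the
# most similar remembered situation and transfer its solution.

import math

# Past experience: (situation as a feature vector, known solution).
experience = [
    ((1.0, 1.0), "spam"),
    ((0.1, 0.2), "not spam"),
    ((0.9, 0.8), "spam"),
]

def nearest_neighbor(new_situation):
    """Return the solution of the closest past situation."""
    closest = min(experience, key=lambda past: math.dist(past[0], new_situation))
    return closest[1]

print(nearest_neighbor((0.2, 0.1)))  # -> not spam
```

The new situation (0.2, 0.1) sits closest to the remembered "not spam" example, so that solution is transferred.)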
And connected with each of these approaches to learning,
there is a school of thought in machine learning.
So the five main ones are the Symbolists, Connectionists,
Evolutionaries, Bayesians, and Analogizers.
The Symbolists are the people who
believe in discovering new knowledge
by filling in the gaps in the knowledge
that you already have.
One of the things that's fascinating about machine
learning is that the ideas in the algorithms
come from all of these different fields.
So for example, the Symbolists have their origins
in logic and philosophy.
And they're, in some sense, the most "computer-sciency"
of the five tribes.
The Connectionists, their origins
are, of course, in neuroscience, because they're
trying to take inspiration from how the brain works.
The Evolutionaries, well, their origins
are, of course, in evolutionary biology,
in the algorithm of evolution.
The Bayesians come from statistics.
The Analogizers actually have influences
from a lot of different fields, but probably the single most
important one is psychology.
So in addition to being very important for our lives,
machine learning is also a fascinating thing,
I think, to study, because in the process of studying machine
learning, you can actually study all of these different things.
Now each of these "tribes" of machine learning, if you will,
has its own master algorithm, meaning its own general-purpose
learner that, in principle, can be used to learn anything.
In fact, each of these master algorithms
has a mathematical proof that says,
if you give it enough data, it can learn anything.
OK?
For the Symbolists, the master algorithm is inverse deduction.
And we'll see, in a second, what that is.
For the Connectionists, it's backpropagation.
For the Evolutionaries, it's genetic programming.
For the Bayesians, it's probabilistic inference
using Bayes' theorem.