MALE SPEAKER: Welcome, everybody, to one more Authors at Google talk. Today, our guest speaker is Pedro Domingos, whose new book is called "The Master Algorithm." We have it here, and you can buy copies outside. One definition of machine learning is "the automation of discovery." Our guest, Pedro Domingos, is at the very forefront of the search for the master algorithm, a universal learner capable of deriving all knowledge, past, present, and future, from data. Pedro Domingos is a professor of Computer Science and Engineering at the University of Washington. He's the co-founder of the International Machine Learning Society. Pedro received his MS in Electrical Engineering and Computer Science from IST in Lisbon, and his Master of Science and PhD in Information and Computer Science from the University of California, Irvine. He spent two years as an assistant professor at IST before joining the faculty of the University of Washington in 1999. Pedro is the author or co-author of over 200 technical publications in machine learning, data mining, and other areas. He is the winner of the SIGKDD Innovation Award, the highest honor in data science. He's an AAAI Fellow and has received a Sloan Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award, several best paper awards, and other distinctions. He's a member of the editorial board of the "Machine Learning" journal. Please join me in welcoming Pedro, today, to Google.

[APPLAUSE]

PEDRO DOMINGOS: Thank you. Let me start with a very simple question-- where does knowledge come from? Until very recently, it came from just three sources. Number one, evolution-- that's the knowledge that's encoded in your DNA. Number two, experience-- that's the knowledge that's encoded in your neurons. And number three, culture, which is the knowledge you acquire by talking with other people, reading books, and so on.
And everything that we do, everything that we are, basically comes from these three sources of knowledge. Now, what's quite extraordinary is that, only very recently, a fourth source of knowledge appeared on the planet. And that's computers. More and more knowledge now comes from computers, is discovered by computers. And this is as big a change as the emergence of each of the previous three was. Evolution, right-- well, life on Earth is the product of evolution. Experience is what distinguishes us mammals from insects. And culture is what makes humans what we are and as successful as we are. Notice, also, that each of these forms of knowledge discovery is orders of magnitude faster than the previous one and discovers orders of magnitude more knowledge. And indeed, the same thing is true of computers. Computers can discover knowledge orders of magnitude faster than any of the sources that came before and coexist with them, and orders of magnitude more knowledge in the same amount of time. In fact, Yann LeCun says that "most of the knowledge in the world in the future is going to be extracted by machines and will reside in machines." So this is a major change that, I think, is not just for us computer scientists to know about and deal with; it's actually something that everybody needs to understand.

So how do computers discover new knowledge? This is, of course, the province of machine learning. And in a way, what I'm going to try to do in this talk is give you a sense of what machine learning is and what it does. If you're already familiar with machine learning, this will hopefully give you a different perspective on it. If you're not familiar with machine learning, this should be quite fascinating and interesting. So there are five main paradigms in machine learning. And I will talk about each one of them in turn, and then try to step back and see what the big picture is, and what this idea of the master algorithm is.
The first way computers discover knowledge is by filling gaps in existing knowledge. Pretty much the same way that scientists work, right? You make observations, you hypothesize theories to explain them, and then you see where they fall short. And then you adapt them, or throw them away and try new ones, and so on. So this is one.

Another one is to emulate the brain. Right? The greatest learning machine on Earth is the one inside your skull, so let's reverse engineer it.

The third one is to simulate evolution. Evolution, by some standards, is actually an even greater learning algorithm than your brain is, because, first of all, it made your brain. It also made your body. And it also made every other life form on Earth. So maybe that's something worth figuring out how it works and doing it with computers.

Here's another one. And this is to realize that all the knowledge that you learn is necessarily uncertain. Right? When something is induced from data, you're never quite sure about it. So the way to learn is to quantify that uncertainty using probability. And then, as you see more evidence, the probability of different hypotheses evolves. Right? And there's an optimal way to do this, using Bayes' theorem. And that's what this approach is.

Finally, the last approach, in some ways, is actually the simplest and maybe even the most intuitive. It's to just reason by analogy. There's a lot of evidence in psychology that humans do this all the time. You're faced with a new situation, you try to find a matching situation in your experience, and then you transfer the solution from the situation that you already know to the new situation that you're faced with.

And connected with each of these approaches to learning, there is a school of thought in machine learning. So the five main ones are the Symbolists, Connectionists, Evolutionaries, Bayesians, and Analogizers.
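The Bayesian recipe described above (start with a prior over hypotheses, then update it as evidence arrives) can be sketched in a few lines. The numbers here, a 50/50 prior and likelihoods of 0.8 and 0.3, are illustrative assumptions, not values from the talk:

```python
# Bayes' theorem: P(h | e) = P(e | h) * P(h) / P(e).
# A minimal sketch for a single binary hypothesis h.

def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of the hypothesis after one observation."""
    # P(e) sums over both cases: hypothesis true and hypothesis false.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

# Start 50/50 on the hypothesis, then fold in three observations that are
# more likely if the hypothesis is true (0.8) than if it is false (0.3).
p = 0.5
for _ in range(3):
    p = bayes_update(p, likelihood_if_true=0.8, likelihood_if_false=0.3)

print(round(p, 3))  # prints 0.95
```

This is the sense in which "the probability of different hypotheses evolves": each consistent observation nudges the posterior up, and a surprising one would pull it back down.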
The Symbolists are the people who believe in discovering new knowledge by filling in the gaps in the knowledge that you already have. One of the things that's fascinating about machine learning is that the ideas and the algorithms come from all of these different fields. So, for example, the Symbolists have their origins in logic and philosophy. And they're, in some sense, the most "computer-sciency" of the five tribes. The Connectionists' origins are, of course, in neuroscience, because they're trying to take inspiration from how the brain works. The Evolutionaries, well, their origins are, of course, in evolutionary biology, in the algorithm of evolution. The Bayesians come from statistics. The Analogizers actually have influences from a lot of different fields, but probably the single most important one is psychology. So in addition to being very important for our lives, machine learning is also a fascinating thing to study, because in the process of studying machine learning, you can actually study all of these different things.

Now, each of these "tribes" of machine learning, if you will, has its own master algorithm, meaning its own general-purpose learner that, in principle, can be used to learn anything. In fact, each of these master algorithms has a mathematical proof that says, if you give it enough data, it can learn anything. OK? For the Symbolists, the master algorithm is inverse deduction. And we'll see, in a second, what that is. For the Connectionists, it's backpropagation. For the Evolutionaries, it's genetic programming. For the Bayesians, it's probabilistic inference using Bayes' theorem.
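As a taste of the Connectionists' master algorithm, here is backpropagation reduced to its simplest possible case: a single sigmoid neuron trained by gradient descent to reproduce logical OR. The dataset, learning rate, and iteration count are illustrative choices; a real network stacks many such units and propagates the error signal back through all of them, which is where the name comes from.

```python
import math

def sigmoid(z):
    """Squashing function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Training data: inputs and targets for logical OR.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w = [0.0, 0.0]  # connection weights
b = 0.0         # bias
lr = 1.0        # learning rate

for _ in range(2000):
    for (x1, x2), target in data:
        out = sigmoid(w[0] * x1 + w[1] * x2 + b)
        # Error signal for a sigmoid unit under cross-entropy loss: the
        # gradient works out to simply (output - target).
        delta = out - target
        # Nudge each parameter against its gradient.
        w[0] -= lr * delta * x1
        w[1] -= lr * delta * x2
        b -= lr * delta

predictions = [round(sigmoid(w[0] * x1 + w[1] * x2 + b))
               for (x1, x2), _ in data]
print(predictions)  # the learned function reproduces OR: [0, 1, 1, 1]
```

OR is learnable by one neuron because it is linearly separable; the famous limitation of this single-unit setup is XOR, which is what multi-layer networks and full backpropagation were developed to handle.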