LEE FLEMING: Good evening. I am really pleased to welcome you all to "Leaders in Big Data," hosted by Google and the Fung Institute for Engineering Leadership at UC Berkeley. I'm Lee Fleming. I'm director of the Institute, and this is Ikhlaq Sidhu, chief scientist and co-founder.

The first and most important thing is to thank Google for hosting the event. So thank you very, very much. There are a couple of people in particular, Irena Coffman and Gail Hernandez (thank you), and also Arnav Anant, our entrepreneur in residence at the Fung Institute. So here's Arnav.

AUDIENCE: A lot of work.

LEE FLEMING: Huge amount of work.

The Fung Institute was founded about two years ago. Its intent is to do research and pedagogical development in topics of engineering leadership. Our degree, the professional Master of Engineering (M.Eng.) program, is centered mainly around the Institute, though we also have ties across the campus, as you'll see shortly.

We intend to have a series of talks on topics of interest to engineering leaders. As it turns out, we have our next talk this Wednesday. It's sponsored by [? Thai ?] and the Fung Institute, and the topic is entrepreneurship: being an entrepreneur within your firm. Fittingly, we have representatives from Google, Cisco, and SAP. That's Wednesday; consult the Fung website or the [? Thai ?] website for details.

So besides enjoying a good discussion tonight, we have an ulterior motive, as you can probably tell. We're trying to advertise all of our fantastic programs in big data at Cal. Whether you're interested in computation, inference, application, or some combination of those things, we've got the right program for you.

As I mentioned, there's the professional Master of Engineering, or M.Eng., a one-year degree across all the different engineering departments. There's another one-year professional degree in the statistics department, a two-year degree in the School of Information, and finally the Haas MBA. Tonight we've got people from all these programs. You can find their tables and ask them questions, and hopefully we'll see you at Cal soon. Each of those departments and schools also has executive and other programs associated with it.

Ikhlaq will now introduce our speakers.

IKHLAQ SIDHU: OK, thanks. So let me see.

LEE FLEMING: Just slide this here.

IKHLAQ SIDHU: All right. Welcome. I also want to thank a couple of people. One is [? Claus Nickoli ?], who is not here at the moment, so thanks to him in the ether. He's just not at the meeting, but he's our host here, so thank you. You can tell him that I thanked him. Also, many of you I see here are friends, so thanks for coming. It's good to see you again.

This is an event on big data, so I'm going to give you a little data on who is speaking today. The way I think of it, we've got three perspectives on big data from people who represent leading firms in the area.

Let's start with NetApp. We've got Gustav Horn. He is a senior consulting engineer with 25 years of experience, and he's built some of the largest enterprise-class Hadoop systems on the planet.

From Google, we have Theodore Vassilakis, a principal engineer at Google. He's the head of the team that works on data analytics, and he's been responsible for numerous contributions to Google in search and in the visualization and representation of results.

From VMware, we have Charles Fan, senior VP of strategic R&D. He co-founded Rainfinity and was CTO of the company prior to its acquisition by EMC in 2005.

Our distinguished set of speakers is moderated by our distinguished moderator, Hal Varian. He is chief economist here at Google, an emeritus professor at UC Berkeley, and the founding dean of the School of Information. With that, there's hardly anything more I could possibly say. Come on up, Hal, and take it away.

HAL VARIAN: Thank you. I'm very impressed with the turnout tonight, seeing as you're missing both the debate and the baseball game. But at least it eliminates a difficult choice for many people. I will say that I'm going to follow the same rules as the presidential debates, so no kicking, biting, scratching, or beanballs are allowed during this performance. We're going to talk about foreign policy; wasn't that the agreement? No. All right.

In any event, what I thought we'd do is have each person talk for about five minutes and lay out their theme: where they're coming from and what their perspective on big data is. I will take some notes, then ask some questions and get a conversation going. I think we'll have a little time at the end for some questions from the floor. So, take it away.

THEO VASSILAKIS: Sure. So, should I start, Hal?

HAL VARIAN: Yes.

THEO VASSILAKIS: All right. Well, it's a real pleasure to be here. Thank you, and thank you all for coming. It's a huge, huge audience.

Just a couple of words. As you heard, my name is Theo. I lead some of our analytical systems. Actually, up until two weeks ago, I was responsible for a stack that had parallel data warehousing components; query engines, including systems like Dremel and Tenzing, that let you query this data; and visualization layers on top. That's one of the many, many systems at Google that, from the outside, one would think of as big-data systems. So I'll try to give you my perspective, at least, on the Google view of big data.

Hopefully someone will cut me off when it's time. I think I'll probably go for five minutes. This could take a while.

AUDIENCE: [INAUDIBLE]

THEO VASSILAKIS: All right, sounds good. Thank you.

As you know, Google's business is primarily about taking data: organizing the world's information and making it universally accessible and useful. So a lot of what the company does is about sucking in data, whether it's the web, Street View imagery, satellite imagery, maps information, Android pings, or you name it, and then transforming it into usable forms. In some sense, Google is a big-data machine.

The term "big data" came into currency relatively recently, and we all said, yeah, OK, that speaks to what we do. We didn't really have a word for it; we just knew that the data was large.

To put a bit more structure on that, I think the Google view of big-data processing splits into roughly three parts. The first is what I would call ingestion processes: things like the crawlers, or all those Street View cars running through the streets of the world.

The second is transaction processing systems, where we capture data through interactions on many of our web properties, or on the web properties we partner with. People clicking on search, interacting with Docs, or interacting with Maps all generate many, many clicks and interactions that then become transactional big data. That also includes people using, say, Google Analytics on their sites to measure traffic on their properties, which generates huge volumes of pings into Google: many tens of thousands of QPS of pings. So that's the second big component.

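For a rough sense of the scale behind "many tens of thousands of QPS" (events per second), here is a back-of-the-envelope sketch. The specific rates are hypothetical illustrations, not figures from the talk; even a sustained 20,000 QPS works out to about 1.7 billion pings per day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume(qps: int) -> int:
    """Total events per day at a sustained rate of `qps` events per second."""
    return qps * SECONDS_PER_DAY


# Hypothetical sustained rates in the "many tens of thousands" range;
# these are illustrations, not numbers reported in the talk.
for qps in (20_000, 50_000, 90_000):
    print(f"{qps:>6,} QPS -> {daily_volume(qps):>13,} pings/day")
```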
The third component is the processing side of all of that. It includes things like MapReduce analysis: generating insights from that data, perhaps by building machine learning models; perhaps by building, for example, Zeitgeist top queries that can then be served out to the world to say, hey, here is what people are searching for; or perhaps by building n-grams of all the books that Google has scanned over many years of its ingestion processes. It's really about baking all of that information and then presenting it in some usable form, either through a system such as our ads system, which takes models and decides what ads to show, or in a more direct form such as the n-grams.

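Both a Zeitgeist-style top-queries list and the book n-grams come down, at their core, to counting items and ranking them by frequency. Below is a minimal single-machine sketch of that pattern, assuming naive lowercase whitespace tokenization. The function names are hypothetical, and the talk does not describe how Google's production pipelines implement this at scale.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Iterator[NGram]:
    """Yield every contiguous run of n tokens (a sliding window of width n)."""
    return zip(*(tokens[i:] for i in range(n)))


def top_ngrams(documents: Iterable[str], n: int, k: int) -> List[Tuple[NGram, int]]:
    """Count n-grams across all documents and return the k most frequent."""
    counts: Counter = Counter()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


docs = ["the cat sat on the mat", "the cat ran"]
print(top_ngrams(docs, n=2, k=1))  # [(('the', 'cat'), 2)]
```

For a top-queries list, the same `Counter.most_common` step applies, with each whole query string counted as one item instead of being split into n-grams.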
So those are the three broad classes: ingestion, transaction processing, and analytical processing.

To dig a little deeper into each of those areas, I would say the ingestion processes, especially the very large-scale ones, are highly custom systems. If you think about our web crawlers, the Street View cars, maps stitching, or satellite imagery stitching, those are very custom processes that, at least to this date, don't have a clear analog in the general industry. Maybe this is something you might address, or see differently than I do. They're still highly specialized systems that produce very large images, and they're very high-performance, very complex systems run by dedicated engineering teams.

The transaction processing systems, or storage systems, are things like the Google File System, Bigtable, and Megastore. Those are the ones we've actually published papers about, and they're now reasonably well known in the industry. They have evolved a little past the purely custom stage, to the point where they're fairly general purpose.

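The 2006 Bigtable paper describes its data model as a sparse, distributed, persistent multidimensional sorted map, indexed by row key, column key, and timestamp, with uninterpreted byte strings as values. Below is a minimal in-memory toy of just that logical model. It is single-process and neither persistent nor distributed, and `ToyTable` with its `put`/`get`/`scan` methods is a hypothetical illustration, not Bigtable's actual API. The row key `com.cnn.www` follows the paper's reversed-hostname example.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory toy of the Bigtable logical data model.

    Maps (row key, column key, timestamp) -> bytes. Row keys are kept sorted
    so range scans over contiguous rows are cheap, and each cell keeps
    multiple timestamped versions ordered newest first.
    """

    def __init__(self) -> None:
        self._row_keys: List[str] = []  # sorted list of row keys
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        """Store `value` in cell (row, column) as the version at time `ts`."""
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = defaultdict(list)
        versions = self._rows[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before `ts` (newest overall if None)."""
        for version_ts, value in self._rows.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def scan(self, start: str, end: str) -> List[str]:
        """Return row keys in the half-open range [start, end), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, end)
        return self._row_keys[lo:hi]


table = ToyTable()
table.put("com.cnn.www", "contents:", 3, b"<html>...v3")
table.put("com.cnn.www", "contents:", 5, b"<html>...v5")
table.put("com.example.www", "contents:", 1, b"<html>...")
print(table.get("com.cnn.www", "contents:"))        # b'<html>...v5'
print(table.get("com.cnn.www", "contents:", ts=4))  # b'<html>...v3'
print(table.scan("com.c", "com.d"))                 # ['com.cnn.www']
```

Keeping row keys sorted is the choice that matters here: it makes contiguous row ranges cheap to scan, which is why the paper stores web pages under reversed hostnames, so that pages from the same domain sit next to each other.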
There was a time at Google when most teams actually did their own storage in some form or another, until these GFS-like systems evolved to the point where they were good enough that more than one team could use them. That evolution had many steps. For example, at one point everybody ran their own GFS: maybe the ads team had their own GFS cells, and the search team had their own GFS cells. In time, the systems matured to the point where we could have a centrally managed file system.

And recently, you may have seen that we've talked about a globally distributed database called Spanner, which takes that to yet another level of transactions and global availability.

The third step, which I think is still relatively immature compared to some of the storage systems, is the analysis. A lot of people know about MapReduce and some of the systems that have been built on top of it.
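MapReduce, as described in Google's 2004 paper, has the user supply a map function that emits intermediate key/value pairs and a reduce function that merges all values sharing a key, while the framework handles partitioning, shuffling, and fault tolerance across machines. Below is a minimal single-process simulation of that contract using the classic word-count example. `map_reduce`, `wc_map`, and `wc_reduce` are hypothetical names, not the API of Google's MapReduce or of Hadoop.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

I = TypeVar("I")  # input record type
K = TypeVar("K")  # intermediate key type (must be hashable and orderable)
V = TypeVar("V")  # intermediate value type
R = TypeVar("R")  # reduce output type


def map_reduce(
    inputs: Iterable[I],
    mapper: Callable[[I], Iterable[Tuple[K, V]]],
    reducer: Callable[[K, List[V]], R],
) -> Dict[K, R]:
    """Run the map, shuffle, and reduce phases sequentially in one process."""
    # Map + shuffle: apply the mapper to every record and group values by key.
    groups: Dict[K, List[V]] = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: visit keys in sorted order, mirroring MapReduce's guarantee
    # that keys within a partition are processed in increasing order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def wc_map(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated, lowercased word."""
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word: str, counts: List[int]) -> int:
    """Sum the partial counts for one word."""
    return sum(counts)


lines = ["the quick brown fox", "The lazy dog", "the end"]
print(map_reduce(lines, wc_map, wc_reduce))
# {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster the shuffle is usually the expensive step, since intermediate pairs travel over the network to whichever reducer owns each key; this sketch collapses it into a single in-memory dictionary.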