FUMI YAMAZAKI: OK.
Hello everyone.
Thank you for coming.
I'm super excited that we're having Joel
Gurin, the author of this book "Open Data Now," at Google.
JOEL GURIN: OK.
Thank you all so much for coming.
I want to say a couple of quick things before we get started.
You can see on this slide I have a website as well as a book.
The website is also open data now,
just for the sake of simplicity.
I use @joelgurin for Twitter. I also
use the hashtag #opendatanow.
There is a pattern there.
I'm very happy to be speaking to you today.
Also, if you didn't see it on the way in, on the way out,
there is a sign up sheet if you're
interested in getting free email updates from my website
or from the GovLab.
Please sign up and we'll keep in touch,
because there is a lot to talk about.
So why open data and how did I get into this particular area?
I have to start by saying I am probably by a couple of orders
of magnitude the least technical person in this room right now.
So what you're going to hear from me--
and it is a little humbling, to say the least, to come
to talk about data to Google-- but what
I hope I can bring to this is a sort
of sense of overall perspective and context
from the work that I've done in government and non-profits,
as a journalist and now in academia.
I really have tried to get us a sense and sort of paint
a picture of how open data is being seen and used
in society today that I hope will be helpful to all of you.
And I certainly hope we have a little time for questions.
So my background very briefly-- as Fumi told you,
I began as a science journalist.
I was editorial director, and then
executive vice president of "Consumer Reports" when
we launched consumerreports.org, which
is the largest paid information subscription site on the web
now with about 3 million active paid subscribers.
Shortly after that, I went to the Federal Communications
Commission, began as head of the Consumer Bureau.
And at that point our chairman, Julius Genachowski,
was very interested in figuring out
how we can give consumers help in a simple decision
like choosing a cellphone plan.
Well choosing a cellphone plan ends up
being kind of like solving some difficult problem in topology
or some such thing or at least in
statistics, because there are about 1,000 different cellphone
plans offered by a company like Verizon.
You multiply that by the number of companies,
you factor in the fact that every consumer has
different needs, so it became pretty clear as I was looking
at this that this is a problem that is more complicated than it
looks at first.
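As a rough illustration of why this is a matching problem rather than a simple lookup, here is a minimal Python sketch. It assumes a deliberately simplified pricing model (a flat fee plus per-minute overage); the carriers, plan names, and prices are all hypothetical and are not taken from any real provider or from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    # All fields are hypothetical; real plans have many more dimensions
    # (data, texts, family lines, contract terms, and so on).
    carrier: str
    name: str
    monthly_fee: float
    included_minutes: int
    overage_per_minute: float


def monthly_cost(plan: Plan, minutes_used: int) -> float:
    """Cost of one plan for one consumer's usage under the toy pricing model."""
    extra = max(0, minutes_used - plan.included_minutes)
    return plan.monthly_fee + extra * plan.overage_per_minute


def cheapest_plan(plans: list[Plan], minutes_used: int) -> Plan:
    """Pick the plan with the lowest cost for this particular consumer."""
    return min(plans, key=lambda p: monthly_cost(p, minutes_used))


# Invented catalog: three plans across two made-up carriers.
PLANS = [
    Plan("CarrierA", "Basic", 30.0, 300, 0.25),
    Plan("CarrierA", "Plus", 50.0, 900, 0.25),
    Plan("CarrierB", "Unlimited", 70.0, 10**9, 0.0),
]

if __name__ == "__main__":
    for minutes in (200, 600, 1500):
        best = cheapest_plan(PLANS, minutes)
        print(minutes, best.carrier, best.name, monthly_cost(best, minutes))
```

Even in this toy version the best plan changes with each consumer's usage, which is why the answer depends on having both the plan data and the individual's own data.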
It also turns out to be very similar to problems
that other government agencies face
in trying to advise consumers on things
like financial services, housing, mortgages, education,
and so on.
So I began talking to people in other agencies
about consumer information, generally.
Out of that I was invited to chair the White House Task
Force on Smart Disclosure.
Smart disclosure being the term that we developed
to describe giving data to consumers
that they can use to make complex decisions.
That report came out last May.
And from that work I became more involved
in open data and open government more generally.
I met Beth Noveck, who some of you
may know as the head of the Open Government Initiative
during President Obama's first term and a real pioneer
in open government and open data.
She has now invited me to come work at the GovLab
that she founded at NYU.
And I'll tell you a few things about that.
And I also have this website and this book, "Open Data
Now," so I am sort of running the open data
practice for the GovLab and looking
at the implications of open data in many ways.
Just a couple words about the GovLab.
I won't read what you can see on the screen,
but our basic hypothesis and our mission
is to figure out how to use technology and collaborative
platforms and basically 21st century
approaches to help improve governance and government,
and the way that citizens and government interact.
We think that people should interact with government more
than when they vote once a year or when
they happen to make a comment on the We the People petition website
or something like that.
We're looking at ways to really develop
a different level of engagement that
is good both for citizens and for government as well.
And this model of collaborative democracy we feel
has three major modes of operation:
the first one is sharing responsibility,
where a government can take a piece of what
has been a government responsibility
and delegate that to citizens.
And here the paradigm is participatory budgeting,
where in 1,500 cities around the world now
the city government is saying, you take a chunk of the budget
and spend it as you wish.
We think that can be done in many other kinds of governance
situations and that would be very productive.
The second modality is getting knowledge and expertise in.
Figuring out ways that not just the traditional government
advisers, but people with technical abilities,
technical skills, insight into community issues, and so on,
can advise government at the federal, state, and city level
and we're seeing a lot of models for that.
And then the third modality is getting open data out,
which is what I work on and what I'm going to talk about today.
So what is open data?
There are a number of good definitions
that have been done by different groups
like the Open Knowledge Foundation and the Sunlight
Foundation.
What I did in writing this book was
to choose a fairly general definition:
that open data is accessible public data that people,
companies, and organizations can use to launch new ventures,
analyze patterns and trends, make data-driven decisions,
and solve complex problems.
This definition incorporates not only
open data from government, which is where a lot of the focus
has been, but also open data from sources like social media,
from many sources that are accessible to you
at Google, and from other kinds of data that companies
themselves may choose to release in different ways, as
well as scientific data.
So what you're going to hear me talk about today
is open data in all of those forms
and how they relate to each other,
and how they relate to social and business goals.
I do think-- and I'm certainly convinced having now
worked in this area for a couple of years--
that we're talking about a phenomenon that
has tremendous implications and tremendous impact
potentially not only for business, but also
for scientists, for journalists,
for consumers, and for government.
And in many ways, we're starting to see
a convergence of the civic and the commercial uses
of open data, where we're seeing some ventures that
may start as non-profits that turn out to have a sustainable
business model.
And we're seeing businesses that turn out
to be actually extremely mission driven
in their use of open data.
And you'll see many examples today.
What is open data not?
Open data is not the same as big data.
And it's not the same as open government
and it's not even really a blending
of big data and open government.
It's a different kind of animal.
Big data also has many definitions.
I think the only thing everybody agrees on, at least when
I ask them, is that when you're talking
about what you mean by big data, we mean like really, really
a lot of data.
Really big data sets, which is not too surprising.
I think you can more accurately say that big data involves
data sets that are at the current limit of our ability
to analyze and use, but, of course, that
limit changes every day.
I do think there are real ways in which the quantity of data
has a qualitative impact.
In the same way that when I came here from New York,
I theoretically could have ridden a bicycle,
or even walked, and taking a plane
is really more than an accelerated way of doing that,
it's a whole different kind of travel.
So I think that big data does have that kind of impact.
But it's not philosophically different in my view anyway
from smaller data problems in the way
that open data is philosophically its own thing.
Open government is very closely related
to the concept of open data, but it's broader.
Open government includes all kinds
of government transparency.
It also includes the kinds of collaboration and things
that I just showed on the GovLab's slide.
So part of it is data related but part of it
is really other kinds of citizen engagement.
So the book does present the grand unified theory
of what is open data in a simple Venn diagram
that you will find in appendix A in the book,
and you can also find on my website
opendatanow.com in a fairly lengthy blog post.
I won't go through all this to analyze it,
but the most important thing to notice
here is that big data, open data, and open government
have several points and areas of intersection.
They are distinct, but they overlap.
And when they overlap, it gets really interesting.
The point in the middle, sector six there,
which is all three things-- these
are large public government data sets like weather, GPS,
Securities and Exchange Commission data, et cetera.
This is where we're going to see some of the highest
economic value and some of the highest potential civic value.
But it is by no means the only thing that's
important about open data, or the only kind of open data
that's important.
So that's the terrain.
Having said that, I'm going to take you very quickly,
believe it or not, through what I see as nine open data trends.
I'm going to pose three open open data
questions that I don't have the answers to,
but I think we can all probably discuss and think about.
And I'm going to describe a study that we're now
doing at the GovLab that I think will be of interest to you, called
the Open Data 500, that I think is going to really help advance
this field.
So let's get started.
So the first trend is liberating government data.
It's undeniable that governments at all levels
not only in the US, but in countries
around the world are now focusing on ways that they can
take data that they control and make
it available to the public as open data.
And in the US, we've seen a major step forward.
Most recently last May, when President Obama
announced the new Open Data Policy.
This policy has been called the biggest change
in how we deal with federal information
since the Freedom of Information Act in the 1960s.
It's potentially that big.
There are a lot of questions about how we implement it,
which I'll talk about.
But it is a very ambitious and I think
a very right-thinking kind of program
to make government data open by default.
Meaning that unless there's a security reason or privacy
reason or something like that to keep it hidden, it's open
and anybody ought to use it.
Now one thing that's really significant
is that when the president announced this policy,
he used these words, which you could see
are very business focused and he actually chose a technology
center in Austin, Texas to do it.
So the administration is really positioning open data
as a job creator and as a business driver.
That's partly why we at the GovLab
are studying it through that lens,
because we want to see to what extent that is really
a defensible proposition.
I think it is, but I think it's still a work in progress.
So the whole question of why should government
go to the trouble and the expense,
and the time of making data open, part of the answer people
think, is that it's going to have an economic benefit
as well as being a social good.
This Open Data Policy, which was announced last May,
talks about a presumption of openness, or open by default, and
making data machine readable, reusable, and timely.
This was based in many ways on the definitions of the Open
Knowledge Foundation and the Sunlight Foundation
developed several years ago.
One really interesting difference
is that those definitions said, and the data has got to be free,
and the government definition doesn't quite say that.
So there's still some room for agencies to charge for data,
but I think the direction is very
much towards free open data.
In addition to this policy, there
is now something called the DATA Act, which
may be the only thing in the known universe
that Ralph Nader and Grover Norquist actually agree on.
It's an extremely bipartisan movement.
It stands for Digital Accountability and Transparency
Act.
This is another part of open data.
So one part of open data, like the Open Data Policy,
is let's release data we have on weather, satellite data,
GPS, health data, et cetera-- data that government collects
that is useful to the public.
This is data that government has about itself.
The goal of the DATA Act is to make government spending
data more thorough, more transparent,
more usable than it's ever been by a lot.
To be able to make it go all the way down,
not just to the contractors to government,
but subcontractors, sub-subcontractors,
and to do it in a way that is really accurate.
There is a website called usaspending.gov.
It was intended to do this.
The Sunlight Foundation recently calculated
that it is inaccurate to the tune of $1.55 trillion a year.
Otherwise, it's perfect.
So the DATA Act would automate this in a way that would really
solve that kind of problem and there
is a lot of push now in Congress to pass the DATA Act, which
I think would be another major step forward.
So this is just at the federal level,
but you're seeing similar kinds of activity in cities,
in states, in the 60 countries that
now belong to the Open Government Partnership.
All of which are making similar kinds of commitments
to open data for both civic and job-creating reasons.
That's one trend.
The next trend, which comes right out of that,
is that we are actually seeing open data begin
to drive business growth in a number of ways.
And you can find examples all over the place: health,
education, transportation.
My book has a number of examples.
Somebody tweeted recently, there's
so many apps and businesses in here.
I can't even count them.
So I figured I would count them.
There are, to the best of my knowledge, 183 of them.
So, happy reading.
You will find companies in all of these sectors
and they're doing some very creative things
with open data that are showing that you don't have
to own the data in a proprietary way
to make a thriving business out of it.
I'll just show you a couple of examples.
So the Climate Corporation based here in San Francisco
has become in many ways sort of the poster child
for the commercial use of open data.
I like to say I sort of knew them when.
They've gotten a fair amount of publicity over the years.
I was fortunate to have a long interview with their CEO David
Friedberg last April.
It's in the book.
And there's actually a longer podcast with him on my website.
And if you're really interested in this stuff
I would encourage you to check out the podcast,
because it's a fascinating story.
And the punch line is they were recently
bought by Monsanto for a billion dollars.
They've been profiled in "The New Yorker,"
so they've emerged as everyone's favorite example,
and I think rightly so, of what this kind of data can do.
Their story is fascinating.
They began by saying that they wanted
to sell better weather insurance.
And they quickly focused on farming and farmers
as their target.
They figured that if they could get all this data
from the National Oceanic and Atmospheric Administration,
from NASA weather data, et cetera,
and they applied really extremely smart analytics.
And the guy who started it just hired brilliant people.
I think he actually used to work at Google.
And I'm sure he hired some people from here.
But what they figured they could do
was do risk calculations that would enable them
to calculate the risk that they bore as an insurer
more accurately, so that they could both help farmers
and also make a business out of it.
Well what happened as they got into this
is that they found that there were open data
sources that they could use that were much better
and that would give them a much better result than the commonly
used sources.
So the first iteration of this is,
let's use data from weather stations.
Well if you're a farmer, even if you
look at every weather station in the US,
it might be 30 miles away from your farm
and it's not helpful to you.
So long story short, they ended up
getting data so that they could look at a piece of farmland
roughly the size of this mid-sized auditorium or even
smaller.
They can calculate rainfall to one hundredth of an inch.
They can look at soil quality in a way
that they know exactly how the soil is going
to respond to that amount of rain.
And they're doing all of this with-- almost all of it,
with a couple of small exceptions,
is public open data that anybody, any one of us,
theoretically could access, but we
wouldn't know what to do with it.
And knowing what to do with it and knowing how to analyze it,
and bringing together both data analysts, and subject matter
experts, to create this new kind of tool
is how they have created a billion dollars worth of value.
They also believe that they can now
increase profitability for farmers worldwide by 20% to 30%
and help farmers understand how to deal with climate change
by changing the crops they grow and the seasons in which they
grow them.
So this is huge.
This goes from we're insurance salesmen to we're
leading the next Green Revolution.
It's a direct application of free open data
and it's a stunning demonstration
of how even data that is free and public
can be an incredibly important business driver.
A lot of people think health care
will be the next big frontier.
This is a picture of Todd Park, who was the Chief Technology
Officer for Health and Human Services
and for the last couple years has
been CTO for the United States.
He runs this event in Washington every year
called The Health Datapalooza.
Datapalooza, as somebody pointed out,
could be literally defined as an all out crazy party of data
and that is pretty much what these things are.
They get about 2,000 people a year.
And we are seeing a lot of activity
in the health care sector. iTriage is an example that
uses the public registry of health care providers.
So that if you're traveling and you have some symptoms,
it can immediately tell you if those symptoms are serious.
And if they are, it'll tell you how
to get to the nearest emergency room
very quickly, even if you're in a strange city.
In finance we're seeing a lot of companies like this one.
This is CapitalCube, which is now owned by AnalytixInsight.
There are about 40,000 publicly traded companies
in the world for which there is enough information
to say anything intelligent about them.
These guys figured out algorithms
to analyze all 40,000 of them, update their information
every single day, put their results into a prose form
that any investor can read, provide graphs
that show the relative risk and the expected return
for a given company compared to its competitors, et cetera.
Again this is not actually necessarily using a new data
source.
They're using SEC data that's been available for a while,
but they're applying a level of analytics that probably was not
possible before fairly recently.
This is becoming-- and we're seeing
a lot of businesses in the financial sector.
There's stuff happening in energy.
Opower is a company that's now working with utilities.
It will give you back not only your own energy usage data,
but an aggregate summary of your neighbors' energy usage data,
which is apparently the most powerful motivator
to clean up your own act: the fact that you've got to do
as well as your neighbors.
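A neighbor comparison of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the kilowatt-hour figures are invented, and it compares against a plain average, which is not claimed to be how Opower actually computes its reports.

```python
from statistics import mean


def neighbor_comparison(own_kwh: float, neighbor_kwh: list[float]) -> dict:
    """Compare one household's usage with the average of its neighbors.

    Hypothetical helper: returns the neighbor average and how far above
    (positive) or below (negative) that average the household is, in percent.
    """
    average = mean(neighbor_kwh)
    percent_vs_neighbors = (own_kwh - average) / average * 100
    return {
        "neighbor_average_kwh": average,
        "percent_vs_neighbors": percent_vs_neighbors,
    }


if __name__ == "__main__":
    # Invented monthly usage figures, in kWh.
    report = neighbor_comparison(600, [400, 500, 600])
    print(report)
```

The point of the design is that only the aggregate goes back to the customer, so nobody sees an individual neighbor's data.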
They're using this together with a lot
of open data about energy and energy usage and energy
efficiency to help people save energy
and ultimately, hopefully help fight climate change.
So there's many, many examples but those just
give you a sense of how this goes.
Now the interesting thing, in a segue from Opower, is that
what they are ultimately about is
helping consumers choose how they're going to use energy.
So this gets back to what I told you was the problem that got me
into this whole area in the first place:
how do you choose a cellphone plan?
Well this whole area of smart disclosure is about open data.
It's almost like a sort of subset of open data.
It's about figuring out how to get data that's
going to be useful to average people to improve their lives
and put it out there in a usable form.
Who here has read "Nudge" by Cass Sunstein and Richard
Thaler?
It's a great book if you're at all
interested in behavioral economics.
It's a perfect read and it's also interesting,
because it inspired a ton of work
in the Obama administration.
So it's very much about how understanding
collective behavior and psychology
can help you make policy decisions.
It was actually tested during the first Obama campaign.
One simple example is they found that if they planted-- not
planted, that's too strong a word-- if they promoted news
stories before every state primary election
that there was going to be huge voter turn out,
there would in fact be huge voter turnout,
because nobody wants to be left out
when there's going to be huge voter turnout.
So it became a kind of self-fulfilling prophecy,
because they knew that more voter
turnout would be helpful to them.
Actually that may have been in the election itself,
not the primaries-- correcting myself.
So anyway Cass Sunstein, who was the regulatory czar
for the Obama administration, is a big thinker in this area.
Richard Thaler, who's an economist
at the University of Chicago, is as well.
Their book "Nudge" was about how you can create behavioral cues
and use information in ways that nudge people to make choices
that are better for them.
Well, so here's an example: while Cass was regulatory czar,
one of the things that they did is they reformed the label
that you see on cars around energy efficiency.
And you can see very clearly the most obvious change here.
So they go from the small type saying, estimated fuel cost
2000 something a year to you save $1,850 in fuel costs
over five years.
So it's a very simple example but a pretty compelling one
of how the way you present information affects what people
get from it and how they make decisions.
OK that was very much the basis of the Smart Disclosure Task
Force.
And what we set out to do was to say, how do we
use these kinds of principles at a time when most people are
getting information either on their smartphones or on the web
and where we're really trying to figure out how to give people
information that is personalized to them?
So think about how Kayak works.
I mean this is a pretty amazing tool that
allows you to go online and choose the flight that you want
to take tomorrow to wherever you want to go out of literally
thousands of flights and you can do it in about 10 minutes.
So the question we start to ask is
what if there was a Kayak for everything?
What would that look like?
There is a lot of work now to try
to figure out, how do you do this for financial services?
How do you do this for health care insurance?
How do you do it for mortgages, credit cards-- all
these decisions that frankly drive most of us
completely nuts every day, either that or you just sort
of pick one and hope you're right.
Going back to cell phones as the paradigm here,
it's been calculated that Americans lose something
like $13 billion a year collectively,
because we're not using the most efficient cellphone plans.
So this is real money and in many cases, like health
insurance it's also safety, and quality of care,
and quality of service.
So there have been a couple of successful experiments here.
One of the ones I like a lot is a site called greatschools.org.
This is a nonprofit.
They use state data to analyze the quality of public schools
and help people make those choices.
This thing is now used by more than 40%
of all K through 12 households in the US,
which is just kind of fantastic and shows you how much hunger
there is for this kind of information.
Another success-- this one is from the UK-- I always
like this because it's just sort of so bizarre. So this
is a site called comparethemarket.com. One night,
probably after a couple of vodkas,
somebody must have been kidding around.
They were trying on Russian accents
and somebody said it's like, comparethemeerkat.com.
Somebody then said, that is a brilliant idea.
They decided that their symbol should be a meerkat.
And there is now the spokes-meerkat
in the UK called Aleksandr Orlov, who
is the spokes-thing for comparethemarket.com.
This thing became so popular that Harrods
was going to-- yes, you can collect
all six exclusive meerkat toys.
This is like a car insurance shopping site.
This is like as if the Geico gecko
was sextuplets or something.
I don't know what it's like.
But this thing became so popular that they
were going to sell these one year, one Christmas, at Harrods
and the CEO apparently said, we can't
do that, there's going to be a run on the store.
We're just going to give them all to charity.
It has also made them a very successful company.
Now what this shows-- beyond the fact that people like fuzzy
stuffed animals and that marketers
have bizarre, but successful ideas-- what this shows
is that it is also possible to build a successful business doing
comparisons of car insurance, home insurance, life
insurance, energy, credit cards, travel insurance, et cetera.
Nobody has yet made this model really successful in the US,
but it is a huge consumer need.
And I think one of the things that we're
going to see in the years ahead is that smart disclosure
people are going to figure out how to really do
smart disclosure the right way and it'll be both a consumer
service and a successful business model.
AUDIENCE: [INAUDIBLE]?
JOEL GURIN: Why hasn't it made it in the US?
I think there is a couple of reasons.
I think one is that people haven't quite
found the right business model yet
that will do it in an honest way and yet also be successful.
A lot of this works off of lead generation.
Lead generation gives you the incentive
to game the system, which is unfortunate.
So that's been a bit of a problem.
I think also-- I don't actually really have a good explanation.
I think for some reason this started culturally in the UK
with smaller companies about 10 years ago
and it doesn't seem to have
caught on here in the same way.
And there are a lot of inherent challenges
in trying to do comparisons for 10 different things at once.
Like the fact that people generally
shop for any one of those only once every couple of years.
But one way or another, I think it's still
a model that ought to be applicable here,
because this is actually one and only one
of several sites in the UK that have been operating
successfully.
Anyway somebody should figure this out.
I think it's an interesting challenge.
Next trend: we're seeing a lot of use of open data
in an investment context, which I think can be good for society
as well.
This is a British company-- there's
a lot of work going on in London-- that
is making open data available about small to medium size
enterprises.
Private companies that have had trouble attracting investment
because the investors don't want to go
to all the trouble of analyzing whether or not
they're a good risk.
These guys are providing enough information
that they believe they can get about $250 billion
more invested in these companies
by simply providing the information that lets investors
invest with confidence.
So that's a good thing for business.
But a lot of the potential I think
is in what used to be called corporate responsibility--
what's now being called environmental, social, and governance
measures, because we're seeing more and more investors who
consider good sustainable practices to be
a sign of good corporate governance.
So for example, the Carbon Disclosure Project
collects data on carbon footprint
from most of the major companies from the Fortune
500 and other companies.
They represent institutional investors
who collectively have about $87 trillion to invest.
So we're seeing some real interest from that community.
We're seeing the same kind of thing being applied
to the consumer field, particularly
by a company in San Francisco called GoodGuide, which
provides a lot of information to consumers
about the environmental impact of the products and services
they buy.
Much of this is based on EPA and other open data.
Companies are now becoming more and more interested in this
because they want to see if they have
a good profile that consumers will like.
And then finally, the Securities and Exchange Commission
has begun to demand that companies that report to them
include information on things like whether or not
they use conflict minerals, which are minerals that
are mined under pretty horrible conditions in the Democratic Republic
of the Congo.
That kind of thing, which happened under Dodd-Frank,
could be the beginning of the SEC demanding more and more
environmental, social, and governance measures.
If that were to happen, we could see some real changes
in corporate practices.
So I think this is a case where open data, because it's
of interest not only to citizens, but also
to the investor community, can have a lot of leverage
in improving corporate behavior.
We're seeing open data shape reputation and brand
in some powerful ways.
Part of this is public complaints
and what happens when you make complaints about a company
public.
So these two people founded a company
called PublikDemand, which takes complaints from consumers,
amplifies them through social media to an extent
that a company like AT&T or United Airlines
has to immediately pay attention.
And in many cases they've gotten very rapid solutions
to problems that otherwise would have gone back and forth
with customer service for months.
Well this is a strategy that regulatory agencies are also
following.
The Consumer Financial Protection Bureau in particular
has made its complaint database public.
And banks are now paying much more attention
to customer complaints and customer satisfaction
than they ever would have because of this open data.
Both "Forbes" and "American Banker"
have written about how this is really changing the banking
industry, because they have to listen collectively
to consumers whereas they could ignore people one at a time.
The next stage of this, I think, is analyzing social media.
Since we are now at this stage of 2 billion tweets a week
which is-- I don't know about you--
I find that somewhat terrifying.
But not only through the kinds of reviews and comments
people do on Google, but these other sites as well.
We're seeing a whole huge amount of social media commentary
and you would think that if you could actually figure out
how to analyze this and do something with it,
you would have a very powerful form of open data that
has huge business relevance.
Well one company that is working on this
is reputation.com, which is in the business of helping people
improve their online reputations mostly by promoting
more positive and genuinely positive
feelings about what they have to say.
But there is a whole other level of this--
of sentiment analysis-- which many of you
may be familiar with.
So I always like to ask how many people
know who the woman on the left is?
How many people know who the guy on the right is?
OK, at least a couple. Generally in every tech audience,
more people recognize Alan Turing
than recognize Jane Austen, but that's who they are.
And if Jane Austen and Alan Turing had a love child,
it would be sentiment analysis.
Because sentiment analysis essentially
is this technique of doing text analysis to figure out
what people feel about brands, celebrities,
TV shows, specific products, specific services, et cetera.
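To give a feel for the simplest form of this, here is a toy Python sketch that scores text by counting words from two small hand-written lists. The word lists, function names, and example sentences are all invented for illustration; production sentiment analysis handles negation, sarcasm, and context that a word count like this cannot.

```python
import re

# Tiny illustrative lexicons; these word lists are assumptions, not a
# published sentiment dictionary.
POSITIVE = {"love", "great", "good", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "bad", "awful", "angry"}


def sentiment_score(text: str) -> int:
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for w in words if w in POSITIVE)
    negatives = sum(1 for w in words if w in NEGATIVE)
    return positives - negatives


def sentiment_label(text: str) -> str:
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in (
        "I love this phone, great battery",
        "Terrible service, I hate waiting",
        "The plane landed",
    ):
        print(sentiment_label(post), "|", post)
```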
There is an annual conference now held in New York-- well,
I think it's usually New York-- every March
where people get together to talk about this stuff.
It's a chapter in my book.
I've also done a podcast with a guy named
Seth Grimes, who's a guru in this area.
That's on my website.
It's absolutely fascinating.
It's not yet a mature technology,
but ultimately you can see where this is going.
This is going towards treating all of social media
as an analyzable, quantifiable form of open data
that can have a lot of implications in a lot of areas.
Personal data is a specific kind of open data
in that this is about making data about my medical records
available to me, or like Opower, my energy
usage available to me.
It doesn't really fit the classic definition
of open data.
It's not like available to everybody for free,
but it's a very important part of the ecosystem.
Partly because opening data to me is a different kind of thing
than me not being able to access my own data.
And also because in many applications of big open data,
having the ability to match it up with personal data
is an important part of the puzzle.
This is actually the diagram from a report by the World
Economic Forum.
They've now done a couple of reports
on unlocking the value of personal data.
The basic idea that people are talking about
is what if you could establish a data vault.
So I see this as probably a concept that many of you
have thought about a lot.
It's been kicking around for a while.
It may or may not be getting to a point of applicability
or maturity.
There are companies like reputation.com,
personal.com in DC, and others that are looking at this.
But the basic idea is, if you had
access to your personal data, if you can hold it securely,
and if you could then release it selectively to other people
or to marketers, what would happen?
Well, one model, which is being called vendor relationship
management by Doc Searls, who talks about it in his book,
"The Intention Economy," one model
is that instead of marketers targeting you, you target them.
It is worth about $2,000 for a Mercedes Benz dealer
to get a qualified buyer on the lot based on the probability
that they're going to buy a car.
So it might be worth a couple dollars for that person
to find you if you wanted to release demographic
or whatever kind of information that
made you look like a good customer
and actually pay you to make a visit.
That's a kind of simple form, but some
of the people working in this area
think there's a lot of economic potential there.
I think it's still hypothetical, but it at least
points towards a greater degree of consumer
control over how we are all marketed to.
On the other end of the spectrum,
there is potentially tremendous public value
in sharing personal data.
This is this app PulsePoint, which is essentially
if you are a person who knows CPR, you tell them that.
If there's somebody who is having cardiac arrest,
they then immediately send a message
to everybody nearby who knows CPR.
They can get to them faster than an ambulance can.
They can potentially save a life.
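The matching step described here, finding registered volunteers close to an incident, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Volunteer` type, the 400-meter radius, and the coordinates are invented, and nothing here reflects how PulsePoint itself is built.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class Volunteer:
    name: str
    lat: float
    lon: float


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_volunteers(volunteers, lat, lon, radius_m=400.0):
    """Return volunteers within radius_m of the incident, closest first."""
    hits = [(distance_m(v.lat, v.lon, lat, lon), v) for v in volunteers]
    hits.sort(key=lambda pair: pair[0])
    return [v for d, v in hits if d <= radius_m]


# Hypothetical volunteers who have said they know CPR.
volunteers = [
    Volunteer("A", 37.4420, -122.1431),
    Volunteer("B", 37.5000, -122.2000),
]
for v in nearby_volunteers(volunteers, 37.4419, -122.1430):
    print("notify", v.name)
```

The point is that the only personal data needed is what each volunteer chose to share: that they know CPR and where they are.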
So this is the use of personal data
that I'm not sure anybody would have thought
of a couple years ago, but it's the kind of thing
that when you start thinking of personal data
as a form of open data on a voluntary basis some really
interesting things can happen.
They talk about themselves as enabling citizen superheroes
and I think that's actually pretty accurate.
Open data and research-- this is another area where
I think we're going to see potentially huge benefits.
We're seeing more and more interest and more and more
pressure for particularly biomedical, but potentially
other kinds of scientific research to be more open.
Now a couple things are happening here.
One is the open access movement, which, of course,
Aaron Swartz was very involved in promoting,
and very tragically in the end.
But that's very much about the idea that once research
is in a published journal, we shouldn't all
have to pay thousands of dollars to get at that data
in order to get at that report.
And the federal government recently
announced just a couple weeks ago that about half
of all federally-funded research will now
have to be made publicly available for free
online within a year of its publication in a journal.
That's sort of after the fact.
What gets even more interesting is
data sharing while the work is in progress.
So a lot of this is coming from patients and from funders.
Kathy Giusti was a corporate CEO in her 30s when
she discovered she had multiple myeloma.
She quickly discovered that there was very little research
being done.
She started a foundation to fund that research.
And a condition was if you take their money,
you have to make your data openly available
as you make new discoveries.
This is in many ways the model that the Human Genome Project
worked on very successfully.
It's now being followed in Alzheimer's research
and Parkinson's research and in other ways as well.
It is potentially a transformational change
in how we do science,
if the business models are worked out and if
there's enough cooperation from scientists
and from drug companies and others
to really make this the norm.
We're also seeing a lot of very successful experiments
in crowd-sourcing science.
One of the most famous was done at University of Washington
a couple years ago.
They had been working for a decade trying
to solve the structure of a protein related to the AIDS
virus.
They decided to put it on the site Foldit
and asked gamers to solve it.
Gamers solved it within a couple weeks.
They published in "Nature" and they thanked the gamers
publicly.
This was, I think, eye-opening for a lot of people.
Another example, any of you know Galaxy Zoo or Zooniverse?
This is one of the great citizen science projects
and it's really a model for many of them.
This thing got started in Oxford because some poor PhD student
had to look at images of the structure of spiral galaxies,
which apparently computers cannot assess very well.
And he had hundreds of thousands of these to look at.
He looked at 50,000 in a week and he
said there's got to be a better way.
They decided that the better way was posting these images
online, inviting just ordinary people to look at them.
They can do it with a high degree of accuracy.
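One common way to get reliable answers from many untrained volunteers is to show each image to several people and keep the answer most of them agree on. The sketch below illustrates that idea only. The 60 percent agreement threshold, the labels, and the image IDs are assumptions made for this example, not a description of how Galaxy Zoo actually aggregates classifications.

```python
from collections import Counter


def consensus(votes, min_agreement=0.6):
    """Return the most common label if enough volunteers agree, else None."""
    if not votes:
        return None
    top_label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return top_label
    return None


# Hypothetical volunteer classifications for two images.
votes_by_image = {
    "image-001": ["spiral", "spiral", "spiral", "elliptical"],
    "image-002": ["spiral", "elliptical"],
}
for image_id, votes in votes_by_image.items():
    print(image_id, consensus(votes) or "needs more volunteers")
```

Images without a clear majority can simply be shown to more people, which is cheap when there are hundreds of thousands of volunteers.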
They've now taken on other scientific projects,
like cancer cells as you see here.
They have tapped 800,000 volunteers
to help them do skilled human work
in the interest of science.
And then finally SkyTruth is applying the same kind of thing
to the environment.
This is a nonprofit in the Washington area
that is now using crowd sourcing to look at things like maps
of areas of Pennsylvania where fracking is going on
and look at signs that fracking is damaging the environment.
So this becomes environmental protection
through open data from the satellites
and crowd sourcing applied to that open data.
Data-driven cities is a huge movement right now.
Right at NYU we have the Center for Urban Science and Progress,
which is doing a lot of work in this area.
The idea is to put sensors all over cities, to instrument
cities, to see what can be learned,
to improve operations, public health, emergency management,
all kinds of things.
You're also seeing a lot of use of data for accountability
in cities like Chicago.
Palo Alto has been a leader here.
And there are a couple of interesting things
to come out of this.
One is applications like NextBus,
which is now all over the country, where city traffic data can
be used to help you figure out when your next bus is coming
so you're not waiting endlessly in the rain.
Another is things like this experiment in Washington
where they have actually solicited public input
about how the different government agencies are doing.
So they've actually been able to grade government agencies
on the basis of both survey data that they collect
and sentiment analysis of what people
are saying on social media.
When they first did this, four out of five agencies
got a C minus; one got a C plus.
They were not very happy with the mayor for doing this,
but they have gone public with it.
It's a really interesting feedback loop
and over time, the grades have gone up.
So now the last trend-- and this is
one where we're really intensely focused at NYU--
is trying to figure out when you look at all of this together,
what is open data worth?
And this is an important question
because it is not a slam dunk or particularly
easy to take data that has traditionally been siloed
and open it to the public.
So there have been a number of studies on this.
The most recent was a McKinsey study last October
that says that open data is worth $3 trillion a year
worldwide.
That's by far the highest estimate anybody has come up
with as you can see from some of the other ones on the screen.
But generally the estimates run pretty high.
So it's a very interesting challenge.
We all think there's potential there,
but what we have done at the GovLab at NYU,
is we've set out to do this thing called the Open Data
500, which is about figuring out exactly where the value is.
Beginning in the business sector,
but ultimately wanting to look at the nonprofit sector
as well.
We're looking at US-based companies.
We have actually contacted more than 500 of them.
We're in the process of finalizing the list.
If this interests you, I would urge
you to please go to opendata500.com because we
are really seeking public comment on everything
from individual companies to our methodology
to the whole goals of the study.
Or you can tweet to hashtag #OD500 if you have suggestions
for us.
We have this now on a website as a work in progress,
where you can filter by state or by category
and see where some of these open data companies are.
Not surprisingly the greatest numbers are in California,
but I'm glad to say, as a New Yorker,
that New York is not far behind.
And we're beginning to see some interesting patterns here
that I think are really going to be meaningful.
So one of those patterns, which I'll show you in a second,
helps answer some of the open questions about open data.
So having shown you all these trends,
and shown you all the stuff that's
happening that I talk about in my book and the website
and my other work, there's still a lot we don't know.
And I would say there are three major questions that we're
now all looking to answer.
So the first one is, OK if we think open data has value,
which sectors are the most promising?
Well from the Open Data 500-- even though this
is preliminary, I can't stress that enough because this is not
a final list, et cetera-- but we're
starting to see some hints of that.
So the first tier, the sectors that
have the most companies in them, are
what we're calling data slash technology
and finance and investment.
Finance and investment probably because there's
so much interest, and because SEC data
and other kinds of business data has been out there
for a long time and is a very rich source.
Data technology because there is a whole huge emerging
sector in helping figure out how to take
really unwieldy government data sets and turn them
into usable open data.
So this is companies like Socrata, Junar, OpenGov here
in Palo Alto, or nearby, and many others
whose business is making open data business-friendly.
And one of the interesting questions
is whether this is something that's
going to be around forever, which I think it probably will,
as we were talking about a little bit before.
Or how much this sector may change as governments
get better at releasing open data.
Next we have health care-- which I
think is emerging-- transportation, energy,
and then the third tier, where only about a couple
percent of the companies we have are in each of these areas.
This includes a number of things,
many of which are really quite significant, like education,
scientific research, environment, food
and agriculture. The Climate Corporation, for example,
is somewhere in this tier.
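The tiers described here come from counting companies per sector and ranking the sectors by share. A minimal sketch of that tally follows. The company names and counts are invented for illustration and are not Open Data 500 results; only the sector names come from the talk.

```python
from collections import Counter


def sector_shares(companies):
    """Return (sector, share of all companies) pairs, largest sector first."""
    counts = Counter(sector for _name, sector in companies)
    total = sum(counts.values())
    return [(sector, n / total) for sector, n in counts.most_common()]


# Hypothetical (company, sector) records.
companies = [
    ("Company A", "data/technology"),
    ("Company B", "data/technology"),
    ("Company C", "data/technology"),
    ("Company D", "finance and investment"),
    ("Company E", "finance and investment"),
    ("Company F", "health care"),
    ("Company G", "food and agriculture"),
]
for sector, share in sector_shares(companies):
    print(f"{sector}: {share:.0%}")
```

A count like this measures where companies cluster, not how large or important any one company is.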
So we just have a couple of initial observations
and caveats about this.
One is that sectors that don't have a lot of companies
in them, like weather and agriculture,
may have a Climate Corporation in there
or may have a very significant company.
So simply the number of companies per sector
doesn't necessarily tell you the importance of the sector,
but it does tell you at least where
a lot of the entrepreneurial activity is.
We still have to do more work. We are getting information
on the number of employees per company, which
is going to be an important metric.
We're trying to get information on financial metrics.
And as I said, the data technology category--
I was very interested to see that so high up,
and I think it says something about the sorry
state of government data.
Which leads directly to question number
two: how do we improve the open data ecosystem?
Having worked in the federal government for a while
and talked to people at a lot of agencies,
I can tell you a lot of those government data sets are
a mess and the people who run them know it.
And they are trying to fix it, but it's not
going to happen overnight.
So there are a couple things that are happening.
On a city level, OpenGov, which is a company right near here,
has developed what they like to call kind of a Sim City
for actual cities.
So they have this thing you can see in the upper right,
where they can take budget and other city data from any city,
put it on a platform that makes it usable,
and that also makes it comparable to other cities' data.
And they can then make town meetings much more productive.
They can tell you why Palo Alto has a certain rate of police
overtime and how that compares to San Mateo
and they can learn things about city governance from that.
So this is one of those data slash technology companies
that's beginning to make the data more useful.
Another one which is a couple blocks from us at NYU
is called Enigma.
They won TechCrunch Disrupt in New York last May.
And what was significant about that
was not just that they got this nice large $50,000 check,
but that I think there was a recognition of how
important data companies are.
Their whole thing is taking really unwieldy government
federal data sets and making them
usable on a common platform and in interoperable ways.
And they're getting a ton of attention
right now because this is something
that everybody who has ever worked with federal data
has wanted.
And the reason for that is that right now federal data
looks something like this.
For those of you who have seen "Raiders of the Lost Ark."
The pathetic thing about this is not only
that federal data is this bad, but that this is the metaphor
that everybody in the federal government who works with data
uses to describe the state of federal data.
It is that bad and we know it.
There's good stuff in there somewhere,
but good luck finding it.
A lot of the work going on in government and in third parties
like Enigma and like OpenGov is to make this stuff more useful.
And the open question is how can we really make this work?
And this is an area where I certainly
think Google has done a lot
and has a huge role to play.
I think also part of this is going
to be what I'm calling demand-driven data disclosure.
The way it's worked in the past, government agencies
have largely released open data when they've identified
data sets they think are of interest,
or where they're just doing it
to comply with a government mandate.
One of the things we want to get out of the Open Data 500
is to create a kind of round table, where data users can
give much more ongoing feedback to data holders
in the government agencies.
We think this is going to really improve the quality of data,
the availability of data, and the ecosystem as a whole.
And then finally, how can developing countries
use open data?
This is a huge question that the World Bank among others
is putting a lot of effort into.
I'm going to be on a panel for them
next Wednesday afternoon in DC.
And there's a couple of areas here.
One is fighting corruption.
This is the website that has, I think, the best
name of any website I've seen. It is called ipaidabribe.com.
This is a website in India, where you can go
and, through crowd sourcing, report if you
had to pay a bribe in a way that makes corruption transparent
and ultimately decreases corruption.
But we're also looking to go beyond transparency
to economic development and a lot of people
are asking whether developing countries can use open data
as a business resource in the same way
that we're seeing in the US, the UK, France and places
like that.
So one of my colleagues at the World Bank, Prasanna Lal Das,
recently did a very good blog post
summarizing some of the things that are needed.
As you can see, it's going to take some work,
but the rewards may be great.
And I'm seeing a lot of interest right now
in figuring out how to make that work and can we make it work.
So that's the open data universe at least
as I've come to see it through the work I've done here.
I would recommend you to a couple
of sources for more information.
One is at thegovlab.org, where you can see our wiki,
subscribe to our digest.
You can also sign up outside to get a digest subscription.
It comes out every week.
It's a curated collection of material in this area.
There's opendatanow.com, where I report
on this stuff on a regular basis,
largely with interviews with people in the field,
podcasts when I can do them, hopefully
as a resource to the community.
And there is of course this book, now available,
which I see many of you have already purchased-- thank you--
and which I hope you find useful in one way or another.
So we have a couple of minutes for questions
and thank you very much.
Thank you.
AUDIENCE: Thank you very much.
I have two questions.
One is, if you develop your business using open data,
what are the caveats? What is the license on all this data?
And the second question: suppose I want some data.
Where would I find it?
For example I would like to see some data
on education broken down by gender.
Where would I go?
Where do I even start?
JOEL GURIN: OK, so two questions. One
is in terms of the license to use data.
It is only open data if it's released
under an open license that makes it usable
by anybody, and reusable, and re-publishable.
So by definition, that's pretty much built
in to the Open Data Policy, at least of the US government,
and more and more governments recognize that.
In terms of where you go, if you're
looking for federal data, you should go to data.gov,
this is the central repository of federal data.
It was originally built in a way that I think a lot of people
found not as user-friendly as they wanted.
They've just relaunched it.
It's much better.
It's getting better all the time.
But that's where you would find data like that segmented
by agency and by area of interest.
As a start, in any case.
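For a programmatic starting point, the data.gov catalog runs on CKAN, which exposes a keyword search endpoint. The sketch below only builds a search URL and does not fetch anything. It assumes the catalog still serves the CKAN `package_search` action at this address, and the `search_url` helper name is made up for this example.

```python
from urllib.parse import urlencode

# Assumed location of the data.gov CKAN search action.
CATALOG_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def search_url(query: str, rows: int = 5) -> str:
    """Build a catalog search URL for a free-text query."""
    return CATALOG_SEARCH + "?" + urlencode({"q": query, "rows": rows})


# The audience member's example: education data broken down by gender.
print(search_url("education gender"))
```

Opening the printed URL in a browser should return a JSON list of matching data sets, each with its publishing agency and download links, if the endpoint is available.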
AUDIENCE: Thank you.
Obviously, you logically focused on the US.
Are there any other nations that you'd point to as best
in class, who are really leading the field in terms
of leveraging the infrastructures developed
in their governments or in their societies?
JOEL GURIN: Yes the UK is really the other world leader
and in many ways they're doing things
in a more advanced way than the US.
For example, their equivalent of data.gov
is all done with linked data and is really very
beautifully and very well designed.
So they are in some ways ahead of us,
in some ways learning from us.
They also have an institute there called the Open Data
Institute that is funded partly by the government
and by other sources as well, that's
doing a ton of work, really global leadership, in this area.
Beyond that, we're seeing a lot of interest
all over the world on every continent.
And I think what's happening is that different countries-- as I
mentioned, there are now 60 countries or so in the Open Government
Partnership, which is committed to these open government
principles, part of which is open government data.
And it's rapidly emerging as an international movement.
I think different countries depending
on the stage of development will figure out
what is the most important and most appropriate
form of the data for them to release.
AUDIENCE: So all the applications
of open data that you mentioned are all vertical.
They're trying to solve a particular problem.
Do you see a need or an opportunity
for more horizontal plays of products
that could be usable by many applications that
use open data?
JOEL GURIN: Well I think probably
the best examples of those are these data technology companies
like Enigma or OpenGov, because what they're essentially
trying to do is to make data of all kinds more usable.
And I think what they're hoping to do
is to make possible the kinds of mash-ups or interoperability
of data that can make a lot of those more complex applications
possible.
Right now, at least if you're working with US federal data,
it's very difficult.
We did a project at the GovLab simply
to try to mash up EPA and OSHA data about factories
and facilities that both agencies regulate.
You would think this was dead easy.
It's not.
I mean, even on that basic level, it
takes work to make these data sets work
and play nicely together.
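To show why even a basic mash-up takes work, here is a minimal sketch of joining two facility lists that have no shared ID. The field names, sample records, and cleanup rules are all hypothetical and are not taken from the actual EPA or OSHA data sets or from the GovLab project.

```python
import re

# Hypothetical suffixes to ignore when comparing facility names.
NOISE_WORDS = {"inc", "llc", "corp", "co", "company", "the"}


def facility_key(name: str, zip_code: str):
    """Normalize a facility name and ZIP code into a comparable key."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in NOISE_WORDS]
    return (" ".join(words), zip_code[:5])


def mash_up(epa_rows, osha_rows):
    """Pair rows from the two lists that share a normalized key."""
    osha_by_key = {}
    for row in osha_rows:
        osha_by_key.setdefault(facility_key(row["name"], row["zip"]), []).append(row)
    matches = []
    for row in epa_rows:
        for other in osha_by_key.get(facility_key(row["name"], row["zip"]), []):
            matches.append((row, other))
    return matches


# Made-up records: same plant, written two different ways.
epa_rows = [{"name": "ACME Steel, Inc.", "zip": "15213-1234", "permit": "P-1"}]
osha_rows = [
    {"name": "Acme Steel", "zip": "15213", "inspections": 3},
    {"name": "Other Plant LLC", "zip": "15213", "inspections": 1},
]
for epa, osha in mash_up(epa_rows, osha_rows):
    print(epa["name"], "<->", osha["name"])
```

Exact matching on a cleaned-up key still misses misspellings and changed addresses, which is where most of the real effort goes.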
So companies that are making that happen, I think,
are definitely taking that kind of broad horizontal view
where they're going to be helpful to a lot
of other companies.
Yes?
AUDIENCE: Yes.
Do you have any comments about Aaron Swartz,
who tried to liberate some common government data
but got sued by the government?
JOEL GURIN: Yeah I write about Aaron and that in my book.
I think everybody pretty much recognizes now
that MIT was not a good path there, to say the least.
And it gets into some complexities,
but I think the short answer is what Aaron was really
fighting for was open access and access to material that
has already been published in a way that the public can use it.
That's now just become federal government policy
for about half, as I said, of the research
that the federal government funds,
so I think there's a greater and greater recognition that he was
right about that and that we should start getting on board.
AUDIENCE: How do you think about accuracy, or even just not
necessarily accuracy, but knowing
what's in the data set, like keeping track of the metadata,
like what's actually being counted.
Who was excluded, who wasn't, how data was collected,
and that kind of information that
can change what the data means?
JOEL GURIN: Yeah that's a great question.
I would say right now that's very hard. Part of the Open Data
Policy is to actually publicly release information
about the quality of data.
I think this is going to be one of the parts of the policy
that federal agencies absolutely hate the most.
But there are some really interesting examples
of agencies facing up to this problem
and dealing with it. So one great example is
USAID-- international development--
knew that they had lousy geospatial data
on the organizations they were giving grants to.
They put on a hack-a-thon, but a very careful one.
They found people who are sort of geospatial hackers
in the Washington area.
They invited about 100 people in.
They said, we're going to give you special access to our data.
We want you to fix it.
We'll give you all the weekend.
They were done in about 15 or 16 hours.
So this idea of kind of crowd-sourcing quality control
is one that a couple of government agencies
have become interested in.
But simply knowing is very hard.
And that's one of the reasons that I
think establishing feedback loops-- really
good feedback loops between data users and the agencies
that hold the data-- is going to be a critical next step,
so that we can ask those questions of government agencies.
We can see what the response is.
And where there is a really serious flaw
in a really important data set, they
can prioritize that as something that stakeholders need fixed.
AUDIENCE: And so you showed a lot of great examples.
I was wondering if you think that we can leverage
mobile in a specific way as opposed to the desktop sites.
JOEL GURIN: Yes.
I tend to show desktops because they look better on PowerPoint,
but absolutely most of these things
that I showed either are mobile apps
or could be mobile apps as well.
I think the one caveat on mobile apps
is that we are a little bit at risk of app mania with open data.
There have been all these hack-a-thons
of apps for this or apps for that, which is great,
but I think there are probably some limitations in what
is easy to do in that mobile environment.
And there are some more sophisticated things
that can be done if you, I believe, look more broadly.
But definitely pretty much anything
that I showed you has a mobile application attached to it.
FUMI YAMAZAKI: OK.
Thank you very much.
I think we're running out of time
but Joel will be staying here for us.
Thank you very much.
JOEL GURIN: Thanks so much for coming.
And thank you for the work you're all doing.