MATT FROST: Welcome to the first session on WebM and the
New VP9 Open Video Codec.
We figured that there's no better way to really add a little
excitement to a presentation than to change it at the last
minute, and so what we've spent this morning doing is
encoding some VP9 video and H.264 video and putting
together a side by side demonstration just to give you
a taste of what we're working on.
So what you're going to see is a video.
The video is going to be the same on either side.
It's going to be VP9, the new codec on the left,
H.264 on the right.
And for H.264, we used the x264 open video encoder, which is
commonly regarded as the best encoder out there.
We used the highest possible settings.
So we've done everything we can to favor H.264 here.
All of this is at the same data rate, so both of the
videos are going to be at the same data rate.
And the bit rate varies.
In some cases, we're using 500K.
In other cases, we've dropped the bit rate down to bit rates
that are actually banned by certain UN conventions for the
compression of HD video.
And so with that, I think that's everything, Ronald?
RONALD BULTJE: Yes.
So like Matt said, what you're looking at here is shots that
we just took this morning.
We've encoded those in just a couple of hours and basically,
what you're looking at here, on the left, VP9 and on the
right, H.264, is what an amazing job we can actually do
at video compression if we're using the very latest
technologies.
MATT FROST: So you can see the blockiness on the right.
On some of this, it's a lot more evident than on others.
And if you want to come up afterwards and take a look at
this running on the screen,
we can freeze frames.
But you see there on the right especially, all this
blockiness and how much it clears up as it
moves into VP9 territory.
RONALD BULTJE: And a point here really is that for high
definition video, H.264 can do a reasonable job, but we can
do a lot better than that.
And so having said that, let's actually get started on the
presentation.
MATT FROST: So the way that we're going to handle this
presentation is I'm going to do a quick introduction on why
we care about open video, and why Google--
which has historically been involved with developing
applications around video--
has gotten so deeply into actually helping work on these
next generation compression technologies.
After we talk about that and why, in general, improving
video compression is good for everybody, I'm going to turn
it over to Ronald for really the meat of this presentation,
which will be to show you some more demonstrations, to talk a
little bit about how we measure video quality, talk
about some of the techniques that we're exploiting to
really make this dramatic improvement in compression.
And then finally, after you've seen this, and I hope that
you've started to get a little excited about what this
technology can do for you, we'll go and talk about the
last stages, how we're going to wrap up this project and
how we're going to get these tools into your hands as
quickly as possible.
So to start off with, just taking a quick look at how
Google got into video.
Video at Google started in the same way that so many big
projects at Google start, as an experiment.
And we launched these efforts with just a single full time
engineer and a number of engineers working 20% of their
time on video, really focusing on video-related data.
And then over the last 10 years, obviously, video at
Google has exploded, not only with YouTube but with Google
Talk, Hangouts, lots of applications where you
wouldn't necessarily think of video as playing a core role,
like Chromoting, which is Chrome Remote Desktopping.
But if you look at the really motivating factors for getting
into video compression, there are a couple that
are really of note.
One, of course, is the acquisition of YouTube.
And with the acquisition of YouTube, we all of a sudden
started to focus very heavily on both improving the
experience for users, improving video quality, but
also about the costs associated with all aspects of
running a service like YouTube.
There are costs associated with ingest, transcode of
video formats, storage of multiple different formats,
and then distribution of the video, both to caches and to
the edge, and ultimately to users.
The second was the move from HTML4 to HTML5, which came at
the same time, pretty much, as our launch of Chrome.
And of course, in HTML4, although to the user, it
appeared that video could be supported in a browser, in
fact, video was supported through runtimes and plug-ins.
With HTML5, video becomes a native part of the browser.
And so with the move towards HTML5, we see it filtering
through the addition of the video tag in Chrome and the
launch of HTML5 video for YouTube.
So these are the two factors--
the focus on quality and reducing cost with YouTube,
the need to build a high quality codec into Chrome and
other browsers for the video tag--
that sparked the acquisition in 2010 of On2 Technologies,
the company that I came from and many members of the WebM
team came from, and the launch of the WebM project.
The WebM project is an effort to develop a high quality,
open alternative for web video.
We're very focused on web video, not on video for
Blu-ray discs, not on video for cable television, but
on solving the problems that we find in web video.
In addition, we're very focused on having an open
standard because we believe that the web has evolved as
quickly as it has because it is based on open technologies.
And clearly, multimedia communication has become such
a core part of how we communicate on the web that we
need open technologies that are rapidly evolving to allow
us to keep pace and to make sure that we can develop the
next generation of killer video applications.
We wanted something simple as well.
So we used the VP8 Open Codec, the Vorbis Open Audio Codec,
which was a long-existing open audio codec, and then the
Matroska File Wrapper.
With the launch of VP9 in a matter of months, we're going
to be adding the VP9 Video Codec as well as the brand new
Opus Audio Codec, which is another open audio codec, very
performant and high quality.
So since our launch, obviously, web video has
continued to grow.
And if we just look at what we know very well, which is
YouTube, YouTube has grown to be a global scale video
platform capable of serving video across the globe to
these myriad connected video enabled devices
that we're all using.
It supports a billion monthly users, and those users are
looking at video four billion times a day for a total of six
billion plus hours of video viewed monthly.
Just to think about that number, that is an hour of
video for every person on the planet consumed on YouTube.
And on the creation side, we're seeing
exactly the same trends.
72 hours of video is uploaded per minute, and that video is
increasingly becoming HD video.
So if you look at the graph on the right, blue is 360p
standard definition video, which is slowly declining, but
quickly being matched by uploads of HD video.
And the key here of great importance is that HD video is
obviously more complex.
There's more data for a given HD video--
unless, of course, you're encoding it in VP9--
than there is for a standard resolution video.
In addition, I think we can all agree that the better the
video is, the higher the resolution, the more
watchable it is.
And then finally, the other trend that's driving both
creation and consumption is the increase in mobile devices
and the move towards 4G networks.
So even this morning, there was an article when I woke up
and was checking my email saying that YouTube video
accounts for 25% of all downstream
web traffic in Europe.
And I think BitTorrent accounted for 13%.
So there alone, between just two web video services, we're
looking at close to 40% of all web data in Europe being video
related data.
And that accords with what we see from the latest Cisco
forecasts, for instance, which is that consumer web video is
going to be close to 90% of all consumer data on the web
within the next three years.
So it's remarkably encouraging to see the growth in video,
but it also represents a real challenge.
Of course, the good news is that we have a technology that
is up to this challenge, and that is VP9.
With next generation video codecs, with the codecs as
good as VP9, we can effectively significantly
increase the size of the internet and we can
significantly increase the speed of the internet.
So obviously, if you're taking VP9--
which, as Ronald will say, halves the bit rate you need
for the very best H.264 to deliver a
given quality video--
you're going to be able to speed up the download of a
download-and-play video, and you're going to be able to
speed up, obviously, the buffering of these videos.
So we have the tools to effectively dramatically
increase the size of the internet.
But of course in doing that, in improving the video
experience, in improving the ability to upload video
quickly, we're going to just create the conditions for even
more consumption of video.
And so it's not going to be enough for us to rest on our
laurels with VP9.
We're going to have to build on VP9 and keep on going, keep
on pushing the boundaries of what we're capable of with
video compression.
So with that, I'm going to turn it over to Ronald to show
you some really remarkable demonstrations of this new
technology.
RONALD BULTJE: Thank you.
So to get started, I just briefly want to say some words
about video quality.
So how do we measure quality?
Well, the most typical way to measure quality is to just
look at it, because at the end of the day, the only thing
that we care about is that the video that you're looking at
looks great to your eyes.
But that's, of course, not all there is to it because as
we're developing a new video codec, we cannot spend our
whole day just watching YouTube videos over and over
and over again.
That would be fun, though.
So in addition to visually analyzing and inspecting
video, we're also using metrics.
The most popular metric in the field for measuring video
quality is called PSNR.
It stands for Peak Signal-to-Noise Ratio.
And the graph that you're looking at here on the left is
a typical representation of PSNR on the vertical axis and
video bit rate on the horizontal axis to give you
some sort of a feeling of how those two relate.
So the obvious thing to note here is that as you increase
the bit rate, the video quality, as measured by this
metric, increases.
So at the end of the day, what that means is that it doesn't
really matter what codec you use; as long as you have
infinite bandwidth, you can accomplish any quality.
However, our goal is to make it easier and faster and
simpler to stream video.
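To make the metric concrete: PSNR is just the mean squared error between the source frame and the compressed frame, rescaled to decibels. A minimal sketch in Python with NumPy (the function name and the 8-bit peak value are illustrative assumptions, not something from the talk):

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized frames."""
    err = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(err ** 2)        # mean squared error per pixel
    if mse == 0:
        return float("inf")        # identical frames: infinite PSNR
    return 10.0 * np.log10(peak * peak / mse)
```

Higher is better, and the log scale is why a difference of a few dB is visually dramatic.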
So how does PSNR actually compare to visual quality?
So for that, there's a sample clip.
So what you're looking at here is a very high-fidelity shot
of the New York skyline.
I believe that this is the Empire State Building.
And this clip has a lot of detailed textures all across.
So what we've done here is that we've encoded it at
various bit rates, and then every couple of seconds, we're
dropping the bit rate and the metric quality of the video
will slowly decrease.
So this is 45 dB, and what you're seeing slowly at 30 dB
is that some of the detail, or actually a lot of the detail,
in the backgrounds of the buildings just completely
disappears.
And that was already the case at 35 dB.
As you go to 25 dB, you can see-- we can go really low in
quality, but you do not want to watch this.
Here's a different scene.
Same thing, we start with the original 45 dB.
40 dB looks pretty good.
35 dB starts having a lot of artifacts, and then 30 and 25
are essentially unwatchable.
So what does that mean for video quality?
Well, the typical target quality for high definition
video on the internet lies around 40 dB.
You were just looking at the video, and at 40 dB, it looked
really quite good.
So if you go to YouTube and you try to stream a 720p
video, that's actually about the quality that you will get.
In terms of bit rate, what you should expect to get is a
couple of megabits a second.
For this particular clip, that's one to two megabits a
second, but that's very source material dependent.
So what we've done, then, is we have taken, I think, about
1,000 CC-licensed YouTube uploads, just randomly
selected from whatever users give us, and we've then taken
out particular material that we're not really interested
in, such as stills or video clips that contain garbage
video content.
And then we were left with, I think, about 700 CC-licensed
YouTube uploads, and we've encoded those at various bit
rates-- so at various quality settings--
with our VP9 Video Codec or with H.264 using the x264
encoder at the very best settings that we are aware of.
Then for each of these clips, we've taken the left half of
the resulting compressed file and the right half of the 264
one and we've stitched those back together, and then you
essentially get what you're looking at here.
So left here is VP9, right is 264, and those are at about
the same bit rate.
You will see graphs here on the left and on the right, and
those are actually the effective bit rate for this
particular video clip.
And as you can see, it starts being about equal.
Now, you saw it just jumping up, and that's because we're
gradually increasing the bit rate to allow the 264 encoder
to catch up in quality.
And as you can see, it slowly, slowly starts looking a little
bit better.
And at this point, I would say that it looks about equal on
the left and on the right.
But if you look at the bit rate graphs, you can basically
see that we're spending about two and a half times the bit
rate on a 264 file versus the VP9 file.
So those are the compression savings that you can get if
you do same quality encodings but you use
VP9 instead of 264.
So what you're looking at here is a comparative graph for the
clip that you were just looking at.
The blue line is the 264 encoded version and the red
line is the VP9 encoded version.
And as I said in the beginning, vertical axis is
PSNR as a metric of quality, and the
horizontal axis is bit rate.
So the way that you compare these is that you can pick any
point from the red line--
or from the blue line, for that matter--
and then you can do two things.
Either you can draw a vertical line, find the point on the
blue line that matches the point on the red line that
you're looking at, and see what the
difference in quality is.
But what we usually do is we do it the other way around.
So we're drawing a horizontal line for the point on the red
graph, and we're finding the point that matches the
horizontal line on the blue.
And what you're looking at here is that for the point
that we were just looking at, that is, a quality metric
point of about 37.1 dB, the VP9 version takes an average
of 328 kilobits a second to reach that quality, and for
H.264, you need to go up to essentially 800 kilobits a
second to get exactly the same quality.
So what that means is, again, the metrics tell us you can
get a two and a half times lower bit rate and effectively
get the same quality by using VP9 instead of 264.
If you look to the higher end of the graph, you will see
that the differences in quality for the same bit rates
might go slightly down, but that's basically just because
at the higher end, there are diminishing
returns for bit rate.
So if you look at the high ends of both of those graphs
and you do the horizontal line comparison-- that is, what is
the difference in bit rate that accomplishes the same
quality-- you will see that it comes down to about 2x over
the whole graph.
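The horizontal-line comparison Ronald describes can be written down directly: interpolate each rate-distortion curve to a common quality point and take the ratio of the bit rates. A sketch, with made-up curve points that merely echo the numbers quoted in the talk:

```python
import numpy as np

def bitrate_at_quality(bitrates_kbps, psnr_db, target_db):
    """Interpolate an RD curve to find the bit rate that reaches target_db."""
    # np.interp needs the x-coordinates (PSNR here) in increasing order.
    return float(np.interp(target_db, psnr_db, bitrates_kbps))

# Illustrative curve points, not measured data:
vp9_rate, vp9_psnr = [200, 328, 600], [35.0, 37.1, 39.0]
h264_rate, h264_psnr = [300, 550, 800], [34.0, 36.0, 37.1]

target = 37.1
ratio = (bitrate_at_quality(h264_rate, h264_psnr, target)
         / bitrate_at_quality(vp9_rate, vp9_psnr, target))
print(f"H.264 needs ~{ratio:.1f}x the bit rate of VP9 at {target} dB")
```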
So let's look at a different video, because I
could just be cheating you with this one video and we
could have optimized our codec for this one video.
So what you're looking at here is, again, the same thing, VP9
on the left, 264 on the right, live bit rate graphs and we
start at the same bit rate.
Then as we do that, we're slowly increasing the bit rate
for the 264 portion of the video so that it can actually
catch up
in quality.
And what you're looking at is that on the right, the floor
is pulsing a lot.
You can actually see, if you focus on the pants of the little
boy here or on the plastic box, that it's very noisy.
But eventually, it catches up in quality.
Guess what happened to the bit rate?
It's almost 3x for this particular video.
So here is the [INAUDIBLE] graph for the material that we
were just looking at.
The red line is VP9, the blue line is H.264.
And if we do the same quality different bit rate comparison
at the point that we were just looking at, which is about
38.6 dB, for VP9, you arrive at about 200 kilobits a
second, and for H.264, you need to interpolate between
two points because we don't have an exact match, and it
ends up being around 550 kilobits a second.
So that's almost 3x more bit rate to accomplish the same
quality-- bit rate that you can save just by using VP9.
So we've done this over many, many clips.
I told you we had about 700 clips that we tested this on
at various bit rates and various quality settings, and
overall, you can save 50% bandwidth by encoding your
videos in VP9 instead of H.264 at the very best settings that
we are aware of.
So how did we do this?
So let's look a little bit at the techniques that we're
using to actually get to this kind of compression
efficiency.
So a typical video sequence consists of a series of video
frames, and then each of these video frames
consists of square blocks.
So for current generation video codecs, like H.264,
these blocks have a size of a maximum 16 by 16 pixels.
We've blown this up a lot.
We have currently gone up to 64 by 64 pixels for each
block, and then at that point, we introduce a
partitioning step.
And in this partitioning step, we allow you to do a vertical
or horizontal partitioning, a four-way split, or no
partitioning at all, resulting in different size sub-blocks.
If you do a four-way split and you have four 32 by 32 blocks,
then for each of these blocks, you go through the same
process again of horizontal, vertical split, four-way
split, or no split at all.
If you do the four-way split, you get down to 16 by 16
pixels, do the same thing again to get to eight by
eight, and eventually four by four pixels.
So what this partitioning step allows you to do is to break
up the video in such a way that it's optimized for your
particular content.
Stuff that has a very stable motion field can use very
large blocks, whereas for video content where things are
moving around all the time, you can go to very small
video blocks.
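The recursive partitioning just described is easy to picture as code. A sketch (the split decision itself, which a real encoder makes by rate-distortion search, is stubbed out here as a callback):

```python
def partition(x, y, size, choose_split):
    """Recursively partition a block: pick none, horizontal, vertical, or
    a four-way split at each level, bottoming out at 4x4 (VP9-style sketch)."""
    split = choose_split(x, y, size)          # encoder's RD decision, stubbed
    if split == "none" or size == 4:
        return [(x, y, size, size)]           # leaf block: (x, y, width, height)
    half = size // 2
    if split == "horizontal":
        return [(x, y, size, half), (x, y + half, size, half)]
    if split == "vertical":
        return [(x, y, half, size), (x + half, y, half, size)]
    blocks = []                               # four-way split: recurse per quadrant
    for dy in (0, half):
        for dx in (0, half):
            blocks += partition(x + dx, y + dy, half, choose_split)
    return blocks

# Example: split everything four ways down to 16x16 sub-blocks.
leaves = partition(0, 0, 64, lambda x, y, s: "split4" if s > 16 else "none")
print(len(leaves), "blocks")  # 16 blocks of 16x16
```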
So what do we do after that?
So after this partitioning step, we're usually doing
motion vector coding, and basically what that does is
that you pick a reference frame, and you pick a motion
vector, and then the block of that particular size that you
selected in your partitioning step will be coded using a
motion vector pointing into one of the previously coded
reference frames.
These reference frames in VP8 were usually frames that had
previously been encoded, and were therefore temporally
before the current frame.
What we've added in VP9 is that we have multi-level alt
reference frames, and what that allows you to do is
encode the video sequence in any frame order, and then you
can use any future frame as a reference frame for a frame
that you decide to encode after that.
So for this series of frames on the
left, this is six frames.
I could, for example, choose to first encode frame
one, then frame six, and then frame three using both a
future as well as a past reference.
And then, now that I have encoded three, I can encode
frame two really efficiently because it has a very
proximate future and past reference.
After I've encoded two, I go to five, which has
three and six as close, already-coded neighbors.
And so that allows for very temporally close reference
frames to be used as a predictor of contents in the
current block.
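Here is that coding order written out, with the references each frame can draw on (one plausible ordering reconstructed from the description above; the exact reference lists are illustrative):

```python
# Display order is frames 1..6; coding order exploits future references.
coding_order = [
    (1, []),       # coded first, from past/intra only
    (6, [1]),      # coded early so it can serve as a future reference
    (3, [1, 6]),   # predicted from past (1) and future (6)
    (2, [1, 3]),   # very close past and future references
    (5, [3, 6]),   # close neighbors on both sides
    (4, [3, 5]),   # likewise
]
for frame, refs in coding_order:
    print(f"code frame {frame} using reference frames {refs}")
```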
So once you have a motion vector, you can use subpixel
filtering, and subpixel filtering allows you to
basically pick a point in between two full pixels and
this point in between is then interpolated using a subpixel
interpolation filter.
In VP8, we had only a single subpixel interpolation filter.
Most codecs use just a single subpixel interpolation filter.
We've actually added three in VP9, and those are optimized
for different types of material.
We have a sharp subpixel interpolation filter, which is
really great for material where there's a very sharp
edge somewhere in the middle.
For example, that city clip that we were looking at in the
beginning, if you're thinking of a block that happens to be
somewhere on the border between the sky and a
building, we consider that a sharp edge, and so using an
optimized filter for sharp edges actually maintains a lot
of that detail.
On the other hand, sometimes there are very sharp edges, but
those are not consistent across video frames, across
different temporal points in the sequence that you're
looking at.
At that point, this will cause a very high frequency
residual artifact, and so for those, we've
added a low pass filter.
And what the low pass filter does is that it basically
removes sharp edges; it does exactly the opposite of the
sharp filter.
Lastly, we have a regular filter, which is similar to
the one that VP8 had.
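To illustrate the difference between those filter families, here is a toy half-pixel interpolation in Python. The taps are invented for the illustration (not VP9's actual coefficients); the point is only that a sharp filter overshoots to keep an edge crisp, while a low-pass filter smooths it away:

```python
import numpy as np

FILTERS = {  # illustrative 4-tap half-pel kernels, each summing to 1
    "sharp":  np.array([-1.0, 9.0, 9.0, -1.0]) / 16.0,
    "smooth": np.array([ 0.0, 8.0, 8.0,  0.0]) / 16.0,  # plain bilinear
}

def half_pel(row, kind):
    """Interpolate halfway between horizontally adjacent pixels."""
    taps = FILTERS[kind]
    # Correlation: output[i] = sum(row[i:i+4] * taps), landing halfway
    # between row[i+1] and row[i+2].
    return np.convolve(row, taps[::-1], mode="valid")

edge = np.array([0.0, 0.0, 0.0, 255.0, 255.0, 255.0])
print(half_pel(edge, "sharp"))   # overshoots past 0/255, keeping the edge crisp
print(half_pel(edge, "smooth"))  # monotone ramp: the edge gets blurred
```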
After this prediction step, you have predicted block
contents and you have the actual block that you're
trying to get as close as possible to, and then the
difference between these two is the residual signal that
you're going to encode.
So in current generation video codecs, we usually use four by
four or eight by eight cosine based transforms called DCTs
to encode this residual signal.
What we've added in VP9 is much higher resolution DCT
transforms all the way up to 32 by 32 pixels, and in
addition to using the DCT, we've also added an asymmetric
sine based transform called ADST.
And the sine based transform is optimized for a signal that
has a near zero value at the edge of the predicted region,
whereas the cosine is optimized for a residual
signal that has a zero signal in the middle of
the predicted signal.
So those are optimized for different conditions, and
together, they give good gains when used properly.
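As a small demonstration of why big transforms help on smooth residuals, here is a 32 by 32 DCT in Python with SciPy (the ADST has no off-the-shelf SciPy equivalent, so this sketch shows the DCT side only):

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth 32x32 "residual": a gentle two-dimensional ramp.
block = np.outer(np.linspace(0, 10, 32), np.linspace(0, 10, 32))

coeffs = dctn(block, norm="ortho")              # 2-D DCT of the block
energy = coeffs ** 2
low_frac = energy[:4, :4].sum() / energy.sum()  # low-frequency corner share
print(f"{low_frac:.1%} of the energy sits in the 4x4 low-frequency corner")

# The transform itself is lossless until you quantize the coefficients.
assert np.allclose(idctn(coeffs, norm="ortho"), block)
```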
Basically, the take home message from all of this is
that we've added big resolution increments to our
video codecs, and what that leads to is a codec that is
highly, highly optimized for high definition video coding.
But at the same time, because it is very configurable, it
still performs really well at low resolution content, for
example, SIF-based 320 by 240 video as well.
So I'll hand it back to Matt now, who will take over.
MATT FROST: Thanks, Ronald.
So I just want to give you a quick recap of what we've
discussed and sort of the highlights of this technology,
and then to tell you about the last steps that we're going
through to get VP9 in your hands.
As Ronald said, we're talking about technology here that is
50% better than literally everything that everybody else
out there is using.
And actually, we made a point to say we were using the very
best encoder out there at the very best settings, settings
which I really think you're not seeing very often in the
real world because they're very difficult to use in a
real world encoding environment.
So I hope that there are a number of people in this
audience now who are out there, either with existing
products with video or products to which you're
looking to add video, or just you're thinking about how you
can use these tools to launch a new product and to come out
with a start-up.
This technology is not being used by anyone right now.
YouTube is testing it and we'll talk about that in a
little bit, but if you adopt VP9, as you can very quickly,
you will have a tremendous advantage over anybody out
there with their current offering based
on 264 or even VP8.
It's currently available in Chrome, and the libvpx library
on the WebM project is out there for you to download,
compile, and test.
It's open source.
You will have access to source code.
The terms of the open source license are incredibly liberal
so that you can take the code, improve it, optimize it,
modify it, integrate it with your proprietary technology,
and you're not going to have to give back a line of code to
the project.
You're not going to have to be concerned that you will
inadvertently open source your own proprietary code.
And then finally, it's royalty free.
And obviously, this is something that was of great
importance to us as we sought to open source a video
technology for use in HTML5 and the video tag.
We believe that the best is still to come in terms of
video products on the web, and that in order to make sure
that people are free to innovate and that start-ups
are free to launch great new video products, we have to
make sure that they're not writing $5 or $6 million
checks a year to standards bodies.
We're working very hard on putting this technology into
your hands as soon as possible.
We did a semi-freeze of the bit stream just a couple of
weeks ago, and at that time, we said that we were taking
comments on the bit stream for 45 more days.
Specifically, we're looking for comments from a lot of our
hardware partners to some of the software techniques that
we're using just to make sure that we're not doing anything
that's incredibly difficult to implement in hardware.
At the end of the 45 day period on June 17, we're going
to be bit stream frozen, which means that after June 17, any
VP9 encoder that you use is going to be compliant with any
VP9 decoder, and that if you're encoding content with
an encoder that's out after June 17, it's going to be able
to play back in a decoder after the bit stream freeze.
Obviously, getting VP9 in Chrome is very
important to us.
The beta VP9 which you've been seeing today
is already in Chrome.
If you download the latest development version of Chrome
and enable the VP9 experiment, you'll be able to play back
VP9 content immediately.
As soon as we've frozen the bit stream as of June 17,
we're going to roll it into the Dev Channel of Chrome as
well with this final version of VP9, and then that's going
to work through the beta channel and
through the stable channel.
And by the end of the summer, we are going to have VP9 in
the stable version of Chrome rolling out to the hundreds of
millions of users.
I think [INAUDIBLE]
today said that there are 750 million users of
Chrome right now.
VP9 is going to be deployed on a massive scale
by the end of summer.
In terms of final development activities that we're going to
be working on, after the bit stream is finalized in the
middle of June, we're going to be focusing on optimizations
both for performance and for platform.
So what that means is we'll be working on making sure that
the encoder is optimized for a production environment.
Obviously, something that's very important to YouTube as
YouTube moves to supporting VP9 is that the decoder is
sufficiently fast to play back on many of the PCs
that are out there.
We're also going to be working on platform optimizations that
will be important to Android developers, for instance, and
to people who want to support VP9 on embedded devices.
These are ARM optimizations and
optimizations for other DSPs.
We have hardware designs coming out.
For those of you who may work with semiconductor companies
or are thinking about a technology like this for use
in something like an action camera, these are hardware
designs that get integrated into a larger design for a
semiconductor and allow for a fully accelerated VP9
experience.
Real time optimizations are obviously incredibly important
for video conferencing, Skype style applications, and also
for new applications that are coming out like screencasting
and screen sharing.
By the end of Q3, we should have real time optimizations
which allow for a very good real time performance.
Those optimizations should then allow VP9 to be
integrated into the WebRTC project, which is a sister
project to the WebM project and basically takes the entire
real time communication stack and builds it into Chrome, and
more broadly into HTML5 capable browsers.
And so what this means is that when VP9 is integrated into
WebRTC, you will have tools that are open source, free for
implementation that used to, even four years ago, require
license fees of hundreds of thousands of dollars.
And you, with a few hundred lines of JavaScript, should be
able to build the same sort of rich video conferencing style
applications and screencasting applications that you're
seeing with products like Hangouts.
And finally, at the end of this year moving into Q1 2014,
we're going to see, again, hardware
designs for the encoder.
So just to give you an idea of how usable these technologies
are, we have a VP9 demonstration in YouTube.
If you download the Development Version of Chrome
and flip the VP9 flag, you can play back YouTube VP9 videos.
And one thing this should drive home is this was a
project that was done over the course of two weeks, that VP9
was built into YouTube.
Obviously, we have very capable teams.
Obviously we have people on the WebM team and people on
the YouTube team who know a lot about these tools, but
this demonstration is VP9 in the YouTube operating
environment.
There's nothing canned here.
This is VP9 being encoded and transmitted in the same way
that any other video is.
So this, I hope, again, will give you guys pause to say,
god, we could do this as well.
We could come out very quickly with a VP9 based service that
will be remarkably better than anything that's
out there right now.
So I just want to leave you with some thoughts about what
I hope that you're thinking about coming away from this
presentation.
The WebM project is a true community-based open source
project, and obviously, these sorts of projects thrive on
contributions from the community.
We are coming out of a period where we've been very
intensively focused on algorithm development.
Some of this work is certainly very complicated stuff that
not every--
even incredibly seasoned--
software engineer can work on.
But we're moving into a point where we're focusing on
application development, we're focusing on optimization,
we're focusing on bug fixes and patches, and that's the
sort of thing that people in this room certainly can do.
So we encourage you to contribute and we encourage
you to advocate for use of these technologies.
We build open source technologies, and yet simply
because we build them, that doesn't mean that
people adopt them.
It takes work to get communities to focus on
adopting these sorts of open technologies.
So advocate within your project in your company,
advocate within your company for use of open technologies,
and advocate within the web community as a whole.
We think that with VP9, we've shown the power of a rapidly
developing, open technology, and we hope that people are as
excited about this as we are and that you go out and help
spread the word about this technology.
But most important, we'd like you to use the technology.
We're building this with a purpose, and that is for
people to go out, take advantage of these dramatic
steps forward that we've made with VP9.
And so we hope you will go out, that you'll be charged up
from this presentation, and that you'll immediately
download the Development Version of Chrome and start
playing around with this and start seeing what you can do
with this tool that we've been building for you.
So there are just a couple of other things I'd like to say.
There are a couple of other presentations
related to this project.
There's a presentation on Demystifying Video Encoding,
Encoding for WebM VP8--
and this is certainly relevant to VP9--
and then another on the WebRTC project.
And again, if you're considering a video
conferencing style application, screensharing,
remote desktopping, this is something that you should be
very interested in.
Sorry.
I shouldn't be using PowerPoint.
So with that, we can open it up to questions.
Can we switch to just the Developers Screen, guys?
Do I do that?
AUDIENCE: Hey there.
VP8, VP9 on mobile-- do you have any plans for releasing on
iOS and integrating with my iOS applications--
native, Objective-C, and stuff?
Do you have any plans for that?
MATT FROST: He's asking if VP8 is in iOS?
AUDIENCE: VP9 on iOS running on top of Objective C.
RONALD BULTJE: So I think as for Android, it's obvious
Android supports VP8 and Android will eventually
support VP9 as well.
For iOS--
MATT FROST: When I was talking about optimizations, platform
optimizations, talking about VP9, that's the sort of work
we're focusing on, ARM optimizations that should
apply across all of these ARM SOCs that are prevalent in
Android devices and iOS devices.
There aren't hardware accelerators on iOS
platforms right now.
Obviously, that's something we'd like to change, but
presently, if you're going to try to support VP8 in iOS,
you're going to have to do it through software.
AUDIENCE: Thank you.
RONALD BULTJE: Yep?
AUDIENCE: Bruce Lawson from Opera.
I've been advocating WebM for a couple of years.
One question.
I expect your answer is yes.
Is it your assumption that the agreement that you came to
with MPEG LA about VP8 equally applies to VP9?
MATT FROST: It does apply to VP9 in a slightly different
way than it does with VP8.
The agreement with MPEG LA and the 11 licensors with respect
to VP9 covers techniques that are common with VP8.
So obviously, we've added back some techniques we were using
in earlier versions, we've added in some new techniques,
so there are some techniques that aren't subject to the
license in VP9.
But yes, the core techniques which are used in VP8 are
covered by the MPEG LA license, and there will be a
VP9 license that will be available for developers and
manufacturers to take advantage of.
AUDIENCE: Super.
Follow up question.
About 18 months ago, the Chrome team announced they
were going to drop H.264 being bundled in the browser, and
that subsequently didn't happen.
Can you comment further on whether Chrome will drop H.264
and concentrate only on VP9?
MATT FROST: I can't really comment
on plans going forward.
What I can say is that having built H.264 in, it's very
difficult to remove a technology.
I think when you look at the difference between VP9 and
H.264, there's not going to be any
competition between the two.
So I think with respect to VP9, H.264 is slightly less
relevant because there was nothing--
we didn't have our finger on the scale for this
presentation.
And especially, we were hoping to drive home with that
initial demonstration which we put together over the last few
hours that we're not looking for the best videos.
We're just out there recording stuff.
So even if 264 remains in Chrome--
which I think is probably likely--
I don't think it's going to be relevant for a next gen codec
because there's just such a difference in quality.
AUDIENCE: Thanks for your answers.
AUDIENCE: Hi there.
I have a question about performance.
Besides the obvious difference in royalty and licensing and
all that, can you comment on VP9 versus HEVC, and do you
hope to achieve the same performance or proof of
[INAUDIBLE]?
RONALD BULTJE: So the question is in terms of quality, how do
VP9 and HEVC compare?
AUDIENCE: Yeah, and bit rate performance, yeah.
RONALD BULTJE: Right.
So testing HEVC is difficult.
I'll answer your question in a second.
Testing HEVC is difficult because there's currently no
open source or commercial software available
that can actually encode HEVC unless it's highly
developmental in nature or it is the reference model.
The problem with the alpha and beta versions that are
currently on the market for commercial products is that
we're not allowed to use them in comparative settings like
we're doing.
Their license doesn't allow us to do that.
Then the problem with the reference model is it is a
really good encoder, it gives good quality, but it is so
enormously slow.
It can do about 10 frames an hour for a
high definition video.
That's just not something that we can really use in YouTube.
But yes, we've done those tests.
In terms of quality, they're currently about equal.
There's some videos where HEVC, the reference model, is
actually about 10%, 20% better.
There's also a couple of videos where VP9 is about 10%,
20% better.
If you take the average over, for example, all of those
CC-licensed YouTube clips that we looked at, it's about a 1%
difference.
I think that 1% is in favor of HEVC if you so wish, but 1% is
so small that really, we don't think that plays a role.
What does that mean going forward?
Well, we're really more interested in commercial
software that will be out there that actually encodes
HEVC at reasonable speed settings.
And like I said, there's currently nothing on the
market but we're really interested in such products,
so once they are on the market and we can use them, we
certainly will.
AUDIENCE: Follow-up question about the performance.
Is there any reason to not expect this to scale up to 4K
video or [INAUDIBLE]?
RONALD BULTJE: We think that the current high definition
trend is mostly going towards 720p and 1080p.
So if you look at YouTube uploads, there is basically no
4K material there, so it's just really hard to find
testing materials, and that's why we mostly use 720p and
1080p material.
MATT FROST: But certainly when we designed the codec, we
designed it with 4K in mind.
There aren't any limitations which are going to prevent it
from doing 4K.
RONALD BULTJE: Right.
You can use this all the way up to 16K video if that's what
you were asking.
MATT FROST: Sir?
AUDIENCE: Yeah.
Have you been talking to the WebRTC team, and do you know
when they're going to integrate VP9 into their
current products?
MATT FROST: We talk with the WebRTC team regularly.
As I said, we've got to finish our real time enhancements in
order to actually have a codec that works well in a real time
environment before we can expect it to be integrated
into WebRTC.
But I think we're looking at Q4 2013.
AUDIENCE: Great, thanks.
MATT FROST: We're in 2013, right?
RONALD BULTJE: Yeah.
AUDIENCE: Hi.
I just wanted to talk about the rate of
change in video codecs.
I think maybe we can see like VP8, VP9, we're talking about
an accelerating rate of change.
And that's great, and I really wanted to applaud the efforts
to getting this out in Chrome Dev quickly, or
Chrome Stable quickly.
I just wanted to ask about maybe some of your
relationships with other software vendors that are
going to be relevant, like we're talking Mozilla, IE, iOS
was, I think, previously mentioned.
As this kind of rate of innovation in codecs
increases, how are we going to make sure that we can have as
few transcode targets as possible?
My company is working on a video product.
We don't want to have eight different codecs.
And if we can imagine, let's say, that Version 10 comes out
relatively soon, sometime down the road.
How can we make sure that devices stick with a
relatively small subset of compatible decodings?
MATT FROST: I guess I'm a little unsure
of what you're asking.
In terms of how we get support on devices as quickly as
possible, or how we solve the transcoding problem?
AUDIENCE: And just keeping the number of transcoded formats
as small as possible.
If IE only supports H.264, I have to
have an H.264 encoding.
So I was just wondering what kind of relationships you guys
are working on to make sure that as many devices and
platforms as possible can support something like VP9.
MATT FROST: We're certainly working very hard on that, and
as I said in the slide on next steps showing the timeline,
our focus on having hardware designs out there as quickly
as possible is an effort to try to make sure that there's
hardware that supports VP9 more rapidly than hardware has
ever been out to support a new format.
We had a VP9 summit two weeks ago, which was largely
attended by semiconductor companies.
Actually, some other very encouraging companies were
there with great interest in these new technologies.
But we're working very hard with our hardware partners and
with OEMs to make sure that this is supported as quickly
as possible.
I think internally, what we're looking at is probably relying
on VP8 to the extent that we need hardware now and we don't
have it in VP9.
So I think what we've talked about is always falling back
to an earlier version of an open technology that has very
broad hardware support.
But we're trying to think very creatively about things like
transcoding and things that we can do to ensure backwards
compatibility or enhancement layers.
So part of the focus of this open development cycle and
process that we have is to really try to think in very
new ways about how we support new technologies while
maintaining the benefits of hardware support or device
support for older technologies.
AUDIENCE: Excellent.
Thank you.
AUDIENCE: So a key point in any solution is going to be
performance.
Hardware acceleration really solves that, and that was one
of the challenges with the adoption of VP8 in timing
versus H.264, which has broad spectrum hardware
acceleration.
I understand the timing, the delays, and the efforts you
guys are doing to really achieve that hardware
accelerated support for VP9.
But until then, what's the software performance in
comparison to H.264-- either software versus
software, or software versus hardware?
RONALD BULTJE: So we've only done software-to-software
comparisons for that.
Let me start with VP8 versus 264.
Currently, VP8 decoding is about twice as fast as 264
decoding using fully optimized decoders.
VP9 decoding is currently about twice as slow as VP8
decoding, and that basically means that it's at exactly the
same speed as H.264 decoding.
That's not what we're targeting as a final product.
We haven't finished fully optimizing the decoder.
Eventually, what we hope to get to is about a 40% slowdown
from VP8 decoding, and that will put it well ahead of the
fastest 264 decoders that are out there in software.
AUDIENCE: Great.
Thank you.
AUDIENCE: Hello.
I was just wanting to get some background on the comparison
between H.264 and VP9.
For H.264, what were you using--
CBR, VBR, and what QP values?
RONALD BULTJE: This is two-pass encoding at
the target bit rate.
So it's preset veryslow.
Since we're doing visual comparison,
there is no tune set.
It's passes one and two, and then just a target bit rate.
We tend to choose target bit rates that are somewhere
between 100 and 1,000 kilobits a second, and then we just
pick the same point for the VP9 one as well to start with.
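For reference, a two-pass x264 run of the kind Ronald describes looks roughly like this (the flags are real x264 CLI options, but the file names and bit rate here are illustrative, not the team's actual test scripts):

```python
import subprocess

bitrate_kbps = 500  # somewhere in the 100-1,000 kbps range mentioned above
for pass_no in (1, 2):
    subprocess.run([
        "x264", "--preset", "veryslow",   # best settings, no --tune
        "--bitrate", str(bitrate_kbps),
        "--pass", str(pass_no), "--stats", "x264_2pass.log",
        "-o", "out.264" if pass_no == 2 else "/dev/null",
        "input.y4m",
    ], check=True)
```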
AUDIENCE: So in both of the comparisons, you were trying
to be very generic so you weren't tuning the encoder in
any way to make it a better quality at that bit rate.
You were just giving it two passes to try to figure it out.
RONALD BULTJE: So you mean visual quality, or--
AUDIENCE: Yes.
RONALD BULTJE: So we haven't tuned either one of them for
any specific setting.
For 264, the default is that it optimizes for visual
experience, and so that's what we used here.
So it's not optimized for SSIM or PSNR in the visual
comparisons that we did here.
VP9 encoding does not have any such tunes, so we're not
setting any there, of course.
AUDIENCE: So you just used the default settings of
[INAUDIBLE]?
RONALD BULTJE: We're using the default settings, and we've
actually discussed this extensively with the 264
developers.
They agree.
They support this kind of testing methodology, and as
far as I'm aware, they agree with it.
They fully expect the kind of results that
we're getting here.
AUDIENCE: Right.
OK, thanks.
AUDIENCE: Hi.
One more question about performance.
I think you mentioned a little bit about the real time.
So do you think in the future, you can manage to bring an
application like a remote desktop into the web?
I mean like putting three, four windows in the same
browser, high definition, things like that?
RONALD BULTJE: In terms of decoding or encoding?
AUDIENCE: Both.
RONALD BULTJE: So for encoding, yes.
So there will be real time settings for this codec
eventually.
For no codec will that get you exactly the types of bit rate
quality ratios that you're seeing here.
These are really using very slow settings, and that is by
far not real time.
But if you set the VP9 codec to real time settings, then
yes, eventually it will encode in real time.
It will be able to do four full desktops all at once, and
it will be able to decode all of those also.
You'll probably need a multicore machine for this,
obviously, but it will be able to do it, yes.
AUDIENCE: And you're using the graphics card and
other things like that.
You didn't mention about the hardware, OpenGL or--
RONALD BULTJE: It's pure software.
There's no hardware involved.
AUDIENCE: No using the hardware, the card hardware.
RONALD BULTJE: We're not using GPU or anything like that at
this point.
AUDIENCE: Thank you.
AUDIENCE: Hi.
I just want to know, how does VP9, now or later, compare
to VP8 and H.264 when we're talking about single-pass CBR,
low bit rate, real time encoding?
Little background is we are part of the screen sharing
utility that currently uses VP8, and we've been
successfully using it for a year, but the biggest gripe
with VP8 is that it doesn't respect bit rate, especially
on low bit rates, unless you enable frame dropping, which
is unacceptable.
So we have to do a bunch of hacks to actually produce
quality and it doesn't behave like H.264
would in that situation.
So how will VP9 address that problem, or is that even on
the roadmap?
RONALD BULTJE: So in general, desktop sharing and
applications like this, also real time communications, yes,
they're on the roadmap, and yes,
they will all be supported.
In terms of your specific problem, I guess the best
thing to do is why don't you come and see us afterwards in
the Chrome [INAUDIBLE], and we can actually look at that.
AUDIENCE: OK, awesome.
RONALD BULTJE: As for VP9, VP9 currently does not
have a one pass mode.
We've removed that to just speed up development, but it
will eventually be re-added, and it will be as fast as the
VP8 one but with a 50% reduction in bit rate.
AUDIENCE: Do you have a timeline for that?
Is it going to be this year, or next year?
RONALD BULTJE: Like Matt said, that will happen--
MATT FROST: Late Q3.
RONALD BULTJE: Q3 2013, around then.
We're currently focusing on YouTube, and those kind of
things will come after that.
AUDIENCE: Awesome.
Thank you.
AUDIENCE: I have two questions, unrelated
questions to that.
What is the latency performance of VP8 compared to
VP9 in terms of decoding and encoding?
And the second question is, how does VP9 compare to H.265?
RONALD BULTJE: So I think H.265, I addressed earlier.
So do you want me to go into that further, or was that OK?
AUDIENCE: More in terms of the real time performance.
RONALD BULTJE: So in terms of real time performance, I think
for both, that's really, really hard to say because
there is no real time HEVC encoder and there is no real
time VP9 encoder.
So I can sort of guess, but this is something that the
future will have to tell us.
We will put a lot of effort into writing real time
encoders or adapting our encoder to be real time
capable because that is very important for us.
MATT FROST: But in terms of raw latency, it should be
faster than VP8.
You can decode the first frame, right?
RONALD BULTJE: I think it will be the same as VP8.
So VP8 allows one frame in, one frame out, and VP9 will
allow exactly that same frame control model.
AUDIENCE: So you mentioned that you've asked hardware
manufacturers for any concerns or comments.
Have you gotten any yet?
MATT FROST: Sorry.
Are they considering supporting it?
AUDIENCE: Well, in terms of the algorithms and how you
would actually--
MATT FROST: They're working on it quickly.
AUDIENCE: But there's no concerns or comments or
anything yet?
MATT FROST: No concerns.
AUDIENCE: You said you opened up for comments.
MATT FROST: No.
We have received comments.
We have a hardware team internally that took a first
pass at comments.
We've received a couple of comments additionally just
saying, here's some stuff you're doing in software that
doesn't implement well in hardware.
I don't foresee a lot of additional comments from the
hardware manufacturers.
The other work that we're doing over the next 45 days is
we had a bunch of experiments that we had to close out, and
so we're doing some closing out as well and just
finishing the code.
Absent an act of God, this is bit stream final on June 17.
RONALD BULTJE: So we have actually received comments
from some hardware manufacturers, and we are
actively addressing the ones that we're getting.
AUDIENCE: OK, thanks.
AUDIENCE: Hi.
I might have missed this, but when did you say the ARM
optimizations for VP9 are going to come out?
MATT FROST: Actually starting now really, we're focusing on
doing some optimizations by ourselves and with partners.
So I would say that's going to be coming out second half of
the year, and it'll probably be sort of incremental where
you may get an initial pass of ARM optimizations and then
some final optimization.
It's obviously very important for us for Android to be able
to get VP9 working as well as possible, and obviously, ARM
is incredibly important for the Android ecosystem, so
that's an area of significant focus.
AUDIENCE: And in terms of real time encoding, so in order to
blend into WebRTC, you're going to
have to get that working.
So is this going to coincide with the assimilation of VP9
into WebRTC?
MATT FROST: It'll be real time optimizations, which I think
we were sort of thinking about end of Q3, beginning of Q4,
and then integration into WebRTC will follow on that.
Obviously, the one thing I'd say, it's
an open source project.
If you guys think that you see an opportunity, you can go out
and do the optimizations yourselves.
There are contractors who can do it.
So I encourage you guys to think about that, that you can
take the code and you can start working on some of this
stuff yourselves.
Obviously, we'd love it if you'd contribute it back but
we're not going to force you to.
Yeah, I guess last question.
AUDIENCE: This is a question about how VP9 relates to what
the Android team talked about with the Google proxy and the
SPDY proxy.
You alluded to transcoding real time for backwards
compatible device support.
Do you see Google doing the same thing they're going to do
with images in this proxy and doing video transcoding to
adapt this and use this for compression mode
in the Google proxy?
RONALD BULTJE: That's a really interesting application, and
that's something that we'll have to look into in the future.
It's not as easy as it sounds because video transcoding
actually takes some time.
So that would mean that you would actually have to wait a
minute while the video is transcoding until you can
visit that website, and that might not be quite what you're
looking for.
But it's an interesting application and we might look
into that in the future.
MATT FROST: I think that's it.
I think we're out of time.
Sorry, but we're happy to talk to you afterwards.
[APPLAUSE]