[MUSIC PLAYING]

SARA ROBINSON: Hi, everyone. Thank you for coming. Today we're going to talk about machine learning APIs by example. I'm going to teach you how you can access pre-trained machine learning models with a single API call. My name is Sara Robinson. I'm a developer advocate on the Google Cloud Platform team, which basically means I get to help build demos, give talks about them, and bring product feedback back to the engineering teams. You can find me on Twitter, @SRobTweets. And I live in New York.

So before we get started, let's talk about what machine learning is at a high level. At a high level, machine learning is teaching computers to recognize patterns in the same way that our brains do. It's really easy for a child to recognize the difference between a picture of a cat and a dog, but it's much more difficult to teach computers to do the same thing, right? We could write rules to look for specific things, but we can almost always find a condition that's going to break those rules. So instead, what we want to do is write code that finds these rules for us and improves over time through examples and experience.

Here we have a neural network that's identifying a picture as either a picture of a cat or a dog. We can think of the input to this network as the pixels in the image. And then each neuron is looking for a specific identifying feature. Maybe it's the shape of the ear or the hair. And then the output is a prediction-- in this case, that it's a dog.

Let's take a step back from this for a moment and let's try some human-powered image detection, if we were to do this on our own. We'll take this picture of an apple and an orange. And let's say we were going to start writing an algorithm that would identify the difference between these two. What are some features that you might look for? You can shout it out.

AUDIENCE: Color.

SARA ROBINSON: Color. I heard a bunch of "color."

AUDIENCE: Shape.

SARA ROBINSON: Shape.

AUDIENCE: Texture.

SARA ROBINSON: Texture-- lots of good ones. So color's a good one, but then what would happen if we had black and white images? Then we might have to start all over again. In that case, we could look for a stem. Texture would be good. But then what happens if we add a third fruit? If we add a mango, we have to start all over again as well.

But these pictures are all pretty similar, right? So what would happen if we had pictures of two things that were very different? This should be really easy, right? A dog and a mop have pretty much nothing in common from these pictures that I can see. But it's actually a little tricky. What we have here are pictures of sheepdogs and mops. And it's actually kind of hard to tell the difference, right? If we were going to write code that identified these two, it would be pretty tricky to do.

And then what happens if we have photos of everything? We don't want to write specific rules to identify each little thing that we're trying to look for. And in addition to photos, we could have many other types of unstructured data. We could have video, audio, text, lots of different types of data we'd be dealing with. And really, we want some tools to help us make sense of all of this unstructured data.

Google Cloud Platform has several products to help you benefit from machine learning. On the left-hand side here, you can use your own data to build and train your own machine learning model. We have TensorFlow and Cloud Machine Learning Engine for you to do that.
And on the right-hand side, this is where I'm going to focus today. This is what I like to call friendly machine learning. These are machine learning APIs that give you access to a pre-trained machine learning model with one single REST API request. So you make a request, you send it some data, and you get some data back from this pre-trained model. And you don't have to worry about building and training your own model or anything that's going on under the hood. I'm going to give you an introduction to each of these five APIs. I'm going to start with Vision, and I'm going to end with Video Intelligence, which is our newest API that you may have seen in the keynote yesterday.

So let's get started with Vision. The Vision API lets you do complex image detection with a simple REST request. And I'm going to start each section by talking about a customer or customers that are using each API.

For the Vision API, the first example on the left is Disney, which used the Vision API for a game to promote the movie "Pete's Dragon." The way the game worked is that users were given a quest. So they had a clue, and they had to take a picture of that word-- maybe a couch, a computer, everyday objects. And if they took the picture correctly, it would superimpose an image of the dragon on that object. So the problem was they needed a way to verify that the user took a picture of the correct object that they were prompted for. And the Vision API was a perfect fit to do that. They used the label detection feature, which basically tells you, what is this a picture of? And they were able to verify images in the game that way.

Realtor.com uses the Vision API for their mobile application. It's a real estate listing service, and people can go around as they are looking for houses and take pictures of a "for sale" sign. And they use the Vision API's OCR, Optical Character Recognition, to read the text in the image and then pull up the relevant listing for that house. So those are two examples of the Vision API in production.

Let's talk a little bit more about the specific features of the Vision API. As I mentioned, we have label detection, which is kind of the core feature-- you send it an image, in this case a cheetah, and it'll tell you what this is a picture of. It'll give you a bunch of different labels back. Face detection will identify faces in an image. It'll tell you where those faces are in the image, and it'll even tell you if they're happy, sad, surprised, or angry. OCR is what I mentioned with realtor.com's use case. This can identify text in an image. It will tell you where the text is, what the text says, and what language it's in. Explicit content detection will tell you, is this image appropriate or not-- really useful if you've got a site with a lot of user-generated content and you don't want to manually filter images. You can use this API method to do that really easily. Landmark detection will tell you, is this a common landmark? If so, what is the latitude and longitude? And then logo detection, pretty self-explanatory, will identify logos in an image.

So here's a quick look at some of the JSON response you might get back for these different features. This is face detection. This is actually a selfie that I took with two teammates on a trip to Jordan last year. And the response you're looking at on the slide is for my face. So it'll return an object for each face it finds in the image. And we can see it says headwear likelihood, very unlikely, which is true. I'm not wearing a hat.
But for both of my teammates, it did return headwear likelihood, very likely. And then we can see it highlighted below. It says joy likelihood is very likely, which is true. I am smiling in the picture.

The next feature I want to show you the response for is landmark detection. So we have a picture here of what looks like the Eiffel Tower. It's actually the Paris Hotel and Casino in Las Vegas. I wanted to see if the Vision API was fooled. And it was not. It correctly identified this as the Paris Hotel and Casino. You can see that MID in the JSON response. That's an ID that maps to Google's Knowledge Graph API, which will just give you a little more data about the entity. And it also tells us the latitude and longitude of where the Paris Hotel and Casino is.

In addition to these features, we launched some new features this week, which you may have heard about in yesterday's keynote. I'm going to quickly talk about what they are, and then I'll show you some examples. The first one is crop hints, which will give you suggested crop dimensions for your photos. Web annotations-- I'm super excited about this one. This will give you some granular data on web entities that are found in your image. It'll also tell you all the other pages where the image exists on the internet. So if you need to do copyright detection, it'll give you the URL of the image and the URL of the page where the image is. And then finally, we announced document text annotations. So in addition to the OCR we had before, this is improved OCR for large blocks of text. If you have an image of a receipt or something with a lot of text, it'll give you very granular data on the paragraphs and the words and the symbols in that text.

For some examples of the new features, I want to highlight web annotations. So here we have a picture of a car. It's in a museum. I'm actually a big "Harry Potter" fan, so this is a car from one of the "Harry Potter" movies. And I wanted to see what the Vision API was able to find from the web entities in this image. It was able to identify it correctly as a Ford Anglia, which is correct. This is from the second "Harry Potter" movie, when they tried to fly a car to school. The second entity it returned is the ArtScience Museum. This is a museum in Singapore where this car is on display. And it was finally able to tell me that this car is from "Harry Potter," which is a literary series. So there's lots of great metadata you can get back from web annotations. And it returns even more data-- it tells you the full matching image URLs. Where else does this image exist on the internet? Partial matching images. And then finally, all the pages that point to this image. It's really useful information you get back with web annotations.

And you can try this in the browser with your own images before writing any code. Just go to cloud.google.com/vision. You can upload your images, play around, and see all the responses you get back from the different features. And you can actually do this with all of the APIs that I'm going to talk about today. There's a way to try them in the browser.

And in case you were wondering how the Vision API did with the sheepdogs and mops, this is the response it got from that picture on the right. So it's 99% sure it's a dog. It actually even is able to identify the breed of the dog-- Komondor. I may be saying that wrong. And the mops, it successfully identified this as a broom or a tool. And skipping ahead, it did pretty well overall. In the top row, the third one, it didn't identify it as a dog. It just said "fur." So I don't know if that's a hit or a miss. And then the third mop, it said "textile." So it didn't quite get that it was a mop or a broom. But overall, the Vision API performed pretty well on these tricky images that are even hard for us to decipher what they are exactly. So that was the Vision API, showing you how you can get a lot of data on your images with a pre-trained machine learning model.
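To make that single REST request a bit more concrete, here is a minimal Python sketch of a label and web detection call against the Vision API's images:annotate endpoint. The API key and image file name are placeholders, and this is just one way to make the call; the in-browser demo and the client libraries work as well.

```python
import base64
import requests

API_KEY = "YOUR_API_KEY"   # placeholder: a key for a project with the Vision API enabled

# Read and base64-encode the image to annotate (file name is illustrative).
with open("sheepdog.jpg", "rb") as f:
    image_content = base64.b64encode(f.read()).decode("utf-8")

body = {
    "requests": [{
        "image": {"content": image_content},
        "features": [
            {"type": "LABEL_DETECTION", "maxResults": 5},
            {"type": "WEB_DETECTION", "maxResults": 5},
        ],
    }]
}

resp = requests.post(
    "https://vision.googleapis.com/v1/images:annotate",
    params={"key": API_KEY},
    json=body,
)

# Print each label with its confidence score, e.g. "dog 0.99".
for label in resp.json()["responses"][0].get("labelAnnotations", []):
    print(label["description"], round(label["score"], 2))
```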
Next I want to talk about audio data. The Speech API essentially exposes the functionality of "OK, Google" to developers. It lets you do speech-to-text transcription in over 80 languages.

One app that's using the Speech API is called Azar. It's an app to find friends and chat, and they have connected 15 million matches. They use the Speech API for all of the messages that involve audio snippets. And they're also using this in combination with the Cloud Translation API, which I'm going to talk about later on. So there are a lot of potential use cases where you could combine different machine learning APIs together. In cases where the matches don't speak the same language, they'll use the Speech API to transcribe the audio and then the Translation API to translate that text.

The best way to experience the Speech API is with a demo. Before I get into it, I want to explain a bit how it works. We're going to make a recording. I wrote a bash script, and we're going to use SoX to do that. It's a command line utility for audio. So what we'll do is we'll record our audio, we'll create an API request in a JSON file, and we'll send it to the Speech API. And then we'll see the JSON response.

So if we could go ahead and switch to the demo-- OK, let me make the font a little bigger. I'm going to call my file with bash request.sh. And it says, "Press Enter when you're ready to record." It's going to ask me to record a five-second audio file. So here we go. I built a Cloud Speech API demo using SoX.

OK, so this is the JSON request file that it just created. We need to tell the Speech API the encoding type-- in this case, we're using FLAC encoding-- and the sample rate in hertz. The language code is optional. If you leave it out, it will default to English. Otherwise, you need to tell it what language your audio is in. And then there's the speech context; I'm going to talk about that in a little bit. So I'm going to call the Speech API. It's making a curl request. And let's see how it did.

OK, so it did pretty well. It said, "I built a Cloud Speech API demo using socks." But you'll notice it got the wrong "SoX," because "SoX" is a proper noun. It was 89% confident that it got this correct. It even was able to get "API" as an acronym.

So I mentioned this speech context parameter before. What this actually lets you do is, let's say you have a proper noun or a unique word that you're expecting in your application that you wouldn't expect the API to recognize normally. You can actually pass it as a parameter, and it'll look out for that word. So I'm going to hop on over to Sublime, and I'm going to add "SoX" as a phrase to look out for. And let's see if it's able to identify it. I'm going to say the same thing again. I'm going to record. I built a Cloud Speech API demo using SoX. And we can see it's now got that phrase in there. And we will call the Speech API. And it was able to get it correctly using the phrases parameter, which is pretty cool.
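For reference, here is a rough Python sketch of the same kind of request the bash script builds: FLAC audio, a sample rate, a language code, and the speech context phrases hint. The API key, file name, and sample rate are placeholders, and this uses the v1 recognize endpoint rather than the exact curl call from the demo.

```python
import base64
import requests

API_KEY = "YOUR_API_KEY"   # placeholder: a key for a project with the Speech API enabled

# The five-second FLAC clip recorded with SoX (file name is illustrative).
with open("recording.flac", "rb") as f:
    audio_content = base64.b64encode(f.read()).decode("utf-8")

body = {
    "config": {
        "encoding": "FLAC",
        "sampleRateHertz": 16000,                    # must match the recording
        "languageCode": "en-US",                     # language of the audio
        "speechContexts": [{"phrases": ["SoX"]}],    # hint for the proper noun
    },
    "audio": {"content": audio_content},
}

resp = requests.post(
    "https://speech.googleapis.com/v1/speech:recognize",
    params={"key": API_KEY},
    json=body,
)

# Each result has one or more alternatives with a transcript and a confidence score.
for result in resp.json().get("results", []):
    top = result["alternatives"][0]
    print(top["transcript"], top.get("confidence"))
```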
Just one REST API request, and we are easily transcribing an audio file, even with a unique entity. And you can also pass the API audio files in over 80 different languages. You just need to tell it, again, the language code that you'd like it to transcribe. So that is the Speech API. You can hop back to the slides.

So we've just transcribed our audio. We have text. What happens if you want to do more analysis on that text? That is where the Natural Language API comes into play. It lets you extract entities, sentiment, and syntax from text.

A company that's using the Natural Language API is called Wootric. They are a customer feedback platform that helps businesses improve their customer service. And they do this by collecting millions of survey responses each week. A little more detail on how it works: if you look at that box in the top right-- a customer would place this on different pages throughout their app. Maybe if you're a developer, you would see it on a documentation page. And it would ask you to rate your experience on that page from 0 to 10, which is what we call the Net Promoter Score, NPS. So they gather that score, and then you can give some open-ended feedback to expand on what you thought of the experience.

As you can imagine, it's pretty easy for them to average out the Net Promoter Score among tons of different responses. But what's much more difficult is looking at that open-ended feedback. And that's where they use the Natural Language API. They actually made use of all three methods in the Natural Language API. They used sentiment analysis to calibrate the numeric score that users gave against their written feedback, to see if they aligned. And then they used the entity and syntax annotations to figure out what the subject of the feedback was and then route it accordingly. So maybe somebody was unhappy about pricing. Then they could route that feedback to the necessary person and respond pretty fast. Using the Natural Language API, they were able to route and respond to feedback in near real time rather than having someone read each response, classify it, and then route it.

So let's look at each of the methods of the Natural Language API in a bit more detail. As I mentioned, I'm a big "Harry Potter" fan. So I took this sentence from JK Rowling's Wikipedia page. And let's see what happens if we send it to the entity extraction endpoint. It's able to pull these five entities from the sentence. And the JSON we get back looks like this.

What's interesting here is that JK Rowling's name is written in three different ways. Robert Galbraith is actually a pen name she used for a later book series. And it's able to point all of these to the same entity. So if you had things like "San Francisco" and "SF," it would point those to the same entity, so that you could count the different ways of mentioning the same thing as the same entity. We can see it finds her name, Joanne "Jo" Rowling. It tells you what type of entity it is-- a person. And then if the entity has metadata, it'll give you more metadata about it. So here we get an MID, which maps to JK Rowling's Knowledge Graph entry. And then we get the Wikipedia URL to the page about her.

The JSON response looks similar for the other entities we found. For "British," it maps it to a location. And then notice it connects it to the United Kingdom Wikipedia URL. So if it had instead said "UK" or "United Kingdom," it would point it to the same page. And then for "Harry Potter," we also get person, and we get the Wikipedia page for that entity as well. So that's entity extraction. That's one method you could use in the Natural Language API.
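As a sketch of what that entity extraction call could look like in Python, with a placeholder API key and a stand-in sentence along the lines of the one on the slide, a request to the analyzeEntities endpoint might look something like this:

```python
import requests

API_KEY = "YOUR_API_KEY"   # placeholder: a key for a project with the Natural Language API enabled

# Stand-in text; the slide used a sentence from JK Rowling's Wikipedia page.
sentence = ('Joanne "Jo" Rowling, who writes as J. K. Rowling and Robert Galbraith, '
            "is a British novelist best known for the Harry Potter series.")

body = {
    "document": {"type": "PLAIN_TEXT", "content": sentence},
    "encodingType": "UTF8",
}

resp = requests.post(
    "https://language.googleapis.com/v1/documents:analyzeEntities",
    params={"key": API_KEY},
    json=body,
)

# Each entity includes a name, a type, and, when available, metadata carrying the
# Knowledge Graph MID and Wikipedia URL.
for entity in resp.json()["entities"]:
    print(entity["name"], entity["type"], entity.get("metadata", {}))
```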
The next one is sentiment analysis. So this is a review you might see. It says, "The food was excellent, I would definitely go back." And we get two things here. We get a score value, which will tell us, is the sentiment positive or negative? It's a value ranging from negative 1 to 1. So we can see here it's almost completely positive. And then magnitude will tell you how strong the sentiment is, regardless of it being positive or negative. This can range from 0 to infinity, and it's based on the length of the text. So since this is a pretty short block of text, the value is pretty low.

And then finally, you can analyze syntax. This method is a bit more complex. It gets into the linguistic details of a piece of text, and it returns a bunch of different data. This visualization is actually created from the in-browser demo. So if you want to try it out in the browser, you can create a similar visualization with your own text.

What it returns is, on that top row, those green arrows-- that's what we call a dependency parse tree. And that will tell us how each of the words in a sentence relate to each other, which words they depend on. In the second row, we see the orange row. That's the parse label, which tells us the role of each word in the sentence. So we can see that "helps"-- the sentence is "The natural language API helps us understand text." "Helps" is the root verb. "Us" is the nominal subject. We can see the role of all the other words in the sentence as well. That third row, where we only have one word, is the lemma. We can see here it says "help." And that is the canonical form of the word. So the canonical form of "helps" is "help." This way, if you're trying to count how many times a specific word occurs, it won't count "helps" and "help" as two different words. It'll consolidate them into one word. And then in red, we have the part of speech, whether it's a noun, verb, adjective, or punctuation. And then in blue, we have some additional morphology details on the word. There are more of these returned for Spanish and Japanese, which are the other two languages that the API currently supports.

Now, for the syntax annotation feature, it might be a little harder to grasp when you would use this in an application. So I wanted to show you a demo specifically focused on that feature. And for this demo, over the course of the past few days-- oh, I think the-- there we go. Mic is back. I've been using the Twitter Streaming API to stream tweets about Google Next. So I've streamed tweets with the hashtag #googlenext17 and a couple of other search terms into a Node server that's running on Compute Engine. And I'm streaming those tweets. The Streaming API gathers just a subset of those tweets, not all the tweets with that hashtag. And I'm sending the text of each tweet to the Natural Language API, and then I'm storing the response in BigQuery. BigQuery is our big data analytics warehouse tool. It lets you do analytics on really large data sets. So I'm storing it in BigQuery, and then from there, I can gather some data on the parts of speech in a sentence. So I can find, for example, the most common adjectives that people are using to tweet about Google Next.
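A simplified Python sketch of the core of that pipeline: send a tweet's text to the analyzeSyntax endpoint and tally the adjectives by lemma. The API key and the hard-coded tweet are placeholders, and the Twitter streaming and BigQuery storage pieces from the actual demo are left out here.

```python
from collections import Counter

import requests

API_KEY = "YOUR_API_KEY"   # placeholder: a key for a project with the Natural Language API enabled

# Stand-in for text arriving from the Twitter Streaming API.
tweets = ["The machine learning demos at #googlenext17 were awesome, great keynote"]

adjective_counts = Counter()
for text in tweets:
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    resp = requests.post(
        "https://language.googleapis.com/v1/documents:analyzeSyntax",
        params={"key": API_KEY},
        json=body,
    )
    # Each token carries its part-of-speech tag and its lemma (canonical form).
    for token in resp.json()["tokens"]:
        if token["partOfSpeech"]["tag"] == "ADJ":
            adjective_counts[token["lemma"].lower()] += 1

print(adjective_counts.most_common(10))
```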
So if we could go to the demo-- cool. This is the BigQuery web UI, which lets you run queries directly in the browser. Here, I just did a limit 10 to show you what the table looks like. I think I've got about 6,000 to 7,000 tweets in here so far. I'm collecting the ID of the tweet, the text, the created-at time, how many followers the user has, and the hashtags that come back from the Twitter API. And then I've got this giant JSON string of the response from the Natural Language API.

So you're probably wondering, how am I going to write a SQL query to parse that? Well, BigQuery has a feature called user-defined functions, which lets you write custom JavaScript functions that you can run on rows in your table. Over here, I've got a query that's going to run on every tweet in this table. And my JavaScript function is right here. What it's going to do is count all of the adjectives and then return that in my output table. So if I run this query here, it's running this custom JavaScript function on all the tweets in my table, which I think is about 6,000 right now. It ran pretty fast, and I'm not using cached results. So let's take a look. We've got 405 uses of the word "more," then "new," "great," "good," "late," "awesome." You can see some more here. So that's one example of a use case for the syntax annotation feature of the Natural Language API. You can go back to the slides.

So the Natural Language API lets you do analysis on text. One other thing that you might want to do with text is translate it. You likely have users of your application all over the world, and it'd be useful to translate text into their own language. The Translation API exposes the functionality of Google Translate to developers and lets you translate text in 100-plus languages.

And for a second, let's talk about Google Translate. I'm a big fan of Google Translate. Has anyone here used it before? It looks like a lot of people. I use it when I travel all the time. A couple of months ago, I was on a trip to Japan. I was at a restaurant where nobody spoke English, and I really wanted to order octopus. So I typed it into Google Translate. It turns out the word for that is "tako," which confused me a little bit. I didn't want to order an octopus taco, although maybe that would be good. So I just showed the person at the restaurant my Google Translate app and successfully got my octopus. That's a picture of it right there.

But you likely want to do more than translate the word for octopus. And that's why we have the Translation API, which lets you translate text in your application into many different languages. Airbnb is an example of a company that's using the Translation API. What you might not know is that 60% of Airbnb bookings connect people that are using the app in different languages, because people use Airbnb a lot, especially when they travel internationally. And so for all of those connections, they're using the Translation API to translate not only listings, but also reviews and conversations, into each person's own language. And they found that this significantly improves a guest's likelihood to book if a listing is translated into their own language. So that's one example of someone using the Translation API.

It's pretty self-explanatory, but I wanted to show you a code snippet of how easy it is to call the API. This is some Python code that's making a request to the Translation API. And you can see here we just create a translate client. And then we pass it the phrase we'd like to translate and the target language. It'll detect the original language for us. And then we can print the result.
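The exact snippet from the slide isn't reproduced in the transcript, but with the google-cloud-translate Python package and application default credentials set up, a call along those lines looks something like this; the phrase and target language here are just examples.

```python
# Uses the "basic" (v2) Translation client from the google-cloud-translate package.
from google.cloud import translate_v2 as translate

client = translate.Client()   # picks up application default credentials

result = client.translate(
    "I would really like to order some octopus",   # example phrase
    target_language="ja",                          # target language code
)

# The source language is detected automatically.
print(result["translatedText"])
print(result["detectedSourceLanguage"])
```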
And one thing we've added to the Translation API recently is neural machine translation. This greatly improves the underlying translation model. Basically, the way it works is that with first-generation translation, which is what we had before, it was translating each word in a sentence separately. So let's say you had a dictionary and you were looking up each sentence word for word in the dictionary and translating it without understanding the context around each word. That works pretty well. But what neural machine translation does is it actually looks at the surrounding words, and it's able to understand the context of the word in the sentence and produce much higher quality translations. There's a great "New York Times" article with a lot more details on how that model works. You can find it at that Bitly link there. If anyone wants to take a picture, I'll leave it up for a second.

And just to show you some of the improvements that neural machine translation brings-- this is a lot of text, I know. But what I did is I took a paragraph from the Spanish version of "Harry Potter." So this is the original text that the Spanish translator wrote. And then I show you, in the middle, how first-generation translation translates that to English, and then neural machine translation all the way on the right-hand side. And I bolded the improvements.

So we can look at a few of them. The first bold word is-- it says, "which made drills." It's describing the company where he works. And neural machine translation is able to change that word to "manufactured," which is much more specific to the context of the sentence. Another example: if we look at where it's describing Mrs. Dursley's neck, the first generation says "almost twice longer than usual." In the second version, it says "almost twice as long as usual," which is a slight improvement. And then if we look at the bottom, it goes from "fence of the garden" to "garden fence." And in the last example, the first generation used the pronoun "their," and then neural machine translation is able to identify the correct pronoun, "her," more specifically. So that's just a quick example highlighting some improvements that neural machine translation brings to the Translation API.

On its own, the Translation API is pretty self-explanatory, but I wanted to show you a small demo of a Python script I wrote that shows how to combine different APIs together. In this demo, it'll take three types of text input: it can take either raw text, audio, or an image of text. Then we'll pass it through the Natural Language API. And then finally, we'll translate it into a few languages, and then we'll translate it back to English so that you can see the result.

So if we could switch back to the demo-- it looks like it's up here. So I'm going to run the script-- python textify.py. And it's going to tell me we're going to send some text to the Natural Language API. It supports English, Spanish, and Japanese. And I have three options. I can either type my text, record a file, or send a photo of text. So I'm going to type some text. I'm going to say, "We are using the Translation API. It is awesome." And we got a bunch of data back here. So this is what the JSON response looks like just for one token. I didn't want to print the whole JSON blob here. This is just for the token "we." This is all the data it returned. So it tells us it's a pronoun.
And a lot of this part-of-speech morphology data is going to be unknown for English, but it applies to other languages that the API supports. It is able to tell us that it is a plural token and that it is in the first person. And it is the nominal subject of the sentence. And we ran some sentiment analysis on it. It says you seem very happy-- it was an excited sentence. And it tells us the entities it found. So it found "Translation API." The way the entity analysis endpoint works is that it's able to identify entities even if they don't have a Wikipedia URL. So "Translation API" doesn't, but it's still able to pull this out as an entity. So if we were, for example, building an app to route customer feedback, we could say, OK, this feedback is asking about the Translation API, and then we could route it to the appropriate person.

So now we're going to translate this text. And let's translate it into Japanese. There we go. So this is the version translated into Japanese. I'm guessing most of us don't speak Japanese, so I've translated it back to English. And you can see that it did a pretty good job.

So I'm going to run the script once more. And [AUDIO OUT] use an image. So if we look over here, I've got just a generic resume image. We're going to pass it to the API. So I'll clear the screen and run the script again. We're going to send a photo this time. And it is resume.jpg. Sending it to the Vision API. And the Vision API found, if we scroll up, all this text in the image. Using the new document text extraction, it was able to pull essentially all the text from that resume. There's an example of one returned token. And it found all these different entities. And now we can translate it. Let's translate it to German. That's a lot of text there, I know. But this is the resume translated into German, and then again back to English. So that's just an example of how you can combine multiple machine learning APIs to do some cool text analysis. And you can go back to the slides now.

So that was the Translation API. And the last thing I want to talk about is the Video Intelligence API. How many of you saw it in the keynote yesterday? Looks like most of you. The Video Intelligence API lets you understand the entities in your videos at a shot, frame, or video level. Video-level entities will tell you, at a high level, what is this video about? And then it'll give you more granular data on what is happening in each scene of the video.

A company that's using this is Cantemo. They're a media asset management company. So a company or a user that has a lot of videos would upload their videos to Cantemo, and Cantemo helps them better understand those videos, search their library, and transcode videos. And this is a quote from the VP of product development at Cantemo. He says, "Thanks to the Google Cloud Video Intelligence API, we've been able to very quickly process and understand the content of video down to the individual frame with an impressively rich taxonomy."

So they're using the Video API to help their customers better search their own video libraries. And I'm going to show you the same demo that you saw in the keynote, but we'll look at a different video and go into a bit more detail. So if we could switch back to the demo-- here's the API. And I've got a different video than I showed in the keynote earlier. This is a video that just shows you a tour of the Google Paris office and a little bit of the neighborhood around it. And I'll play the first bit of it. It starts off by just showing some frames.
And then we'll get into a tour of the neighborhood around the office. And then we go inside, and it interviews some employees. I won't play the whole thing, but we can look at some of the labels it returned. So it's able to identify this amusement ride, amusement park, from the beginning. We know there are a bunch of very short frames at the beginning of that video. It's able to see that it's a statue. If we look at the fruit annotation, it identifies a basket of fruit in this scene. We can scroll down and look at a couple more labels-- landscaping and cuisine. We see people getting some food. And school-- here it thinks it's a school inside there. So you can see we're able to get pretty granular data on what's happening in each scene of the video.

Another thing the Video Intelligence API lets us do is search a large video library. If we're a media publisher, we've got petabytes of video data sitting in storage buckets, and it's otherwise pretty hard to search a large library of video content. You'd have to manually watch the videos looking for a particular clip if you wanted to create, say, a highlight reel of specific content within your library. The Video Intelligence API makes this pretty easy, because as you can see, all the data we get back on this video, we can get back on all the videos in our library, which makes it pretty easy to search for a specific entity within our library.

So as I showed you in the keynote, one example is-- actually, first, let me show you the library. We've got a bunch of videos here, as you can see. And let's say we'd like to search for all our baseball videos. We can see what we get back here. And it shows us this video is almost entirely about baseball. This one has fewer baseball clips, and we can point to all of them specifically. And then in this one, we see that moment from the-- not playing. Let me try refreshing the page. There we go. We see that moment from the Year in Search video from last year when the Cubs won the World Series. I'm from Chicago, so I was pretty excited about that. One more search that I showed you before: we can find all of the beach clips in our videos. So here, if we wanted to create a highlight reel, if we were really missing the beach, it's easy to see all the beach clips in our videos. It'd be super easy to do this using the Video Intelligence API.

Now, since most of you saw this demo in the keynote, I wanted to talk a little bit more about how I built it. So you can go back to the slides. It was built by me and Alex Wolfe. If you like the UI, you should give him a shoutout on Twitter, @alexwolfe. He would appreciate it. So we worked on this together. And this is an architecture diagram of how it works.

The video API processing is being done on the back end. The way it works is you pass the Video Intelligence API a Google Cloud Storage URL of your video, and then it'll run the analysis on that video. So I have a Cloud Storage bucket where I'm storing all my videos. And I've got a cloud function listening on that bucket. That cloud function will be triggered anytime a new file is added to the bucket. It will check if it's a video, and if it is, it'll send it to the Video Intelligence API for processing. And one cool thing about the API is you can pass it an output URL-- the URL of a file you'd like it to write the JSON response to in Google Cloud Storage. So I've got a separate Cloud Storage bucket where I'm storing all the video annotation JSON responses.
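Here is a rough Python sketch of the annotate request at the center of that architecture: a Cloud Storage input URI, a label detection feature, and an output URI for the JSON. The bucket and file names are placeholders, the demo's actual cloud function was written in Node.js, and this assumes application default credentials via the google-auth library.

```python
import google.auth
from google.auth.transport.requests import AuthorizedSession

# Application default credentials (e.g. a service account with access to the API).
credentials, project = google.auth.default()
session = AuthorizedSession(credentials)

body = {
    "inputUri": "gs://my-video-bucket/paris-office-tour.mp4",         # placeholder bucket/object
    "features": ["LABEL_DETECTION"],
    "outputUri": "gs://my-annotations-bucket/paris-office-tour.json",  # where the JSON is written
}

resp = session.post(
    "https://videointelligence.googleapis.com/v1/videos:annotate",
    json=body,
)

# The call returns a long-running operation; the annotations eventually land in the
# output bucket, so the front end never has to call the API directly.
print(resp.json())
```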
And the API automatically writes all of my video annotations to that bucket. So the front end of my application doesn't have to call the video API directly. It's already got all that metadata in two separate Cloud Storage buckets. And the front end of the application is a Node.js application built on Google App Engine. So that's a little bit about how the demo works.

And this is a more granular look at what the JSON response from the Video Intelligence API looks like. This is a video of a tour of the White House. And at this particular point in time, it identifies the label "Bird's-eye view." It's able to tell us the start time and end time, in microseconds, of where that label appears in the video. And it also returns a confidence score ranging from 0 to 1, which tells us how confident the API is that it successfully identified this as a bird's-eye view. In this case, it is 96%, so it is pretty confident that this is a bird's-eye view. And then one more snippet of this video-- a portrait of George Washington. It's able to successfully identify that this is a portrait. It tells us the start time and end time of where that is in the video, along with a confidence score-- 83%. So that's just an example of what the JSON looks like that you get back from the Video Intelligence API.

That wraps up my tour of the APIs. If you all want to start using them today, you can go to the try-it pages. For each of the API product pages, as I showed you with Vision, there's an in-browser demo where you can try out all the APIs directly in the browser before writing any code, to see if it's right for you and your application. So I definitely recommend checking that out. I'll let you guys take a picture of that page before I switch. OK, it looks like almost everyone has got it.

Some other talks that I recommend that are related to machine learning: BigQuery and Cloud Machine Learning Engine was a talk that was yesterday. All the videos from yesterday have already been posted. So if there's any talk that you wanted to see that you missed, you can go watch it on YouTube. Another talk, Introduction to Video Intelligence-- if you want to see a deep dive on the Video Intelligence API, [INAUDIBLE], who's the product manager on that, is going to be giving that talk today at 4:00 PM. I highly recommend checking that talk out. And then if you're more interested in the side of building and training your own machine learning model, there's a session tomorrow at 11:20 on the lifecycle of a machine learning model. I definitely recommend checking out all three of these sessions. And if you can't make it, you can always watch the videos on YouTube after the session.

So thank you. That's all I've got.

[APPLAUSE]