SUJITH RAVI: Hi, everyone. I'm Sujith. I lead a few machine learning teams in Google AI. We work a lot on how you do deep networks and build machine learning systems that scale on the cloud with minimal supervision. We work on language understanding, computer vision, multi-modal applications. But we also do things on the edge. That means, how do you take all these algorithms and fit them to compute- and memory-constrained devices on the edge? And I'm here today with my colleague.

DA-CHENG JUAN: Hi, I'm Da-Cheng. And I'm working with Sujith on neural structured learning and all the related topics.

SUJITH RAVI: So let's get started. You guys have the honor of being here for the last session of the last day. So kudos and bravo-- you're really dedicated. So let us begin. We are very excited to talk to you about neural structured learning, which is a new framework in TensorFlow that allows you to train neural networks with structured signals. But first, let's go over some basics. If you are in this room and you know about deep learning and you care deeply about deep learning, you know how typical neural networks work. If you want to take, for example, a neural network and train it to recognize images and distinguish between concepts like cats and dogs, what would you do? You feed in images like the one on the left, which looks like a dog, give it the label "dog," and feed it to the network. And the process by which it works is that you adjust weights in the network such that the network learns to distinguish and discriminate between different concepts, correctly tag the image, and convert the pixels to a category. This is all great. All of you in this room have probably built a network. How many of you have actually built a neural network? Great! And this is the last session of the last day-- but still, we're very happy that you're all here with me. So it's all great. We have a lot of fancy algorithms, very fancy networks. What is the one core ingredient that we need when we build a network? For almost all of the applications that we work on, we require labeled data, annotated data. So it's not one image that you're feeding to this network. You're actually taking a bunch of images paired with their labels-- cats and dogs in this case, but of course it could be whatever, depending on the application. And we feed thousands or hundreds of thousands or even millions of examples into the network to train a good classifier, right? Today, we're going to introduce neural structured learning, which is a framework-- we're happy to say it's supported in TensorFlow 2.0 and Keras-- that allows you to train better and more robust neural networks by leveraging structure in the data. The core idea behind this framework is that we're going to take neural networks and feed them, in addition to feature inputs, structured signals. So think of the abstract image that I showed you earlier. Now, in addition to these images paired with labels, you're going to feed it connections or relationships between the samples themselves. I will get to what these relationships might mean. But you have these structured signals and the labels, and you feed both into the network. You might ask, what do you mean by structure, right? Structure is everywhere. In the example that I showed you earlier, if you look at images-- just take a look at this graph here. The images that are connected via edges in this picture basically have some visual similarity between them.
So it is actually pretty easy to construct structured signals from day-to-day sources of data. In the case of images here, it's visual similarity. But you could think of-- what if you tagged your images and created albums that represent some specific concepts? Everything within a photo album has some sort of connection or interaction or relationship. So that represents another type of structure. It's not just for images. We can go to more advanced or completely different applications. Say you want to take scientific publications or news articles and tag them with their topic-- one simple example: take biomedical literature. All the papers that are published, whether in Nature or any of the conferences, have references and citations to other papers. That represents another type of structure or link. So these are the kinds of structures we're talking about here: relationships that are exhibited or modeled between different types of objects. In the natural language space, this occurs everywhere. If you're talking about doing Search, everybody has heard of the Knowledge Graph, which is a rich source of information that captures relationships between entities. So if I talk about the concepts Paris and France, the relationship is that one is the capital of the other. These sorts of relationships-- it's not typical to capture them and feed them into a neural network. But these are the kinds of relationships that already exist in day-to-day data sources. So why not leverage them? That is what we try to do with neural structured learning. And the key advantages-- before we talk about what it does and how we do it, why would you even care about it? What is the benefit? One of them is, as I mentioned earlier, it allows you to take this structure and use it to train neural networks with less labeled data. And that's the costly process, right? For every application that you want to train, if you had to collect rich annotated data at scale for millions of examples, it would be a tedious task. Instead, if you're able to use a framework like neural structured learning that automatically captures the data's structure and relationships, then with minimal supervision you're able to train classifiers or prediction systems with the same accuracy. That would be a huge boon. Who wouldn't want that, right? That's one type of benefit. Another one is that, typically, when you deploy these systems in practice, in real-world applications, you want the systems or networks to be robust. That means you don't want-- if the input distribution changes, or the data suddenly changes, or somebody corrupts the images with adversarial attacks-- the network to suddenly flip its predictions and go bonkers. So this is another benefit: if you use neural structured learning, you can improve the quality of your network and also its robustness. So let me dive a little deeper and give you a little more insight into the first scenario. Take document classification as an example. I'll give you an example that you probably have at your home. Imagine you have a catalog or a library of books in your home, and these are digitized content. And you want to categorize them and neatly arrange them into specific topics or categories. Now, one person might want to categorize them based on the genre. A different person might say, oh, I want books that belong to the same period.
A third person might say, oh, I want to group the books that have the same kind of content or phrases or words, or that are by the same author, or based on some other aspect-- like, there is a particular plot twist that is captured in different books, and I want to arrange them based on that. So you can see, not everybody's needs are the same. Are you going to collect and annotate enough labeled data for each of those tasks to create a network that can distinguish and classify the books into these genres with very high accuracy? Probably not. So on the one hand, you have plenty of data. The raw book content is available to you. Or raw news articles are available to you. There's plenty of raw text available to you. But it's hard to construct this labeled, annotated data. So this is where Neural Structured Learning, or NSL, comes in. Going back to the previous example, we can model the relationships between these different inputs or samples using structure. Again, I will tell you what the structure means. But pair it with a few labeled examples, and then train a network that is almost as good as a network trained on millions of examples. So imagine you could, for your application, use only 5% or 10% of the labeled data and train just as good a classifier or prediction system. That's what we're trying to help you do with neural structured learning. And I said "structure"-- how do you come up with this structure? Or how do you do this for document classification? I'm going to give you a forward reference here that Da-Cheng is going to talk a little more about as well. There is a hands-on tutorial for the exact example that I told you about, document classification with a graph. And you can go to the Neural Structured Learning TensorFlow website and try this out for yourself, all within just a few lines of code. All you need to do is construct the data in the right format. And with a few lines of code, you should be able to run the system end-to-end. Switching to the second scenario: it's great to have high-quality models. We have seen that neural networks have the capacity to train really, really good models, especially as you increase the number of parameters. But robustness is an important concept and, actually, an important criterion when we deploy these to real-world scenarios. For example, if you have an image recognition system and suddenly the network flips its prediction because the images are corrupted, that is not a good thing. And this is not just for image classification. It could be with text or any kind of sensor data. Here is an example. Take the image on the left. What do you think it is? It's a panda. Take the image on the right. What do you think it is? Anyone who disagrees that it's a panda? Basically, your own neural network is saying that both of these images are pandas. But that's not what happens if you actually train a neural network-- for example, a ResNet or whatever the state-of-the-art network is-- and apply it to the two images. The first one would be correctly recognized as a panda. The second one would be recognized as a completely different concept, in this case, a gibbon. And the reason this happens is, if you zoom in really close, the second image is actually an adversarial example. It's created by adding some noise to the original image. But the changes are so tiny, they're imperceptible to the human eye. But the network, based on these changes in the pixels, flips its prediction completely. Imagine this happening in a live system.
So you don't want these things to happen. You want your networks to be robust. This is where NSL comes in again. And again, we're going to use the same concept-- use the structure in the data to train more robust models. Here, the structure is slightly different from the one I mentioned earlier. Earlier, we were talking about explicit relationships modeled as a graph. Here, we are going to construct the structure. We take the original image, and we also generate a perturbed image, an adversarial example for that original image. Now these two images are joined via a link, so there is a relationship between them. The difference is, this is dynamically generated during the learning process, as opposed to the earlier case where somebody gives you a knowledge graph or these structured signals, or you construct them from some data source. And what we try to do here is, using this structure-- we already know the label for the first image is panda-- we force the network to learn that the perturbed image should also be classified as a panda. That's, at a very high level, how this works. Again, if you want to try this out for any of your networks or applications, there's a tutorial which allows you to do this, using the API. So NSL enables both the neural graph type of learning and also adversarial learning. And you can just go to the website and run through the code example, all with just a few lines of code. You will see more details later in the talk. So this is, at a high level, why we would want a framework like NSL, and the power of using it to enable more robust networks and also to build networks that can be trained with minimal supervision. These are very, very handy when you want to build applications on the fly, and very custom applications that do not fit the regular mold. Let us now dive a little deeper into how we do this-- what the framework is doing. In the first paradigm, as I said, structure is going to be modeled and used as an input to the network. We call this paradigm Neural Graph Learning. The core idea is that, in addition to the feature inputs that you're familiar with-- like pixels for image classification, or word, phrase, or sentence features for document classification-- you're going to pass in structured signals modeled as a graph. You might ask, by the way, at this point, where's the graph coming from in this setting, right? In some cases, it might be given to you, as I said-- like a citation graph or a knowledge graph. In other cases, you can actually construct a graph. We're very happy to say that we provide tools-- again, you will hear more about this-- that allow you to construct these graphs from sources of data like word embeddings or image embeddings. So the goal in neural graph learning is that the network is forced to jointly optimize over both the feature input and the structured signal simultaneously. Let's see how that happens, diving in deeper. If you look at what exactly the network is learning, every network is trying to optimize some loss. In image classification, what is the loss? When you take the pixels, pass them through the network, and get some prediction, what is the error incurred between the predictions and the true label? In NSL, in the neural graph learning setting-- we call networks trained in this mode Neural Graph Machines-- what we're trying to optimize has two components.
One is the standard loss, which in image classification is the loss incurred when you pass the pixels through the network, get the predictions, and measure the error against the true labels. The other component is based on the structured signal, or the graph, that you provided. And where that comes in is: if I have an image that looks like the pit bull dog, labeled as a pit bull, and I have a different image which, through my structured signal, has an edge to the original image, then the network is forced to learn that the source image and its neighbor in the graph should have similar representations. That means you're trying to say, respect the structure that you provide as input and, at the same time, try to optimize the supervised loss. Now this is very flexible, as you can imagine. You can use the API to change the formulations. That means, instead of a supervised loss, if you want to do unsupervised learning or a very different kind of loss, you can change the first component very easily. And just another quick note-- we don't have time to go into it, but we're happy to answer more questions at the end-- the types of losses themselves are also customizable. You can use L2 loss or cross-entropy, depending on the kind of application. You can even use ranking losses, if you will. So this makes it very, very easy for you to train a wide range of applications in different learning settings, whether unsupervised, supervised, ranking, or classification. But at the same time, you'll be able to pass in structure in a seamless manner. Here is an example of NSL neural graph learning-- take image classification. You start with some samples, as I said-- the pixels. And you also have a structure. In this case, the images are connected in the graph based on some user interaction signal or, for example, as I showed you, because they belong to the same album. There's some structure tying them together. Assuming this is given to you, we pass it through the network. Both the sample and its neighbors are passed simultaneously through the same network. And the network is learning to optimize within each layer-- and this is also configurable, by the way-- to push the embeddings of neighbors closer to each other. That means two images that are connected in the graph should learn similar embeddings when passed through the network. Simultaneously, you should also optimize for the correct predictions. So if one of them was labeled as a panda, then you also want the prediction error to be minimal. Both of these parts are optimized jointly. OK-- so hopefully this gives you an idea of how we enable neural graph learning in the neural structured learning framework. As I mentioned, structure can come in different forms. That was an explicit structure provided as a graph input. But we can also use implicit structures. And this is where the adversarial learning type of paradigm is enabled using the NSL framework. Here again, we're going to jointly optimize features and structure. Except the difference is the structure is now induced during the learning process, by constructing adversarial examples from the original input. So if you have x_i as an input, you create x_i-prime, which is an adversarial version of it. And these two are connected with some sort of weight-- this is configurable. And this structure is now passed through the network.
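(As a sketch of the objective just described-- the notation here is ours, not from the slides-- the Neural Graph Machine loss combines a supervised term with a neighbor-agreement term:

$$
\mathcal{L} \;=\; \underbrace{\sum_{i}\ell\big(f(x_i),\,y_i\big)}_{\text{supervised loss}} \;+\; \alpha \underbrace{\sum_{(i,j)\in\mathcal{E}} w_{ij}\; d\big(h(x_i),\,h(x_j)\big)}_{\text{structure loss}}
$$

where $f$ is the network's prediction, $h$ an intermediate embedding, $\mathcal{E}$ the edge set with weights $w_{ij}$, and $\alpha$ a tunable multiplier. $\ell$ can be cross-entropy, L2, or a ranking loss, and $d$ is a distance between embeddings; in the adversarial setting, the neighbor $x_j$ is the generated $x_i'$.)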
And the network is forced to optimize both of them toward the same embeddings or representations inside. This is all great. As I mentioned, this opens up a host of new kinds of applications and training scenarios. The best part is, if you're thinking, now how does this work with transformers or ResNets or different kinds of networks-- the network architecture doesn't matter here. You can use this with any type of network: RNNs, transformers, ResNets, convolutions, combinations of CNNs and LSTMs. It doesn't matter. These are learning strategies. You can build a network and enable NSL, both in the adversarial and in the neural graph setting, very easily, with very few lines of code in TF 2.0. And to tell you more about that, I'm handing it over to Da-Cheng.

DA-CHENG JUAN: All right, thank you, Sujith. Next, we are going to introduce the libraries, tools, and trainers provided by the neural structured learning framework. Everything here is compatible with TensorFlow 2.0, so you can train neural nets with structured signals while enjoying all the great features of TensorFlow 2.0. This is the training workflow we just mentioned. Every segment in red here is a new step introduced to the workflow to train with structured signals. And neural structured learning provides libraries and tools for these steps. Let's first take a look at the left part of the workflow. The training samples and neighbors from the same neighborhood are packed together to form a new batch. Notice that, in the batch, each training sample is extended to include the neighborhood information. To achieve this, the neural structured learning framework provides standalone tools, such as build_graph and pack_nbrs, that a user can invoke directly. We also provide functions that users can integrate into their own custom pipelines. You may notice that build_graph and pack_nbrs here are listed both as binaries and as functions. This is not a typo. It means they can be invoked either as a binary or as a function. Next, let's take a look at the right part of the figure. Again, we provide libraries for these new steps, introduced to enable graph regularization. Both the training sample and its neighbor will be fed to the neural network, and unpack_neighbor_features is for this purpose. The model in this illustration is a convolutional neural net, but it can be any type of neural network, not just a convolutional one. Then the difference between the sample's embedding and its neighbor's embedding is calculated and added to the final loss as the regularization term. In addition, we also provide libraries to generate adversarial neighbors, which act in place of explicit structured signals for regularization. Finally, we also provide Keras APIs for a user to easily build Keras trainers with graph_regularization or adversarial_regularization. The Keras API from neural structured learning supports all three ways of building models: via the sequential API, the functional API, or subclassing. This is just a subset of the tools and libraries provided in the neural structured learning framework. Please visit our website to learn more about the tools and APIs in neural structured learning. The first step, if you want to use neural structured learning, is to do a pip install. Here, we provide a code example demonstrating the API from the Neural Structured Learning library. We first need to read the training data.
Note that the data here has been pre-processed by the tools or functions to incorporate the graph into the training samples. Next, the user builds a custom model and treats it as the base model. The user can build this base model using any of their favorite Keras APIs, as we just mentioned-- sequential, functional, or subclassing. After the base model is built, we use the API to wrap the base model to enable graph regularization. There are several hyperparameters we need to configure. For example, we need to specify the maximum number of neighbors considered during regularization. Also, for each hyperparameter, we provide default values that we know empirically work well. After we enable graph regularization on the Keras model, the rest is just a standard Keras workflow-- compile, fit, and then evaluate. That's it. Within five lines, we are able to enable graph regularization. And those five lines actually include one line that's a comment, not actual logic. Here, let us show some results of a model trained with structured signals. The task is sentiment analysis on IMDB movie reviews. We want to point out that this result is just from one of our internal experiments. Your actual mileage may vary from task to task, from data to data, or from model to model. The x-axis here represents the amount of supervision, which can be converted into the number of labeled examples. The y-axis represents model accuracy. The left figure shows the performance of a bidirectional LSTM, and the right figure shows the performance of a feed-forward neural net. As you can see, when we have lots of training examples-- when the amount of supervision is high-- there is actually not much performance difference. But as soon as the amount of supervision drops to 5% or even 1%, training with structured signals leads to more accurate models. Usually, the improvement is more than 10%. If you are interested in more results, please refer to our paper. So training with structured signals sounds really great. But sometimes we do not have a structure-- we do not have a graph to begin with. What should we do? Neural structured learning provides two methods. The first one is to construct the graph, or the structure, via data pre-processing. The second one is to construct such structure via adversarial neighbors. Let's focus on the data pre-processing one first. Again, let's take document classification as an example. Given a sample document, how do we know if another document is similar enough to be a neighbor document? These documents will be projected into an embedding space. For example, we could use pre-trained BERT embeddings, mentioned in an earlier TensorFlow talk, to project all these documents into embeddings. Documents that are closer in the embedding space are assumed to have similar semantics. Next, we examine the similarity between two embeddings-- cosine similarity or another metric can be used here. If the similarity is higher than a predefined threshold, we treat these two documents as similar enough, and therefore we add an edge between them to make them neighbors. By repeating this process, we construct a structure, or a graph, over all the data via pre-processing. After we have the graph, the rest of the training flow is exactly the same as the one where the graph is given.
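(A minimal sketch of the five-line flow Da-Cheng describes, assuming the training data has already been packed with neighbor features; the feature name 'words', the model shape, the hyperparameter values, and the train_dataset/test_dataset placeholders below are illustrative, not from the talk:)

```python
import neural_structured_learning as nsl
import tensorflow as tf

# Base model: any Keras model (sequential, functional, or subclassed).
inputs = tf.keras.Input(shape=(256,), name='words')
hidden = tf.keras.layers.Dense(64, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(2, activation='softmax')(hidden)
base_model = tf.keras.Model(inputs, outputs)

# Wrap the base model to enable graph regularization.
# max_neighbors caps the neighbors considered per sample; multiplier
# weights the graph-regularization term relative to the supervised loss.
graph_config = nsl.configs.make_graph_reg_config(max_neighbors=3, multiplier=0.1)
graph_model = nsl.keras.GraphRegularization(base_model, graph_config)

# The rest is the standard Keras workflow: compile, fit, evaluate.
graph_model.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])
graph_model.fit(train_dataset, epochs=5)   # a neighbor-packed tf.data.Dataset
graph_model.evaluate(test_dataset)
```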
Let's again take a look at an actual code example. We first load the training data and test samples from the IMDB data set. Next, we load a pre-trained embedding model from TF Hub. The embedding model we use here is a Swivel model, but feel free to replace it with your favorite pre-trained embedding model, such as BERT. Next, we project the text of each IMDB review into the embedding space, so we can calculate the similarity between two reviews. Remember, when two reviews are closer in the embedding space, we assume they share similar semantics. After we project the text to embeddings, we use the build_graph function provided by neural structured learning to construct the graph. When invoking this function, we also need to provide a similarity threshold, which is 0.8 in this case. After we have the graph, we call the pack_nbrs function to incorporate the neighbor samples into each training sample. Here, for each sample, three neighbors are considered. After we augment the training data with graph signals, everything is just like the first code example we showed. Read the data. Build a base model via the sequential API, the functional API, or subclassing-- feels familiar. Then we use the neural structured learning API to wrap the base model to enable graph regularization. Again, the rest of the workflow is just the standard Keras flow-- compile, fit, and evaluate. We also provide a hands-on, step-by-step tutorial on our website, so feel free to visit the website and try it yourself. The second method to construct a structure, or a graph signal, is to build the graph dynamically by adding adversarial neighbors. For each training sample, we find a malicious perturbation based on the reverse gradient direction. In other words, the perturbation is designed to confuse the model the most, which means to maximize the loss. Then this malicious perturbation is added to the original training sample to create an adversarial neighbor. Again, the adversarial neighbor is designed to confuse the model the most, which is to maximize the loss. Then we add an edge between this adversarial neighbor and the original training example. And with that, we have constructed a graph, or a structure. This is the code example using the adversarial Keras model from neural structured learning. Again-- feel familiar? In addition to these three lines, everything else follows the same workflow introduced before. Neural structured learning has been widely used in many products and services at Google, for example, learning image semantic embeddings. Here, we provide six examples, two for each semantic granularity, to illustrate the difference from coarse to ultra-fine granularity. The object on the right is the Golden Gate Bridge. The Golden Gate Bridge is a steel, red bridge. But not all steel, red bridges, such as the one in the middle, are the Golden Gate Bridge. Learning such embeddings is a challenging task, partly due to the large variations seen among images that belong to the same class or category. Learning image embeddings that capture fine-grained semantics, however, is the core of many image-related applications, such as image search, whether querying by traditional keywords or by an example query image. This is the overall neural architecture used to learn the image embedding. And again-- feel familiar? It is exactly the same workflow we have introduced again and again in this talk.
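(Before moving on, here is a hedged sketch tying together the two graph-construction methods just covered. The file paths, feature names, and threshold are placeholders; base_model, x_train, and y_train stand for the kind of Keras model and data shown in the earlier sketch. build_graph and pack_nbrs follow the tool names from the talk, and the adversarial wrapper is the "three lines" Da-Cheng mentions:)

```python
import neural_structured_learning as nsl
import tensorflow as tf

# --- Method 1: build the graph via pre-processing. ---
# embeddings.tfr is assumed to hold one tf.train.Example per review, with an
# 'id' feature and an 'embedding' feature (e.g., from a TF Hub Swivel model).
nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],
                      '/tmp/imdb/graph.tsv',
                      similarity_threshold=0.8)  # cosine-similarity cutoff

# Join each labeled training example with up to 3 of its graph neighbors.
nsl.tools.pack_nbrs('/tmp/imdb/train.tfr',      # labeled examples
                    '',                         # no unlabeled examples here
                    '/tmp/imdb/graph.tsv',
                    '/tmp/imdb/nsl_train.tfr',
                    add_undirected_edges=True,
                    max_nbrs=3)

# --- Method 2: adversarial neighbors generated on the fly. ---
# adv_step_size scales the gradient-direction perturbation; multiplier
# weights the adversarial-regularization term in the loss.
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(base_model,
                                                label_keys=['label'],
                                                adv_config=adv_config)
adv_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
# Adversarial training takes dict-style inputs with features and labels.
adv_model.fit({'words': x_train, 'label': y_train}, batch_size=32, epochs=5)
```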
And since this talk focuses on neural structured learning, we will not go into detail about other techniques, such as sampled softmax, used to train the model. If you are interested, please refer to our paper. Let's zoom in on the structure part. The graph used here is a co-occurrence graph. Essentially, the co-occurrence graph tries to answer the following question: given that one image is selected, what other images are sufficiently similar that they would also be selected? Say the query is the white English bulldog. If two images co-occur many, many times, we add an edge between them, making them neighbors. So here are some experimental results. For each query image, we show the top three nearest neighbors based on the learned image embedding. The images marked in green were rated as strongly similar to the query image by human raters, whereas the images marked in red were not. For example, in the left figure, when the query image is a white scroll, all the white scrolls can be correctly retrieved using an embedding learned with the neural structured learning framework. In other words, learning with structure is able to capture image semantics much closer to actual human perception. So to recap: training with structure is very useful. Less labeled data is required to effectively train a model. Also, learning with structured signals leads to more robust models. Neural structured learning provides APIs and tools for Keras models. And it works for all types of neural nets, whether feed-forward, convolutional, recurrent, or any custom neural net you design. This is probably the most informative slide of this talk. You can find the tools, libraries, and hands-on tutorials in detail on our website. Also, please star our GitHub. We do take GitHub issues, and we would love to hear from you. We are looking forward to developing this framework with all of you, to make it more comprehensive. We will be waiting for your pull requests on GitHub. Thank you.

[APPLAUSE]

Do you have time for questions? OK, so I think we still have some time for questions.

SPEAKER 1: You may have to use the mic.

AUDIENCE: The build_graph function that you mentioned, does it do pairwise comparison across all the items, or--

DA-CHENG JUAN: All the samples-- not only training-- yeah.

AUDIENCE: I see. So for--

DA-CHENG JUAN: Both labeled and unlabeled samples.

AUDIENCE: I see. So for IMDB, how long did it take to build the graph? [INAUDIBLE]

DA-CHENG JUAN: It heavily depends on what machines you are using, right? Since the IMDB reviews are a relatively small data set, building the graph will not take too long.

AUDIENCE: I see.

SUJITH RAVI: One addition to that-- it turns out you don't really have to do an all-pairs comparison. There are much faster techniques. So stay tuned. I think we are in the process of releasing other tools which will make it much faster, even on single machines. If you have a million examples, you don't need to do a million-by-million comparison, right?

AUDIENCE: Sure-- sure. We actually tried to do that, and it took a lot of time. So instead of that, we used Faiss [INAUDIBLE].

SUJITH RAVI: Great, so you're probably going to be the first user for something that we're going to release soon then.

AUDIENCE: Yeah-- OK. Thank you.

AUDIENCE: Very nice.
Just so I think I can understand what you're doing-- it looks like you're taking images and then GAN-generated images from the generator, associating them, and thereby negating the ability of GANs to deceive a neural network. Is that correct?

SUJITH RAVI: It does not necessarily have to be the GAN structure. The idea is, for any network that you're trying to learn, you use the gradients that are backpropagated, reversing the gradients, to construct a noisy example, if you will. And the reason to do this during training is that, the next time the network sees this noisy example, it will still learn to correctly identify it, rather than flipping its predictions.

AUDIENCE: And then--

SUJITH RAVI: And so we're trying to make the intermediate layers and also the predictions robust.

AUDIENCE: Sure. In that loss function, it looked like it had something from a discriminator plus a generator or something-- that notation. Or maybe it was just a coincidence.

SUJITH RAVI: The-- which--

DA-CHENG JUAN: Which equation?

AUDIENCE: That loss function that you had that you were trying to minimize.

SUJITH RAVI: Oh, the loss function just has two components. The first one is the supervised loss, or whatever loss your application has. The second one is that we're factorizing the loss over neighbors, like source images and neighbors. And in the case of adversarial learning, that neighbor image is constructed, and the weight is basically--

AUDIENCE: Yes.

SUJITH RAVI: --whatever weight you're assigning to that neighbor image.

AUDIENCE: Very good. Thank you.

AUDIENCE: Very neat idea-- thanks. But I think on page 34, the graph looked a bit weird to me. Can you open the--

DA-CHENG JUAN: Page 34?

AUDIENCE: 34-- the one you compared with the standard.

DA-CHENG JUAN: This one?

AUDIENCE: Yeah, this one. This graph looked a bit strange to me. How many samples per point did you generate for the error bars here?

DA-CHENG JUAN: So basically, the error bar is from the test data set. It's not from--

AUDIENCE: No, no, no.

DA-CHENG JUAN: --the training--

AUDIENCE: How many trials--

SUJITH RAVI: --training trials--

AUDIENCE: Yeah, how many times did you start from a different seed and--

DA-CHENG JUAN: Oh, we trained--

AUDIENCE: --train the networks and then get these results?

DA-CHENG JUAN: Oh, how many training trials? I do believe we have five trials per--

AUDIENCE: Each point is five samples.

DA-CHENG JUAN: Yeah.

AUDIENCE: Averaged over five samples-- OK.

SUJITH RAVI: Yeah, technically, like I said, it depends on the network that you're training. Typically, what you would expect, even if the network is really powerful, is the gap to increase-- so in the 2% setting here, you will see the gap is lower than at 5%. But that's just based on this data set. Typically what you see is that the gap increases as the supervision ratio goes lower.

AUDIENCE: Yeah, this is what I would expect. But if you look at it for 2% and then half a percent--

SUJITH RAVI: Yeah, yeah-- so--

AUDIENCE: --they're coinciding [INAUDIBLE]--

SUJITH RAVI: Yeah, so this is just one example. We would recommend that you try it on your own data set, on one of the networks that you build.

AUDIENCE: Hi, how are you doing? I was curious how you might apply this to, say, segmentation or video classification.

SUJITH RAVI: I think there are a couple of different ways.
In video classification, you can look at videos that are related-- say, from the same channel, for example, with similar kinds of content, or where the metadata matches-- and you can create links between these different videos. And assuming you have a way of processing [INAUDIBLE] that puts all these frames into some representation, you can apply the regularization in the NSL framework to say, hey, these related videos should all be optimized to learn the same prediction. There are many other ways. We can talk offline about that.

AUDIENCE: OK-- thank you.

AUDIENCE: If we're talking about-- sorry.

SUJITH RAVI: Last question-- yeah.

DA-CHENG JUAN: Yeah-- last question.

AUDIENCE: Yeah-- I have two, actually. Sorry. So if we're talking about data augmentation, did you compare the classical approaches, like distortion, blurring, stretching, and all this kind of stuff, to the adversarially generated samples? Is there any benefit to using one versus the other? Like, why adversarially generated samples versus, let's say, some classical approaches?

SUJITH RAVI: Because distortion and rotation are predefined transformation functions that we have, right? The adversarial perturbation depends on the data and the network. And you're using the gradients that are backpropagated to create an adversarial example, a hard example, if you will. For transformations, everybody knows, hey, this is a rotation. You apply rotation or blurring. But what if somebody attacks and it's not a rotated image? Instead, they take pixels here, pixels there, and transform or blur them. That's a very different kind of transformation function. So here, you're trying to do this across the board, using the learning process and the gradients being learned for the network at that layer.

AUDIENCE: Mm-hm.

SUJITH RAVI: Does that make sense?

AUDIENCE: And the second question-- when the network gets the feedback from the graph that the adversarial neighbor scores very low on the example, but the network itself is pretty sure that the class is there-- how is this resolved, when you have a pretty confident network versus the adversarial neighbor?

SUJITH RAVI: I think the clarification-- maybe you're referring to the gibbon--

DA-CHENG JUAN: Sure.

SUJITH RAVI: --example I mentioned. So just to clarify, the adversarial example does not necessarily have that wrong label during the learning process. We're constructing it on the fly. And we're forcing the network to-- it would only be predicted as, hey, that is a gibbon, if you didn't apply the adversarial learning.

AUDIENCE: Yeah, but you have the feedback from the graph that says that the classification is wrong here. So you try to regularize it. And by this point, the network is already pretty much convinced that it's not a gibbon, for example. But the graph says that the class is still wrong. So how is this resolved?

DA-CHENG JUAN: A quick note to add on this: actually, in some experiments, we apply the correct label to the adversarial example. In the previous slide, actually, the adversarial example-- we would have the [INAUDIBLE] labeled as a gibbon.

AUDIENCE: All right-- OK. OK.

DA-CHENG JUAN: So you could think of it this way--

AUDIENCE: Yes, yes.

DA-CHENG JUAN: This sample is very confusing to the model. But we still supervise the model to learn the correct label. And therefore, later on--

AUDIENCE: All right.
DA-CHENG JUAN: --the neural network will be more robust.

AUDIENCE: OK-- clear. Thank you very much.