
If there's one deep net that has completely dominated the machine vision space in recent years, it's certainly the convolutional neural net, or CNN. These nets are so influential that they've made Deep Learning one of the hottest topics in AI today. But they can be tricky to understand, so let's take a closer look and see how they work.

CNNs were pioneered by Yann LeCun of New York University, who also serves as the director of Facebook's AI group. It is currently believed that Facebook uses a CNN for its facial recognition software.

A convolutional net has been the go-to solution for machine vision projects in the last few years. Early in 2015, after a series of breakthroughs by Microsoft, Google, and Baidu, a machine was able to beat a human at an object recognition challenge for the first time in the history of AI.

It's hard to mention a CNN without touching on the ImageNet challenge. ImageNet is a project that was inspired by the growing need for high-quality data in the image processing space. Every year, the top Deep Learning teams in the world compete with each other to create the best possible object recognition software. Since 2012, when Geoff Hinton's team took first place in the challenge, every single winner has used a convolutional net as its model. This isn't surprising, since the error rate on image recognition tasks has dropped significantly with CNNs, as seen in this image.

Have you ever struggled while trying to learn about CNNs? If so, please comment and share your experiences.

We'll keep our discussion of CNNs high level, but if you're inclined to learn about the math, be sure to check out Andrej Karpathy's amazing CS231n course notes on these nets.

There are many component layers to a CNN, and we will explain them one at a time. Let's start with an analogy that will help describe the first component, which is the convolutional layer.

Imagine that we have a wall, which will represent a digital image. Also imagine that we have a series of flashlights shining at the wall, creating a group of overlapping circles. The purpose of these flashlights is to seek out a certain pattern in the image, like an edge or a color contrast, for example. Each flashlight looks for the exact same pattern as all the others, but each one searches a different section of the image, defined by the fixed region its circle of light covers. Taken together, the flashlights form what's called a filter. A filter is able to determine whether the given pattern occurs in the image, and in what regions. What you see in this example is an 8x6 grid of lights, which together is considered to be one filter.
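To make the flashlight picture concrete, here is a minimal sketch in plain Python of one filter sliding over a tiny grayscale image. The `convolve2d` helper, the 5x6 toy image, and the 3x3 vertical-edge filter are hypothetical illustrations, and the sketch assumes a stride of 1 and no padding. Each cell of the output corresponds to one flashlight, so this toy filter forms a 3x4 grid rather than the video's 8x6. Like most deep learning libraries, it computes what is strictly cross-correlation (the kernel is not flipped), which CNN literature still calls convolution.

```python
def convolve2d(image, kernel, bias=0.0):
    """Slide one filter over a 2D image (stride 1, no padding).

    Each output cell plays the role of one "flashlight": it covers a
    kernel-sized patch of the image and measures how strongly that patch
    matches the filter's pattern.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for m in range(kh):
                for n in range(kw):
                    total += kernel[m][n] * image[i + m][j + n]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x6 toy image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 3x3 filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # prints [0.0, 3.0, 3.0, 0.0] on each of the 3 rows
```

The large values in the middle columns mark where the dark-to-bright edge sits, which is how a filter reports both whether and where its pattern occurs.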

Now let's take a look from the top. In practice, flashlights from multiple different filters will all be shining at the same spots in parallel, simultaneously detecting a wide array of patterns. In this example, we have four filters shining at the wall, each looking for a different pattern. So this particular convolutional layer is an 8x6x4, 3-dimensional grid of these flashlights.
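Stacking several filters can be sketched the same way. In this hypothetical example, four made-up 3x3 filters are applied to the same toy 10x8 image (stride 1, no padding assumed), and their responses are stored per position, which yields an 8x6x4 grid like the one in this example. The image values and filter choices are illustrative only; in a real CNN the filter weights are learned during training.

```python
def conv_layer(image, kernels):
    """Apply several filters to the same image and stack the results.

    Returns a nested list indexed as output[row][col][filter], i.e. a
    height x width x num_filters grid of "flashlights".
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Every filter looks at the same patch (the same spot on the wall).
            responses = []
            for kernel in kernels:
                total = 0
                for m in range(kh):
                    for n in range(kw):
                        total += kernel[m][n] * image[i + m][j + n]
                responses.append(total)
            row.append(responses)
        output.append(row)
    return output


# Hypothetical 10x8 toy image (10 rows, 8 columns).
image = [[(r * 8 + c) % 5 for c in range(8)] for r in range(10)]

# Four hypothetical 3x3 filters, each tuned to a different pattern.
kernels = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],    # vertical edge
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],    # horizontal edge
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],      # small spot (Laplacian-style)
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],       # diagonal line
]

maps = conv_layer(image, kernels)
shape = (len(maps), len(maps[0]), len(maps[0][0]))
print(shape)  # (8, 6, 4)
```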

Now let's connect the dots of our explanation.

Why is it called a convolutional net? The net uses the technical operation of convolution to search for a particular pattern. While the exact definition of convolution is beyond the scope of this video, to keep things simple, just think of it as the process of filtering through the image for a specific pattern. One important note, though: the weights and biases of this layer affect how this operation is performed, so tweaking these numbers changes how effectively the layer filters for its pattern.
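For readers who want a glimpse of the math, the operation a single flashlight performs can be written as below. This is a sketch assuming a single-channel input image x, a k x k filter with weights w and bias b, stride 1, and no padding; these symbols are introduced here for illustration. y_{i,j} is the output of the neuron at grid position (i, j):

```latex
y_{i,j} = b + \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} w_{m,n} \, x_{i+m,\, j+n}
```

Because the same weights w and bias b appear in every y_{i,j}, changing them changes what the whole filter responds to, which is the sense in which tweaking the weights and biases affects the filtering.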

Each flashlight represents a neuron in the CNN. In most layers, neurons simply activate, or fire. In the convolutional layer, by contrast, neurons perform this convolution operation. We're going to draw a box around one set of flashlights to make things look a bit more organized.

Unlike the nets we've seen thus far, where every neuron in a layer is connected to every neuron in the adjacent layers, a CNN has the flashlight structure: each neuron is connected only to the input neurons it "shines" upon.
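The difference in wiring can be sketched by listing which input pixels one convolutional neuron actually sees. The `receptive_field` helper, the 28x28 input size, and the 3x3 filter size below are hypothetical choices for illustration, with a stride of 1 assumed by default.

```python
def receptive_field(i, j, k, stride=1):
    """Return the input pixels wired to the output neuron at (i, j).

    Assumes a k x k filter, no padding, and the given stride.
    """
    top, left = i * stride, j * stride
    return [(top + m, left + n) for m in range(k) for n in range(k)]


height, width = 28, 28  # hypothetical input image size
k = 3                   # hypothetical 3x3 filter

field = receptive_field(2, 5, k)
print(field[:3])        # [(2, 5), (2, 6), (2, 7)]
print(len(field))       # 9 inputs for this convolutional neuron
print(height * width)   # 784 inputs for a fully connected neuron
```

A fully connected neuron over the same image would need a connection to all 784 pixels, whereas the convolutional neuron needs only 9.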

The neurons in a given filter share the same weight and bias parameters. This means that, anywhere on the filter, a given neuron is connected to the same number of input neurons and has the same weights and biases. This is what allows the filter to look for the same pattern in different sections of the image. By arranging these neurons in the same structure as the flashlight grid, we ensure that the entire image is scanned.
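Weight sharing has a large effect on the number of parameters a layer needs. The rough count below is a sketch under stated assumptions: a hypothetical 28x28 single-channel input, one 3x3 filter, stride 1, and no padding. It compares the shared filter against a layer with the same local wiring but no sharing, and against a fully connected layer with the same number of outputs. `param_counts` is a made-up helper, not a library function.

```python
def param_counts(in_h, in_w, k):
    """Hypothetical parameter tally for one k x k feature map (stride 1, no padding)."""
    out_h, out_w = in_h - k + 1, in_w - k + 1
    n_outputs = out_h * out_w          # number of "flashlights" (neurons)
    per_neuron = k * k + 1             # k*k weights plus 1 bias

    shared = per_neuron                # CNN: every neuron reuses the same set
    unshared = n_outputs * per_neuron  # same local wiring, but no sharing
    fully_connected = n_outputs * (in_h * in_w + 1)  # every input to every output
    return shared, unshared, fully_connected


print(param_counts(28, 28, 3))  # (10, 6760, 530660)
```

With sharing, the entire filter is described by 10 numbers (9 weights plus 1 bias), no matter how large the image is.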

The next two layers are RELU and pooling, both of which help to build up the simple patterns discovered by the convolutional layer. Each node in the convolutional layer is connected to a node that fires, as in other nets. The activation used is called RELU, or rectified linear unit. CNNs are trained using backpropagation, so the vanishing gradient problem is once again a potential issue. Because of how RELU is defined (it passes any positive input through unchanged, so its slope there is exactly 1), the gradient is held more or less constant at every layer of the net. So the RELU activation allows the net to be properly trained, without harmful slowdowns in the crucial early layers.
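The gradient argument can be illustrated with a few lines of Python. Backpropagation multiplies the activation slopes of the layers a signal passes through; the sketch below compares RELU's slope with the slope of the sigmoid, an older activation used here only for contrast. It assumes, purely for illustration, that every unit along a 10-layer path receives the input 0.5, and it ignores the weights, which also enter the real gradient.

```python
import math


def relu(x):
    """Rectified linear unit: pass positive inputs through, zero out the rest."""
    return max(0.0, x)


def relu_grad(x):
    """Slope of RELU: 1 for positive inputs, 0 otherwise (0 chosen at x == 0)."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    """Slope of the sigmoid, which never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


print(relu(-2.0), relu(3.0))  # 0.0 3.0

# Backpropagation multiplies local slopes layer by layer.
# Hypothetical setup: every unit on a 10-layer path sees the input 0.5.
depth = 10
relu_chain = 1.0
sigmoid_chain = 1.0
for _ in range(depth):
    relu_chain *= relu_grad(0.5)
    sigmoid_chain *= sigmoid_grad(0.5)

print(relu_chain)              # 1.0
print(f"{sigmoid_chain:.1e}")  # about 5e-07: the signal has almost vanished
```

The RELU product stays at 1.0, while the sigmoid product falls below one millionth, because the sigmoid's slope is never larger than 0.25.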

The pooling layer is used for dimensionality reduction. CNNs tile multiple instances of convolutional layers and RELU layers together in a sequence in order to build more and more complex patterns. The problem with this is that the number of possible patterns becomes exceedingly large. By introducing pooling layers, we ensure that the net focuses on only the most relevant patterns discovered by convolution and RELU; max pooling, a common choice, keeps only the strongest response in each small neighborhood. This helps limit both the memory and processing requirements for running a CNN.
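Max pooling is one common way to do this reduction. The sketch below assumes non-overlapping 2x2 windows (stride 2) and a made-up 4x4 feature map; `max_pool` is a hypothetical helper, and any rows or columns that don't fill a whole window are simply dropped.

```python
def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest value in each size x size block."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            block = [feature_map[i * size + m][j * size + n]
                     for m in range(size) for n in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map produced by convolution + RELU.
fmap = [
    [1, 3, 0, 2],
    [4, 2, 1, 0],
    [0, 1, 5, 6],
    [2, 2, 7, 1],
]
print(max_pool(fmap))  # [[4, 2], [2, 7]]
```

The 16 values shrink to 4, keeping only the strongest response from each 2x2 region, so the next layer has a quarter as many inputs to process.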

Together, these three layers can discover a host of complex patterns, but the net will have no understanding of what these patterns mean. So a fully connected layer is attached to the end of the net to give it the ability to classify data samples.
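Here is a minimal sketch of that final step: the pooled feature map is flattened into one long vector, a fully connected layer computes one score per class from all of the features, and a softmax turns the scores into probabilities. The two classes, weights, and biases are hypothetical placeholders; in a trained CNN they would be learned with backpropagation. Softmax is a common choice for this last step, though the video doesn't name a specific one.

```python
import math


def flatten(feature_map):
    """Turn a 2D feature map into one long feature vector (row by row)."""
    return [value for row in feature_map for value in row]


def fully_connected(features, weights, biases):
    """One score per class: a weighted sum over every feature, plus a bias."""
    return [sum(w * f for w, f in zip(w_row, features)) + b
            for w_row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pooled = [[4, 2], [2, 7]]     # e.g. the output of a pooling layer
features = flatten(pooled)    # [4, 2, 2, 7]

# Hypothetical weights and biases for two classes, A and B;
# a trained CNN would learn these values.
weights = [
    [0.5, 0.1, 0.1, -0.2],   # class A
    [-0.3, 0.0, 0.2, 0.6],   # class B
]
biases = [0.0, 0.1]

scores = fully_connected(features, weights, biases)  # about [1.0, 3.5]
probs = softmax(scores)
print(probs)  # class B gets the larger probability
```

Here class B wins because its weights favor the large value 7 in the last position of the feature vector.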

Let's recap the major components of a CNN. A typical deep CNN has three sets of layers (a convolutional layer, RELU, and pooling layers), all of which are repeated several times. These layers are followed by a few fully connected layers in order to support classification. Since CNNs are such deep nets, they will most likely need to be trained on servers with GPUs.
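The recap can be turned into a quick shape calculation. The sketch below traces how the data shrinks through two repeated conv, RELU, pool blocks before being flattened for the fully connected layers. The 32x32x3 input, the 8 and 16 filters, the 3x3 kernels, the 2x2 pooling, and the absence of padding are all hypothetical choices; many real CNNs pad their convolutions so that the spatial size is preserved.

```python
def conv_out(size, k):
    """Spatial size after a k x k convolution with stride 1 and no padding."""
    return size - k + 1


def pool_out(size, p):
    """Spatial size after non-overlapping p x p pooling."""
    return size // p


def trace_shapes(h, w, c, filters_per_block, k=3, p=2):
    """Track (height, width, channels) through repeated conv -> RELU -> pool blocks."""
    shapes = [("input", (h, w, c))]
    for idx, filters in enumerate(filters_per_block, start=1):
        h, w, c = conv_out(h, k), conv_out(w, k), filters
        shapes.append((f"conv{idx}", (h, w, c)))
        shapes.append((f"relu{idx}", (h, w, c)))  # RELU leaves the shape unchanged
        h, w = pool_out(h, p), pool_out(w, p)
        shapes.append((f"pool{idx}", (h, w, c)))
    # The fully connected layers see one flat vector.
    shapes.append(("flatten", (h * w * c,)))
    return shapes


# Hypothetical 32x32 RGB input and two blocks with 8 and 16 filters.
shapes = trace_shapes(32, 32, 3, filters_per_block=[8, 16])
for name, shape in shapes:
    print(name, shape)
# input (32, 32, 3)
# conv1 (30, 30, 8)
# relu1 (30, 30, 8)
# pool1 (15, 15, 8)
# conv2 (13, 13, 16)
# relu2 (13, 13, 16)
# pool2 (6, 6, 16)
# flatten (576,)
```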

Despite the power of CNNs, these nets have one drawback. Since they are a supervised learning method, they require a large set of labelled data for training, which can be challenging to obtain in a real-world application. In the next video, we'll shift our attention to another important deep learning model: the Recurrent Net.


Convolutional Nets - Ep. 8 (Deep Learning SIMPLIFIED)

Posted by alex on January 14, 2021