  • Now, OCR is optical character recognition, and the key point about OCR is I think about it as a compression.

  • You're basically taking an image in two dimensions with some depth, and you're going through a number of stages of compression to extract only the information out of there that's of interest to the person looking at it.

  • And so people talk about this as being basically finding the text that's there, and that is important.

  • That's what I would call metadata about it.

  • But the first and foremost thing you're doing is compressing information that's out there into information that could be used either by another machine or by a human.

  • I've used this largely in systems types of engineering where, for example, I worked with Jérôme Berclaz, who's from EPFL in Switzerland.

  • He wanted to be able to take American signs and turn them into French so that he didn't have to worry about what they said in English.

  • And that gets complicated in the States because there's brown signs, which are talking about historic places or museums.

  • There's green signs which are giving you information.

  • There's blue signs which are giving you other types of information.

  • And then, of course, there's red and yellow signs which are very, you know, pertinent, and you better stop.

  • You better yield.

  • You better do this and that and it can be daunting if you're coming from another country.

  • And so it was a really good OCR application where you were taking some type of an image, converting it into the text that was associated with it, and then, for him, functionally translating that into another language.

  • And I think that's very important for people to recognize.

  • OCR is not just okay, I've got a document, I'm scanning it and it converts it into the text form.

  • People don't actually do that very much anymore.

  • People have a supercomputer in their pocket called a mobile phone, and they want to be able to use that for OCR.

  • So what we're going to start off with is the very first stage of compression, and the most important one.

  • Because if you don't do the job right in the first stage, the rest of it is toast.

  • Okay?

  • And so I got to work with one of the world's preeminent experts on OCR over the course of about five years when he was at HP Labs: Ray Smith, who's currently with Google.

  • He has taken the HP Labs OCR work, which he did, and open sourced it.

  • It's called Tesseract, and I recommend it to people when they start with OCR because it's free.

  • You're not putting a lot of cost into it other than your own time, and it does work pretty well.
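
As a quick illustration, here is a minimal sketch of calling Tesseract from Python through the pytesseract wrapper; the file name "sign.png" is just a hypothetical example, and it assumes the Tesseract engine and pytesseract are installed.

```python
# Minimal Tesseract example via the pytesseract wrapper (sketch only).
from PIL import Image
import pytesseract

img = Image.open("sign.png")                           # hypothetical input image
text = pytesseract.image_to_string(img, lang="eng")    # other trained languages can be passed here
print(text)
```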

  • And Ray has been sort of guiding that over time to enable you to be able to use OCR for other languages.

  • And he's also been a good proponent of what I call meta-algorithmics, where you're taking multiple OCR engines and using them to create better results.
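
One simple meta-algorithmic pattern is a vote across engines. The sketch below is a toy illustration only: the three word readings are made up, not the output of any particular engine.

```python
from collections import Counter

def majority_vote(readings):
    """Combine same-length readings of one word from several OCR engines
    by taking the most common character at each position."""
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*readings))

# Hypothetical outputs from three different engines for the word "there"
print(majority_vote(["there", "tbere", "thern"]))  # -> "there"
```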

  • So let's start off with thresholding, and I'm gonna do a little drawing here.

  • As a graph, you've got a label here.

  • And so this is going to go from 0 to 255, and basically what this is is looking at the density that we have for an image, and so 255 is going to be pure white.

  • Unless you're on some Apple systems, where it will be the opposite and will be black; but zero is gonna be full black.

  • And so in between, at about 128, you'll have half grey, which will be like this.

  • And so what you'll get is a histogram, which is basically a graph of how many of each of these levels of black and white occur.

  • And most of these graphs are gonna look something like this in an ideal state.

  • And so when I get a graph like this, I need to be able to do what's called binarization, and binarization means that I'm going to turn this from a rich panoply of values from 0 to 255 to just a zero and a one.

  • And so you can probably see from this graph.

  • This will be my new one.

  • This will be my new zero.

  • And so there's a number of methods that actually do binarization, the most famous of which are probably the Otsu and then the Kittler et al. method.

  • I think that's from 1986; Otsu might be from the same timeframe.
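
To make that binarization step concrete, here is a rough sketch of Otsu's method in Python with NumPy: it scans the 0 to 255 histogram for the cut point that best separates the two populations. This is an illustrative implementation, not the exact code any OCR engine uses.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the global threshold that maximizes between-class variance
    of the two populations (dark text vs. light background)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0 = hist[:t].sum() / total            # fraction of "dark" pixels
        w1 = 1.0 - w0                          # fraction of "light" pixels
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * hist[:t]).sum() / hist[:t].sum()
        mu1 = (levels[t:] * hist[t:]).sum() / hist[t:].sum()
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# binary = (gray < otsu_threshold(gray))  # True where pixels are dark enough to be "ink"
```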

  • Okay, so what I've talked about here is a simplification.

  • What I've done is give you a global binarization method.

  • There are also local methods that will handle, for example, when you've got blur, when you've got, you know, joiners, et cetera; we'll look at those briefly and I'll kind of show you the impact that will have on the text you get.

  • This is difficult to kind of show by drawing, but I'll do my best.

  • Suppose you've got, for example, the letter T here, and around it is some noise that you've captured in the image.

  • And so this is an image of a letter T.

  • If I do some type of a global threshold, what I'll get out of this is something that looks like this, and so that's starting to look like a T.

  • But if I do a better job with this, if I do some local filtering, so this is a global threshold and this is a local threshold that will take that into account, I may be able to do some trimming so that the T looks more like this, and that's actually what I want out of this.
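
In code, the global versus local distinction looks roughly like this with OpenCV. This is a sketch: the file name is hypothetical, and the block size and offset are made-up values you would tune.

```python
import cv2

gray = cv2.imread("sign.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# Global: one Otsu threshold chosen from the whole-image histogram
_, global_bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Local (adaptive): each pixel is compared against a Gaussian-weighted mean of
# its own neighbourhood, which copes better with blur and uneven illumination
local_bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 10)
```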

  • The character that I've got now, again, is binarized, where I'm showing the blue here as the black.

  • Here I've actually got information; all of this stuff is the background, which has been thresholded out.

  • And so now this is a single, what we call, connected component character, and the connected component character is the next stage for OCR.

  • And so what I'll do now is start to represent things as if I've drawn an outline around these.

  • And so this is the outline around that connected component, and those are basically shapes or objects that I've collected from the connected component.
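
A sketch of that connected component stage, using OpenCV's labelling on a binarized image (the file name and the small-area cutoff are just placeholder assumptions):

```python
import cv2

# Hypothetical pipeline: binarize, then label connected components
gray = cv2.imread("sign.png", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # ink -> 255

num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(bw)

char_boxes = []
for i in range(1, num_labels):           # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 10:                        # drop tiny leftover specks of noise
        char_boxes.append((x, y, w, h))  # bounding box of one candidate character
```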

  • And so let's take a look at that and we'll talk about the next stage of OCR, which is where we're actually forming the characters.

  • So this is what I have now and remember, there's no metadata associated with this.

  • This is just an image.

  • It happens to be a compressed image.

  • If the original was, for example, in red green blue color, it had 24 bits.

  • This is one bit, and so I've already done the compression by a factor of up to 24 from what I started with.

  • Now I have to figure out what that is.

  • Now, you and I can look at this and, in the context of a Latin language, we'll know that that's a T.

  • We don't know that it's English yet; we don't know that it's Italian.

  • We also don't know that it's a Latin language.

  • So if it was a Cyrillic language or it was an Arabic language, et cetera, we might have to look for other things that give us cues.

  • So the next thing we actually do is collectively look at a bunch of characters, and let's say we've got the word "there" in English.

  • We look at those and we look for characteristics about those that allow us to basically assign the language.

  • And so work that was done in this area was led by Larry Spitz about 15 years ago.

  • Larry Spitz was able to identify what language it was off of just the character set that you got from these connected components.

  • Let's consider that done. Scientists or other people working on this (Ray Smith himself has worked on this), and other people at the various optical character recognition or OCR vendors, which include ABBYY, which include Nuance, which include a wide variety of other folks over the years, will be able to ascertain, based on the character set, usually a fairly large character set, what language it is with a lot of confidence.

  • There's also a default: if you don't know what it is and you don't have a lot, you start off thinking it's English, right?

  • So that's kind of a common language that you start with or you think of the local language.

  • If you're tied into, for example, GPS and we know you're in Firenze, we're going to assume that it's Italian until proven otherwise.

  • So there's a number of ways of doing that.
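
Just to make the idea concrete (and much simplified relative to the shape-code techniques being alluded to here), one crude way to assign a script once you have candidate characters is to see which Unicode script their names fall under. This is purely an illustrative sketch.

```python
import unicodedata
from collections import Counter

def guess_script(chars):
    """Very crude script guess: tally the Unicode script named in each
    character's name (LATIN, CYRILLIC, ARABIC, ...)."""
    counts = Counter()
    for ch in chars:
        name = unicodedata.name(ch, "")
        for script in ("LATIN", "CYRILLIC", "ARABIC", "GREEK"):
            if script in name:
                counts[script] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"

print(guess_script("there"))    # -> LATIN
print(guess_script("привет"))   # -> CYRILLIC
```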

  • I don't have enough time to go into all of those details.

  • But the bottom line is I now have a character set.

  • I now have a pretty good idea of what language it is.

  • Now I have to actually do the downstream matching of those characters to what they are, and then also potentially finding out what font it is, so that I can reproduce this in the correct font for the final, you know, whatever I'm going to be using this for.

  • Now, I'm just doing sign translation.

  • I'm converting this from, let's say, Italian into English and English into Italian.

  • I don't really care what font it is.

  • It's gonna be the default font on the display of the device I'm reading this off of, but we'll just go there.

  • So again, a very wide set of applications come out of this.

  • I apologize for going into so many; I'll try to keep it simple.

  • So from this, the next thing we need to do is actually classification.

  • And what we're doing there is we're classifying by the alphabet.

  • And so if I know that it's English, I'm going to try to do anything from something as inelegant as pattern matching.

  • Pattern matching is brittle because if I don't have the right font, it might not be a good match.

  • So if I've got this little T here, the ideal T that I have might be this.

  • If I know which font it is, I might have a much better T that will match against this one much better.

  • So there are a lot of trade-offs here when I actually do the classification and when I actually do the font identification. In most modern systems, there is a meta structure around that that allows me to speculate on which character set it might be, and then also on which font it might be within those characters. So in a lot of cases, it might not just be classification by alphabet.

  • It might also be, either simultaneously or sequentially or recursively, classification by the fonts, and so the key thing there is that I'm going to be doing font matching, and you want to move this into classification because there you're able to bank on a lot of other good work that's been done outside of the field of OCR.

  • So OCR, like most of the fields we talk about here, is a specialized field.

  • And there are people like Larry Spitz and like Ray Smith who have done a lot of work in that space and know how to directly apply classification techniques to that.

  • The classifiers they use, though, will vary. What Ray used way back when were hidden Markov models, very good models also for natural language processing, speech recognition, those types of fields where you have a limited alphabet and you're trying to do matching for that.

  • More modern technologies now would use SVMs, AdaBoost, boosting technologies for classification, and now increasingly deep learning; deep learning is actually being applied to this field as well.
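
As a rough sketch of what such a classifier stage can look like, here is an SVM over size-normalized character bitmaps using scikit-learn. The data below is random placeholder data standing in for a real labelled training set, and the 28x28 size is an assumption for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: each row is a flattened 28x28 character bitmap,
# each label is the character it represents (a real system would use a
# curated training set, not random noise).
rng = np.random.default_rng(0)
X_train = rng.random((200, 28 * 28))
y_train = rng.choice(list("THERabc"), size=200)

clf = SVC(kernel="rbf", probability=True)  # class probabilities act as figures of merit
clf.fit(X_train, y_train)

# Score one new connected component against every character class
scores = clf.predict_proba(rng.random((1, 28 * 28)))[0]
for label, score in sorted(zip(clf.classes_, scores), key=lambda p: -p[1]):
    print(label, round(score, 3))
```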

  • Nothing magical about deep learning other than the fact that, because of the architectures of modern processing equipment, we're able to add another layer to what we do in artificial neural networks, and that gives us a lot more plasticity.

  • It's kind of like what a physicist does when they're talking about string theory.

  • If we actually live in an 18 or 22 dimensional universe, we can probably fit what we see onto there in a number of ways and prove our point.

  • And so the same kind of thing we've got with deep learning.

  • If we've got this surfeit of possibilities for moving between input and output, that is, from this image to which character it actually is in the alphabet, there's a number of ways of mapping those.

  • We count on deep learning to kind of train that, and it's all about the training set.

  • And I think that's an important part, and what I'm going to jump to next is actually talking about the training set that we've got and how we use that to be able to assess the classification part of OCR.

  • So, a quick recap.

  • We've got a binarized image; from that image, we form connected components.

  • Those connected components will more or less correspond to the characters that we've got within the alphabet.

  • In some cases, you look at Arabic, you look at some of the subcontinent or Indian languages, they're going to have joiners there.

  • We'll leave that complexity aside right now and just say we've got some type of a subset of characters that belong to that alphabet, and now we want to classify them.

  • And so, for classification, we will have the following.

  • We've got that T, which I'm using as an example here.

  • And as I said, we may have a font that defines a T like this, a T like this, a T that looks like this, maybe a more graphical T that looks like this, even a t if we've got to consider small t's at the same time.

  • And what we'll do is some kind of a match against these, either pattern matching or based on a training set, et cetera, and we might get a figure of merit for each of these things that will vary depending on the goodness of fit for this in the model.

  • And as you can see from this, I've put these in order.

  • My best match for this particular T is this font.

  • It doesn't guarantee that that was the font that was intended.

  • It may have even been handwritten, for example, and just handwritten well enough to do a match.

  • But that is going to be now the candidate font that I've got there.

  • If I now extend this out to the whole word which I used, which was "there", I may have the characters from each of these alphabets.

  • So now I've got an H, I've got an E twice, and an R.

  • I'm going to get a similar set of numbers from each of those and try to combine those to figure out which font family it is.

  • So let's do a little example here.

  • Let's say that for the next one I got these data here, and I'm going to just make up some numbers for the point of illustration here.

  • For each of the rest of these, again, these are just examples of what I got for the T, H, E, R, and E, and the goodness of fit I got for each of those models.

  • Now, what I can do is I can use population statistics out of this to find out which font it actually fits.

  • If I sum these up, which I'm gonna do here quickly in my head: 3.85 for this match. Again, I would divide that by five, which would give me a value of about 0.77, which I guess is my mean value for this.

  • We can do this for each of these, and for the other ones it's pretty clear that they're not gonna belong to these three classes.

  • So we compare it for this one real quickly, and I can see very quickly that this is above 0.8.

  • In fact, this comes out to 0.82, and so this is actually my best match across those font families, even though I had the best match for the T here.

  • The best overall match here, from the population of characters that I've got, tells me that this is the font.
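
That per-word font decision can be written down very simply; the numbers below are invented to mirror the spoken example (one font wins on the single T, another wins once you average over all five characters), and the font names are placeholders.

```python
# Goodness-of-fit scores for each candidate font, one value per character T, H, E, R, E.
# These are made-up illustrative numbers, not measurements.
scores = {
    "font_A": [0.95, 0.70, 0.75, 0.72, 0.73],  # best single match on the T, mean 0.77
    "font_B": [0.80, 0.82, 0.84, 0.81, 0.83],  # steadier across the whole word, mean 0.82
    "font_C": [0.60, 0.55, 0.58, 0.50, 0.57],
}

means = {font: sum(vals) / len(vals) for font, vals in scores.items()}
best_font = max(means, key=means.get)
print(best_font, round(means[best_font], 2))   # font_B 0.82
```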

  • And so again, this is a good example of what OCR does.

  • It accumulates statistics.

  • It learns: the more and more characters that you bring in, both to the training set and into the test set, the better accuracy that gives you.

  • And because of this, I'm able to get a better font than I got off of the single character.

  • So it's just a simple example that kind of shows some math behind it.

  • And this is what we do with OCR engines.

  • Now you can magnify the complexity manyfold by saying, oh, in addition, I'm gonna have to get these statistics to see which language it belongs to.

  • I'm gonna have to get this for all the possible fonts out there.

  • I might even have it for handwriting, and I might also have it for a hybrid of two languages.

  • So, for example, think of a novel by Dostoevsky.

  • If you read, let's say, The Possessed; The Possessed must have about 50 pages of French in with whatever the Russian's been translated into.

  • So if you read an English rendition of The Possessed, and it's been translated the way Dostoevsky intended it, you might have 50 pages of French in there.

  • We're not going to do very well with OCR on that if we're just translating into English and so there's a lot of stages of complexity that go with OCR.

  • And I just cite that to give you some appreciation for how complex the classification is that these folks are doing.


Optical Character Recognition (OCR) - Computerphile
