字幕表 動画を再生する 英語字幕をプリント I actually think that if we're going to do parsing it might be easiest not to dive into compilers straight away, as some of you might want me to do, but to actually just take a simple, very simple, subset of the English language and see how we would parse that. And if you look up in a dictionary "what does `parsing' mean" it says, essentially, one of the definitions is: " ... breaking up sentence structure into its component parts under the direction of a grammar." so the So the grammar is the rules of your language. Well, I have drawn up here a language involving furry dice and robots and people, men and women, and cats and dogs and they can 'bite' they can 'stroke' they can ... whatever. And I thought: that little grammar, which I'll walk you through, is capable of generating between 200 and 300 sentences and they are all valid English. But there's no way it captures the spirit of English. The idea that your sentences have subject-verb-object is pretty universal, I think, among what's called the Indo-European languages. However, sometimes it's not always in the order of subject-verb-object, although that's common. Sometimes they get twisted up particularly in German, where there's a definite option to have the verb last. In this example I'm pointing to here, I'm using a notation that we did cover about three or four years ago now. We'll put out a link. I think the episode was called Angle Brackets. And they invented a notation that looks like this: pointy brackets. Yes, back in the very late 50s and early 60s people who wanted to write compilers realized they had to write down a formal specification of what were the rules in their language. And two computer scientists actually won the Turing award - [the] ACM Turing Award - for their notation and their development of a grammar for the language Algol 60. And it uses angle brackets all over the place. So, that was so-called Backus-Naur form. Those are the two computer scntists who developed it. People really liked it. And I've used it here because it's verbose enough to make some sense. I'll read it out to you. What I'm saying, up at the top of the grammar - using the angle brackets for what's going to become parts of a so-called parse tree. A is defined as then a then an . That's the most common form of English sentence the subject is the person, or thing, doing the action. The verb says what the action is: 'kicked' 'bit', 'stroked' ... whatever. The object says what it is doing it to. [e.g.] 'the man kicked the robot' And it's such a powerful language! You can have either 'the' or 'a' [as] definite or indefinite article. You can have 'the robot' or 'a robot'. You can have a choice of verbs either 'bit', 'kicked' or 'stroked' - you're going to be very expressive in this language (!) Your subject of the sentence can either be 'dog', 'cat', 'man', 'woman' or 'robot'. And finally the tailpiece. The object i've decided could either be any followed by a again, like 'a dog', 'the robot bites a dog' [is] perfectly possible. But just to show that you can put in shortcuts, if you like, and put bits of structure in there that aren't parsed - they're just globbets you can put in - I have defined the phrase: 'two furry dice' as being a valid 'object'. Just the string "two furry dice". So the target sentence, that ultimately we're going to try and construct from all of this, is the [in lower case] robot stroked [writes on paper] ... running out of space here, carry on on the line below ... "the robot stroked two furry dice" We all agree it makes sense [but] is it parsable against that set of rules? And how would you go about showing that it was? >> Sean: So by 'parsable' you mean "can we decode it"? >> DFB: [Yes] can we decode it? Can we assign everything to be the right flavour of structural component according to those [grammar] rules. You can either start with a string, which we've got here, and start trying to group things together and work upwards. Now we all know about trees - the root is at the top. We're computer scientists, the root is at the top; the leaves are at the bottom. They're always upside down. Do we start with and leaves and say:"Ooh! the robot stroked ... The robot stroked ... Looking sideways at the grammar I can see there's a shortcut for "the robot". So I'm gonna start coming from the string upwards and see if I end up back at the root of the tree, or shall I start at the root of the tree and say: "I must have a subject then a verb then an object". Can I see anything that could likely be the start of a ? That's called coming top-down. So it's been there in computer science ever since the dawn of computer science: the top-down approach versus the bottom-up approach. There's no prizes for guessing that functional languages like Haskell tend, by their very structure, to favour the top-down approach. It's messier, is the bottom-up approach. I shall try to indicate both, OK, but let's just this once, let's go top down rather than bottom-up. If we're coming top-down it is telling me, is this grammar, that I must have a then a then an . Now, for those of you who are concerned with a mechanistics -- you can actually think of these things, if you're coming top down, as being pushed as targets, that you've got to satisfy, on a stack which .... We all know about stacks! This one is not building upwards and downwards it's building from right to left. So, at the top of my stack currently, because I'm coming top down, I say my target my sub-target is . I must find a . But I must keep half an eye on the target string to say: "does it really match?" If we cross-refer now, to this grammar, for inspiration, we're trying to match a If I look at the grammar over there it tells me that a can either be an article and a noun like "the" and "a" , with a like 'cat', 'dog', 'robot' or you can take a shortcut. Because it tells you here that the just could be the phrase "the robot". So let's go for broke. I'm looking at the input string and I see "the robot". So let's take the shortcut just to be devils, right?! I can say straight away then, taking the shortcut, that can be the string "the robot". That's good! Now, having satisfied , I pop it off the top of my stack. I've done that. Next thing, to match this sentence, you've got to be able to match the . Well, we're going along the string we've coped with "the robot". Next word - next valid leaf word - that has to match up with being a . I'm looking at 'stroked'. Is 'stroked' a possibility? Look at . What are the possibilities for ? And I see 'bit' 'kicked' or 'stroked'. Great! I choose 'stroked'. But, once again, if you can imagine the stack I then pop that off the top of the stack -- we've done it! The stack is shrinking! all we've got left to identify is an that matches [in] that sentence. Again, I've pulled a fast one here! In the rules for I said you can either break it down into an and a , just like the , so nice having the action done to it. Or you can take a whizzy short-cut. An allowed shortcut, for an is what you might call an and I put it in deliberately, just to make a point: "two furry dice" Wonderful! Simplest parse going: an can be "two furry dice". We declare success and we say we have matched "the robot strokes two furry dice" by going . came down into being shortcut "the robot". developed straight away into being "stroked". shortcut to It's a very shallow tree, really. But it's developed the target sentence: "the robot stroked two furry dice". Now, the only thing you might say: "Well, yes, wonderful, we've parsed it. But would it have worked if the instead of taking the shortcut. Could you have done it with the option?" Yes, you could. I could have basically, come here: let's write out: and, yes, right at the beginning instead of taking the shortcut phrase "the robot", I could have said: "Ah! but I'm going to do it as can be an followed by a ". So, the head of the stack then would be - when we're looking at the sentence and starting off from the beginning again - it's got to be an to start with. is 'a' or 'the'. Oh! I'm looking at the string, I see "the". Bingo! So, article can be "the" - no problem. Pop that off the stack. What's the next thing I've got to look for? . Can be "robot", which is the next thing I'm looking at? Yes, "robot" is an option, so I could have done "robot". Now the is as before. It's got to be "stroked". And "stroked" is there. And equally the is as before. There's only one way to develop the string: "two furry dice" and that's to take the declared short-cutk that does it for you. The reason for doing this rather artificial example is to say: "Oh dear! Does this matter? We have got a sentence that makes perfect sense to us: "the robot stroked two furry dice". But you've drawn two slightly different trees. First time you took a shortcut subject-phrase, just for "the robot". Next time you said: "Ah! but we could also do it by breaking it down further into a 'the' and 'robot'. Does that matter? In this particular case, in informal spoken English, no it doesn't.f If people understand you the fact that your grammar is technically called 'ambiguous' - so if you say to yourself what does 'ambiguous' mean? It means there's more than one parse tree that seems to satisfy and give the final target phrase. Does that matter? In this case 'no'. Does it matter in general? Yes, because you might get very different effects under certain circumstances. What I think I'll show you is a favourite one almost. Which is to check your 4-function desk calculator. When you start typing things into a 4-function (so-called) hand-calculator. You're basically putting in, shall we say, integer numbers - let's simplify it. But the burning question is something like this: that if you type in 8 * 4 * 2 You may not think of it in those terms but computer scientists would say: "Yes ,if you've got the right grammar rules that's an example of a "sentence". That is a - what might you call it? - a sort of subsection of a long calculation, that you do on the right-hand side [say] of a C statement. It's just 8 times 4 times 2. There's no problem is there?. Eight 4s are 32. Two 32s are 64. But then you say: "Ah! but we're into tree structures now. How would you parse that sentence, given that multiply can only multiply two things at a time. So, are you going to arrange it so that your overall expression groups as 8 * 4, done on the left. Then you do the multiply by 2, which is that tree structure. So, the question is you've got a little bit of depth on your tree structure here. It's gone down a level whereas finally you've got one [operand] up at the top level. Do you want that sub-structure to be on the left or on the right? Because it would be equally OK, as we've got things at the moment, for you to draw something that's got 8 here, multiplied by here 4 * 2 with a multiplier between. So have we got a little bits of tree structure on the left or on the right? And does it matter? And you say no it doesn't matter. We all know that you can't complete the expression until it's left operand is known and that involves forcing you to do the multiply. 8 * 4 is 32. Fine, 32. What's the other operand here? It's multiply by 2: 32 * 2 = 64. Equally, over here, 8 x (four twos are 8; eight 8s are 64) So, you're doing it two different ways but you're getting the same answer. And the reason that's OK is that multiply is well-behaved [over the integers] Mathematicians will tell you it commutes a * b is the same as b * a for integers [and * is also associative over the integers] However, just look at the ambiguity, OK, Two different parse trees and it didn't matter for multiply. [But] just suppose those multipliers weren't multiplies but were divides [ / ] . Does it then matter that you do 8 / 4 first in this one? You get your 2 and the multiply is now replaced by a divide: 2 / 2 is 1. But look at this other possibility and just mentally substitute divides for multiplies. 8t divided by the result of 4 / 2 ? Well 4 /2 is 2. 8 / 2 is 4. Two completely different answers! Is it 1 or is it 4. If you're panicking take out your calculator now and check the app. Just do 8 / 4 / 2 and if the answer isn't 1 demand your money back! Here we see it then. Here is an example of where ambiguity [two differing parse trees] does lead to problems, because depending which tree you choose you get a different answer. So, there we are then. We now know what a grammar is. We know that against a grammar you can try and parse things like sentences but "sentences" [is] used in its widest sense. It could be a program statement from C and just think of this: 8 * 4 * 2 has been as a part of a phrase on the right-hand side of an arithmetic statement. And grammar is the guiding principle for the whole thing.
B1 中級 構文解析の説明 - Computerphile (Parsing Explained - Computerphile) 3 0 林宜悉 に公開 2021 年 01 月 14 日 シェア シェア 保存 報告 動画の中の単語