Placeholder Image

字幕表 動画を再生する

  • I actually think that if we're going to do parsing it might be easiest not to

  • dive into compilers straight away, as some of you might want me to do, but to

  • actually just take a simple, very simple, subset of the English language and see

  • how we would parse that. And if you look up in a dictionary "what does `parsing' mean"

  • it says, essentially, one of the definitions is: " ... breaking up sentence

  • structure into its component parts under the direction of a grammar." so the

  • So the grammar is the rules of your language. Well, I have drawn up here a language

  • involving furry dice and robots and people, men and women, and cats and dogs

  • and they can 'bite' they can 'stroke' they can ... whatever. And I thought: that little

  • grammar, which I'll walk you through, is capable of generating between 200 and

  • 300 sentences and they are all valid English. But there's no way it

  • captures the spirit of English. The idea that your sentences have

  • subject-verb-object is pretty universal, I think, among what's called the

  • Indo-European languages. However, sometimes it's not always in the order

  • of subject-verb-object, although that's common. Sometimes they get twisted up

  • particularly in German, where there's a definite option to have the verb last.

  • In this example I'm pointing to here, I'm using a notation that we did cover about

  • three or four years ago now. We'll put out a link.

  • I think the episode was called Angle Brackets. And they invented a notation

  • that looks like this: pointy brackets. Yes, back in the very late 50s and early 60s

  • people who wanted to write compilers realized they had to write

  • down a formal specification of what were the rules in their language. And two

  • computer scientists actually won the Turing award - [the] ACM Turing Award - for their

  • notation and their development of a grammar

  • for the language Algol 60. And it uses angle brackets all over the place.

  • So, that was so-called Backus-Naur form. Those are the two computer scntists

  • who developed it. People really liked it. And I've used it here because it's

  • verbose enough to make some sense. I'll read it out to you. What I'm saying, up at

  • the top of the grammar - using the angle brackets for what's going to become

  • parts of a so-called parse tree. A is defined as then a

  • then an . That's the most common form of English sentence the

  • subject is the person, or thing, doing the action. The verb says what the action is:

  • 'kicked' 'bit', 'stroked' ... whatever. The object says what it is doing it to. [e.g.] 'the man

  • kicked the robot' And it's such a powerful language! You can have either

  • 'the' or 'a' [as] definite or indefinite article. You can have 'the robot' or

  • 'a robot'. You can have a choice of verbs either 'bit', 'kicked' or 'stroked' - you're

  • going to be very expressive in this language (!) Your subject of the sentence

  • can either be 'dog', 'cat', 'man', 'woman' or 'robot'. And finally the tailpiece. The object

  • i've decided could either be any followed by a again, like 'a dog',

  • 'the robot bites a dog' [is] perfectly possible. But just to show that you can put in

  • shortcuts, if you like, and put bits of structure in there that aren't parsed -

  • they're just globbets you can put in - I have defined the phrase: 'two furry dice'

  • as being a valid 'object'. Just the string "two furry dice". So the target

  • sentence, that ultimately we're going to try and construct from all of this, is

  • the [in lower case] robot stroked [writes on paper] ... running out of space here, carry on on the line below ...

  • "the robot stroked two furry dice" We all agree it makes sense [but] is it parsable

  • against that set of rules? And how would you go about showing that it was?

  • >> Sean: So by 'parsable' you mean "can we decode it"? >> DFB: [Yes] can we decode it? Can we assign everything to

  • be the right flavour of structural component according to those [grammar] rules.

  • You can either start with a string, which we've got here, and start trying to group

  • things together and work upwards. Now we all know about trees - the root is at the

  • top. We're computer scientists, the root is at the top; the leaves are at the bottom.

  • They're always upside down. Do we start with and leaves and say:"Ooh! the

  • robot stroked ... The robot stroked ... Looking sideways at the grammar I can

  • see there's a shortcut for "the robot". So I'm gonna start coming from the string

  • upwards and see if I end up back at the root of the tree, or shall I start at the

  • root of the tree and say: "I must have a subject then a verb then an object". Can I

  • see anything that could likely be the start of a ? That's called coming

  • top-down. So it's been there in computer science ever since the dawn of computer

  • science: the top-down approach versus the bottom-up approach.

  • There's no prizes for guessing that functional languages like Haskell tend,

  • by their very structure, to favour the top-down approach. It's messier, is the

  • bottom-up approach. I shall try to indicate both, OK, but let's

  • just this once, let's go top down rather than bottom-up.

  • If we're coming top-down it is telling me, is this grammar, that I must have a

  • then a then an . Now, for those

  • of you who are concerned with a mechanistics -- you can actually think

  • of these things, if you're coming top down, as being pushed as targets, that

  • you've got to satisfy, on a stack which .... We all know about stacks! This one is not

  • building upwards and downwards it's building from right to left. So, at the

  • top of my stack currently, because I'm coming top down, I say my target my

  • sub-target is . I must find a . But I must keep half an eye on the target

  • string to say: "does it really match?" If we cross-refer now, to this grammar,

  • for inspiration, we're trying to match a If I look at the grammar over

  • there it tells me that a can either be an article and a noun like

  • "the" and "a" , with a like 'cat', 'dog', 'robot' or you can take a shortcut. Because it

  • tells you here that the just could be the phrase "the robot". So let's

  • go for broke. I'm looking at the input string and I see

  • "the robot". So let's take the shortcut just to be devils, right?! I can say

  • straight away then, taking the shortcut, that can be the string

  • "the robot". That's good! Now, having satisfied , I pop it

  • off the top of my stack. I've done that. Next thing, to match this sentence,

  • you've got to be able to match the . Well, we're going along the string we've coped

  • with "the robot". Next word - next valid leaf word - that has to match up with being a

  • . I'm looking at 'stroked'. Is 'stroked' a possibility? Look at . What are the

  • possibilities for ? And I see 'bit' 'kicked' or 'stroked'. Great! I choose 'stroked'.

  • But, once again, if you can imagine the stack I then pop that off the top of the stack

  • -- we've done it! The stack is shrinking! all we've got left to identify

  • is an that matches [in] that sentence. Again, I've pulled a fast one here! In the

  • rules for I said you can either break it down into an and a ,

  • just like the , so nice having the action done to it. Or you can take a

  • whizzy short-cut. An allowed shortcut, for an is what you might call an

  • and I put it in deliberately, just to make a point: "two furry dice" Wonderful!

  • Simplest parse going: an can be "two furry dice". We declare success and we say

  • we have matched "the robot strokes two furry dice" by going .

  • came down into being shortcut "the robot". developed straight away

  • into being "stroked". shortcut to It's a very shallow tree, really.

  • But it's developed the target sentence: "the robot stroked two furry dice".

  • Now, the only thing you might say: "Well, yes, wonderful, we've parsed it. But would

  • it have worked if the instead of taking the shortcut. Could you have

  • done it with the option?" Yes, you could.

  • I could have basically, come here: let's write out:

  • and, yes, right at the beginning instead of taking the shortcut phrase

  • "the robot", I could have said: "Ah! but I'm going to do it as can be an

  • followed by a ". So, the head of the stack then would be - when we're

  • looking at the sentence and starting off from the beginning again - it's got to be

  • an to start with. is 'a' or 'the'. Oh! I'm looking at the string, I see "the".

  • Bingo! So, article can be "the" - no problem.

  • Pop that off the stack. What's the next thing I've got to look for? . Can be "robot",

  • which is the next thing I'm looking at? Yes, "robot" is an option, so I could have done

  • "robot". Now the is as before. It's got to be "stroked". And "stroked" is

  • there. And equally the is as before. There's only one way to develop

  • the string: "two furry dice" and that's to take the declared short-cutk that does it for

  • you. The reason for doing this rather artificial example is to say: "Oh dear!

  • Does this matter? We have got a sentence that makes perfect sense to us: "the robot

  • stroked two furry dice". But you've drawn two slightly different trees. First time

  • you took a shortcut subject-phrase, just for "the robot". Next time you said: "Ah! but

  • we could also do it by breaking it down further into a 'the' and 'robot'. Does that

  • matter? In this particular case, in informal spoken English, no it doesn't.f

  • If people understand you the fact that your grammar is technically called 'ambiguous'

  • - so if you say to yourself what does 'ambiguous' mean? It means there's more

  • than one parse tree that seems to satisfy and give the final target phrase.

  • Does that matter? In this case 'no'. Does it matter in general? Yes, because you might

  • get very different effects under certain circumstances. What I think I'll show you

  • is a favourite one almost. Which is to check your 4-function desk calculator.

  • When you start typing things into a 4-function (so-called) hand-calculator.

  • You're basically putting in, shall we say, integer numbers - let's simplify it.

  • But the burning question is something like this: that if you type in 8 * 4 * 2

  • You may not think of it in those terms but computer scientists

  • would say: "Yes ,if you've got the right grammar rules that's an example of a

  • "sentence". That is a - what might you call it? - a sort

  • of subsection of a long calculation, that you do on the right-hand side [say] of a C statement.

  • It's just 8 times 4 times 2. There's no problem is there?. Eight 4s are 32.

  • Two 32s are 64. But then you say: "Ah! but we're into tree structures now. How would

  • you parse that sentence, given that multiply can only multiply two things

  • at a time. So, are you going to arrange it so that your overall expression groups

  • as 8 * 4, done on the left. Then you do the multiply by 2, which is

  • that tree structure. So, the question is you've got a little bit of depth on your

  • tree structure here. It's gone down a level whereas finally you've got one [operand]

  • up at the top level. Do you want that sub-structure to be on the left or on the right?

  • Because it would be equally OK, as we've got things at the moment, for

  • you to draw something that's got 8 here, multiplied by here 4 * 2 with a

  • multiplier between. So have we got a little bits of tree structure on the

  • left or on the right? And does it matter? And you say no it doesn't matter. We all

  • know that you can't complete the expression until it's left operand

  • is known and that involves forcing you to do the multiply. 8 * 4 is 32.

  • Fine, 32. What's the other operand here? It's multiply by 2: 32 * 2 = 64.

  • Equally, over here, 8 x (four twos are 8; eight 8s are 64) So, you're doing it two different ways

  • but you're getting the same answer. And the reason that's OK

  • is that multiply is well-behaved [over the integers] Mathematicians will tell you it commutes

  • a * b is the same as b * a for integers [and * is also associative over the integers]

  • However, just look at the ambiguity, OK, Two different parse trees and it

  • didn't matter for multiply. [But] just suppose those multipliers weren't multiplies but

  • were divides [ / ] . Does it then matter that you do 8 / 4 first in

  • this one? You get your 2 and the multiply is now replaced by a divide:

  • 2 / 2 is 1. But look at this other possibility and

  • just mentally substitute divides for multiplies. 8t divided by the result

  • of 4 / 2 ? Well 4 /2 is 2. 8 / 2 is 4.

  • Two completely different answers! Is it 1 or is it 4. If you're panicking take

  • out your calculator now and check the app. Just do 8 / 4 / 2 and

  • if the answer isn't 1 demand your money back!

  • Here we see it then. Here is an example of where ambiguity [two differing parse trees] does lead to problems,

  • because depending which tree you choose you get a different answer. So, there we

  • are then. We now know what a grammar is. We know that against a grammar you can

  • try and parse things like sentences but "sentences" [is] used in its widest sense.

  • It could be a program statement from C and just think of this: 8 * 4 * 2

  • has been as a part of a phrase on the right-hand side of an arithmetic statement.

  • And grammar is the guiding principle for the whole thing.

I actually think that if we're going to do parsing it might be easiest not to

字幕と単語

ワンタップで英和辞典検索 単語をクリックすると、意味が表示されます

B1 中級

構文解析の説明 - Computerphile (Parsing Explained - Computerphile)

  • 1 0
    林宜悉 に公開 2021 年 01 月 14 日
動画の中の単語