Placeholder Image

字幕表 動画を再生する

  • In what we've done so far we've ranged over quite an area, because just the act

  • of using the T-diagrams has forced us to address lots of interesting and

  • profound questions, over the years, about how compilers actually work. So we've

  • looked at the fact that we think of our programs being written in a high-level

  • language and the brain goes blurry. We neglect to think, all the time, that

  • although you they were written in C it doesn't execute directly. You have to

  • compile the C into binary. So, really, your beautiful program runs on a suitable

  • architecture in a suitable binary. It has an input, it has an output. Then we

  • started looking at more advanced things about ... saying: "Well if that's how a C

  • compiler works why not write another C compiler, in C, and then compile it with

  • the old compiler?" And I think we just about emerged from that with brain intact.

  • If you want to go back and revise some of this stuff before catching up

  • with what we're going to do today I would recommend the one [video] which we'll put a

  • link out to: 'Self-compiling Compilers". There's another one we put out

  • also, on the UNCOL problem which was this business about: "Is there a common

  • universal intermediate code?" There's more of a consensus now than there used to be.

  • And the LLVM system is a good example. It won the Turing ACM Award for - not quite

  • for 'best intermediate code ever' - but you know ... The reason, I think, that it became

  • less of a problem - not totally solvable but less of a problem - thinking

  • about this the other night, is that actually, over the years, certain things

  • about computer architectures have converged and moved together to

  • consensus. And principally it seems to me is the idea that your unit of discourse

  • isn't just the bit, it's the byte. And the idea of having, y'know, 8-bit ASCII

  • characters fit into a byte. The idea of being able to 'glue two bytes together' to

  • make a 16-bit entity, or four to make a 32-bit entity.

  • That has become more and more and more prevalent. The reason why in some of the

  • other videos I've said: "Oh! you know there was such massive differences between

  • machines" There were! In the 1970s. I can quote [this] to you as a fact. Because there I was

  • at the University of Nottingham working on a machine with 24-bit words. Not byte

  • addressed at all. What did you put inside your 24-bit words if you were mad keen on

  • characters? Four 6-bit characters. How did you dig them out from that word?

  • With great difficulty(!) The word-addressed machine would give you the whole word.

  • It was your responsibility with bit-shift operations and a garden spade (more or

  • less!) to dig out four 6-bit characters. Non-standard, not 8-bit characters.

  • And everybody said: "Ah well, it's all right for IBM but it's so much more *expensive* building

  • byte-addressed machines!" But in the end it prevailed. And I do believe that things

  • like that about the fundamental way you address memory and being able to, sort of,

  • put units together, preferably in powers of two not multiples of two.

  • So, right at the beginning I had a 24-bit word; Atlas had a 48-bit word;

  • DEC 10s, I think, had a 36-bit word. And Seymour Cray, CDC, had a 60 bit word. All of those

  • are multiples of 2, but I don't think any of them, that I've just quoted, are

  • powers of 2. And it really mattered. it turned out to have a power-of-two basic

  • unit - bigger than a bit - 8-bit bytes. So, anyway, I use that to introduce the

  • idea of intermediate codes but we're now I think - in tying things up - in a

  • situation to revisit that now and say: "Intermediate codes really are useful".

  • and come into their own, if you like, with the idea of wanting to port a compiler from

  • one architecture, perhaps to a very very different architecture. What I call the

  • 'semantic gap' between your high-level language, and your program eventually

  • that runs on binary, is HUGE! It's not so huge in C, you can

  • feel the assembler and the binary poking up through the C, But you start

  • trying to do a Haskell interprete,r or compiler, and you'll soon discover that

  • this thing running down here is - so to speak - miles away from the abstract stuff

  • you were writing up at the top. So, everybody started saying don't we really

  • need intermediate codes to help us bridge the gap?

  • Hence Z-code, Bytecode for Java. All these kind of things became discussed

  • more and more and more. And I think I, at one stage, said: "Don't imagine it's always

  • fairly close to the hardware". You could end up in a situation where C is your

  • intermediate code. Bjarne Stroustrup got a C++ compiler started by - in

  • its early days - just making it produce C, which you know you can cope with.

  • What I want to have a look at, today, is this whole business of: How do intermediate

  • codes help you port a compiler from one architecture to another?" And you've got

  • to remember that, in the worst case, those machines and architectures could be very

  • very different indeed. I did an example, I think, of running a C compiler on a PDP-11

  • in PDP-11 binary, whose cross-compilation effect was to spit out

  • Z80 binary, which is very very different. How can you cope, then, with

  • cross-compilation? And how does cross-compilation lead you on to being

  • able to think about porting your compiler? Not just producing code for a

  • foreign machine but, in a way to mount an invasion of the foreign machine

  • and to say: "I'm not just going to push boatloads of code over, I'm going to set up

  • a bridgehead. We're going to land and I'm going to set up a lot of my software tools

  • on the far machine - not just fling raw binary at it.

  • So let's start to discover a bit more about this then. But in order to get into

  • the details I have, at great personal mental expense, made myself something

  • like 40 or 50 T-diagram blanks and let us hope that those prove sufficient

  • for the task. I just wanted to talk you through this, first of all, as being the

  • basis for what we're going to discuss and then I'll put it to one side. But

  • I'll bring it back if I need to refer to it again. We are getting in to a land of

  • intermediate codes. I've glued four things to the page. What are they?

  • Well, these two, at the top left, are what I will call Source Texts for your cross compilation /

  • compiler porting efforts. What you're saying is: from now on we

  • don't compile directly from H, the high-level language, to produce binary, B,

  • in the code generator. We do it in two steps. We have an H compiler written in H

  • producing I, the intermediate code, which I see is not on this list. Let's add it:

  • "I = Intermediate Code". Now, on the left at the top are the source texts for

  • doing this. But, if you consult the previous things we have done on

  • compilation you will understand that it's not directly those that we can use

  • because we can't directly execute H. We have to put these through an H compiler.

  • And we end up with the binary executable versions of them. Now, a little bit of

  • extra notation here. If you see at the bottom, for this executable B', I

  • use a single dash (prime) to mean: "The computer I currently possess, my old computer, the

  • one I do all my work on". If you see things like B'' that means the

  • new machine that I'm trying to port things to. So, we'll eventually get to

  • that stage of getting stuff across and you'll see B'' is appearing.

  • Inevitably, if you go this route, your compilation down to binary is now a

  • two-stage process. It may be hidden from you but it has

  • to be there. The other half, you see, is [that] if you're only going half way, to intermedia

  • code, the other half of the journey is to go from intermediate code down to

  • runnable binary for the whole thing. There's your intermediate code

  • interpreter, or compiler, written in B' running B', producing B'. So, I've

  • numbered these 1, 2, 3 and 4 and eventually I'll come back and say:

  • "We've now created a new number 3", or something like that. What I'm going to do

  • in this. Whereas in previous episodes I've talked about recoding a C compiler

  • to make better binary come out I did that previously by calling it 'subscript

  • B for better', but I've decided now to use an asterisk. If you see an asterisk

  • somewhere it means it's a 'better' version of what went before. And I is

  • 'intermediate code'. So, let's put that to one side to be dragged back as and when

  • we need it. When you write a program you have in mind a certain input, it executes

  • and it produces a certain form of output. And you're very happy. And it all works

  • beautifully. Rather than writing 'C', down here, as I have been doing all along, I'll

  • try and generalize it a bit, say to some high-level language (H) that you're

  • confident with and have used for ages. But all the while you know. So this is, if

  • you like, 'Userprog' here you are. You know that H can't execute directly so you

  • rely on the fact that, bootstrapped up over several generations, we just happen

  • to have an H compiler that runs in B' and produces B'. By slotting that into

  • there [uses T-diag template] you remember - there's an implicit arrow there. That's H feeds into that one there

  • and the transformation done in this compiler is to take the H and convert

  • into B', with a compiler that is now an executable binary itself. So, the net

  • result of all that, which we'll show up here - arrows - is what that makes, once

  • you've compiled is of course something that has your treasured input and output

  • but is running, H running on B' produces B'. So, that is the binary executable

  • version of your program. What happens if your input and output was H itself?

  • Can you write a compiler in itself? Of course you can! It's what all this is about. You want

  • to produce an H compiler. We'll start off by writing an H compiler in H. So we'll

  • put this back to one side now these two [T-shapes]. What happens if I were to say: "Well, I've

  • written an H compiler in H and it produces B'*. What I'm saying is:

  • "I am fed up with plain old B' because it's slow and it's inefficient.

  • And it was wonderful when I first did it and actually this thing up here has been

  • bootstrapped up through being written in assembler and heaven knows what".

  • See previous episodes if that sentence doesn't make any sense to you. But now we

  • have got a situation [where] you want to improve the quality of your binary so here's a

  • bit of revision. What you do is you say: "OK I'll write a better C compiler - better

  • in the sense of 'better quality binary' ". I feed it to the old compiler that we've

  • got working already, which takes in H runs on B', produces B'.

  • Anbd what does that end you up with? Answer: an H producing binary. It's running on

  • old binary that's not super quality but it's ok it doesn't crash. It's a bit slow.

  • Just to end this little exercise off, before we get on to genuine porting compilers.,

  • You've got that [points at T-diagram] and you naturally then say: "Well why not feed it

  • to itself again?" And if you get another version of that, feed one into the other,

  • you can end up with H written in better binary, producing better binary. And if

  • you look up [the video] "Self-compiling Compilers" that's exactly what we do. So it's all

  • very well doing this simple-minded stuff. We take one great flying leap in our [previous]

  • compilers and require the code generator to be able to get very low-level very

  • specific and yet be very very tight and wonderful for all sorts

  • of different binaries for different machines B', B'' etc.

  • That's hard. Sso we've decided that intermediate codes might be the answer.

  • So how do we cope then, using intermediate codes just with this

  • business of improving your code? How would you do it? Well it has to be a

  • two-stage process, no question about that. It has to be. So, this time I will do an H

  • compiler, written in H. And this time I am going to make it produce better

  • intermediate code than ever before. So you see ... think of it this way. When you

  • upgrade a compiler you've now got two halves to upgrade. Do you want to upgrade

  • the H-to-intermediate-code piece, the front end, so-called. Or do you want to

  • upgrade the intermediate code interpreter, going down to binary, the

  • back-end? It's a mix and match. You can do either, or both, or in whatever

  • order you like. But you've got to remember it is a two-stage process.

  • Fortunately for me I have, already working, an old compiler for H, running on

  • B' - original machine binary - and producing

  • ordinary intermediate code. Not "super* intermediate code. I've written a better

  • version of myself now and I'm upgrading this front-end. So we've got H written in

  • H producing I* - better-quality intermediate code in some sense - than went before.

  • But the old version of the compiler I've been using for months now

  • just has H running on B' producing ordinary intermediate code - not optimized.

  • But it's good enough for compiling this one. The only thing that's different from

  • what we've got before is that we don't directly end up producing binary as the output.

  • We produce intermediate code as the output. So this first stage, then, will

  • get me the following: H feeds into H running on B' produces I, means that

  • this thing takes in H produces I* and

  • is running on I. OK fine. So we've kind of 'recompiled the compiler'

  • but we haven't gone far enough yet. And this is the the ball and chain

  • around your ankle when you go for intermediate codes is you've got a

  • better thing but it is reliant on running - being able to cope with - the fact

  • that's an intermediate code stage. OK that doesn't put us off. What I've said,

  • all along, is that these are my key source texts these are my executables

  • and what I'm pointing out now, is, if you use intermediate codes you don't just

  • need a source-end translator you need a back-end translator. You need a thing

  • that says, you know: "My system produces intermediate code but I am the back

  • end that takes in intermediate code and produces real binary that really will run."

  • So, I want one of those now [points at T-shape (3)] to put into my diagram. This is what we're at so far.

  • Let me gobble up yet another T-diagram

  • and to say: "If I put in here what everybody ought to have available to them,

  • which is an I producing B' written in B', I can now slot that in

  • there like that. And just look what happens. You've got your first-stage

  • compilation. It's doing wonderful things but it's executing intermediate code.

  • And maybe I have a test interpreter that sort of, to show you: "Well that's kind of working."

  • But, in the end, you want faster, more efficient, code. So you decide you

  • will compile your intermediate code into proper binary. And this kind of choice

  • between: "Do I interpret it slowly and see if it's working versus do I in the end

  • commit to compiling it?" is the sort of thing available in stuff like Java bytecode and so on.

  • Start off by interpreting - feels a bit slow but it's looking good,

  • let's compile it. Now watch what happens as you trace through here. I goes in here,

  • this is the intermediate code compiler, if you like, now producing genuine binary.

  • That shoots through there and what you end up with

  • is H producing I* i.e. r better-quality intermediate code, running on B'.

  • Right, well we're getting close. There's one final thing, though, that you need to do.

  • What we've got ourselves now is a situation where we've got a compiler

  • that takes in statements in H. It is running on B' binary, ordinary

  • unimproved binary. but it does produce better intermediate code. We're now going

  • to pull the same trick that we've done before in Self-compiling Compilers.

  • The only difference here. though. is we've got this I to cope with this time. But the

  • principles of what we're doing are just the same. If I bring this one [T shape] down to

  • stop my having to write out another one, we've now made a binary out of that, feed

  • it to itself. Originally I started with H I* to H. I compiled it all the way

  • through to intermediate code, as best I could. It's good quality intermediate

  • code. Now take that thing, your executable, feed the original thing to itself and

  • look what happens! The H goes in there and produces I*. So. you end up with H [producing]

  • I* [written in] I*. Final stage coming up, I promise. So that, if you like, when you're

  • through its machinations produces that. One final stage - and I promise the

  • torture will end - right! Now you might say: "There - no big deal. I can write an

  • intermediate code interpreter and check out rough-and-ready. It'll be slow but [we can]

  • check out whether everything works". But then somebody in the end says: "No - come on

  • let's compile it - it might go a lot faster." So if you remember item number 4

  • in our original toolkit was a thing that takes I and turns it into binary. So I'm

  • going to put one of those in place now for that all the way through. What will that

  • produce you? Don't forget we're starting H producing good quality intermediate

  • code so that is what it produces but what about here? If you trace through,

  • better quality intermediate code should hopefully give you better quality binary

  • when you compile it. So I can put down here that this is basically - if there

  • aren't too many suffixes and superscripts there - it's producing you a

  • H to I compiler but running on better quality binary. So one way or another

  • this is the new 3 because, referring back to our original map, what i've said is

  • how can i improve that and i have managed to improve it. I've got better

  • quality intermediate code and better quality binary. It is improved but it has

  • been a fairly messy two-stage process that's hidden from you. But the idea of

  • that is to give you just the notion of the fact that there is a penalty to pay,

  • you have to have a two-stage process. So we've expended a lot of brain cells I

  • think between us now discovering how we can improve our compiler, which we did do

  • before in Self compiling Compilers episode, but this time it's different.

  • How can you improve your compiler when there's an intermediate code involved?

  • And we've done it! And we've seen exactly how we can do it. We feel weary - or at

  • least I do - but that is it. And you might say: "Why bother with intermediate codes? It's just

  • producing more stages that we have to go through?" Well, the answer us I think

  • that it does make life more easy for you if you say: "OK, instead of improving

  • ourselves now within an intermediate code context how about we say 'I don't

  • want better binary out of it - I want different binary for another machine' !

  • So, we will find that the diagrams that I've just drawn with some subtle adaptation

  • - and me getting all tangled up probably - can be adapted for producing binary for

  • a completely new architecture which we will be calling B''.

  • We're now going to find in this next one, we're doing is we're just changing the

  • rules slightly. Instead of B* - improved binary - we're moving from B' to B''.

In what we've done so far we've ranged over quite an area, because just the act

字幕と単語

ワンタップで英和辞典検索 単語をクリックすると、意味が表示されます

B1 中級

中級者向けコーデの上達法 - コンピュータマニア (Improving Intermediate Codes - Computerphile)

  • 0 0
    林宜悉 に公開 2021 年 01 月 14 日
動画の中の単語