PROFESSOR: Well, last time Gerry really let the cat out
of the bag.
He introduced the idea of assignment.
Assignment and state.
And as we started to see, the implications of introducing
assignment and state into the language are absolutely
frightening.
First of all, the substitution model of
evaluation breaks down.
And we have to use this much more complicated environment
model and this very mechanistic thing with
diagrams, even to say what statements in the programming
language mean.
And that's not a mere technical point.
See, it's not that we had this particular substitution model
and, well, it doesn't quite work, so we have to do
something else.
It's that nothing like the substitution model can work.
Because suddenly, a variable is not just something that
stands for a value.
A variable now has to somehow specify a place
that holds a value.
And the value that's in that place can change.
Or for instance, an expression like f of x might have a side
effect in it.
So if we say f of x and it has some value, and then later we
say f of x again, we might get a different value
depending on the order.
So suddenly, we have to think not only about values but
about time.
And then things like pairs are no longer just their CARs and
their CDRs.
A pair now is not quite its CAR and its CDR. It's rather
its identity.
So a pair has identity.
It's an object.
And two pairs that have the same CAR and CDR might be the
same or different, because suddenly we have to worry
about sharing.
So all of these things enter as soon as we introduce
assignment.
See, this is a really far cry from where we started with
substitution.
It's a technically harder way of looking at things because
we have to think more mechanistically about our
programming language.
We can't just think about it as mathematics.
It's philosophically harder, because suddenly there are all
these funny issues about what does it mean that something
changes or that two things are the same.
And also, it's programming harder, because as Gerry
showed last time, there are all these bugs having to do
with bad sequencing and aliasing that just don't exist
in a language where we don't worry about objects.
Well, how'd we get into this mess?
Remember what we did, the reason we got into this is
because we were looking to build modular systems. We
wanted to build systems that fall apart into chunks that
seem natural.
So for instance, we want to take a random number generator
and package up the state of that random number generator
inside of it so that we can separate the idea of picking
random numbers from the general Monte Carlo strategy
of estimating something and separate that from the
particular way that you work with random numbers in that
formula developed by Cesaro for pi.
And similarly, when we go off and construct some models of
things, if we go off and model a system that we see in the
real world, we'd like our program to break into natural
pieces, pieces that mirror the parts of the system that we
see in the real world.
So for example, if we look at a digital circuit, we say,
gee, there's a circuit and it has a piece and
it has another piece.
And these different pieces sort of have identity.
They have state.
And the state sits on these wires.
And we think of this piece as an object that's different
from that as an object.
And when we watch the system change, we think about a
signal coming in here and changing a state that might be
here and going here and interacting with a state that
might be stored there, and so on and so on.
So what we'd like is we'd like to build in the computer
systems that fall into pieces that mirror our view of
reality, of the way that the actual systems we're modeling
seem to fall into pieces.
Well, maybe the reason that building systems like this
seems to introduce such technical complications has
nothing to do with computers.
See, maybe the real reason that we pay such a price to
write programs that mirror our view of reality is that we
have the wrong view of reality.
See, maybe time is just an illusion, and
nothing ever changes.
See, for example, if I take this chalk, and we say, gee,
this is an object and it has a state.
At each moment it has a position and a velocity.
And if we do something, that state can change.
But if you studied any relativity, for instance, you
know that you don't think of the path of that chalk as
something that goes on instant by instant.
It's more insightful to think of that whole chalk's
existence as a path in space-time that's all splayed out.
There aren't individual positions and velocities.
There's just its unchanging existence in space-time.
Similarly, if we look at this electrical system, if we
imagine this electrical system is implementing some sort of
signal processing system, the signal processing engineer who
put that thing together doesn't think of it as, well,
at each instance there's a voltage coming in.
And that translates into something.
And that affects the state over here, which changes the
state over here.
Nobody putting together a signal processing system
thinks about it like that.
Instead, you say there's this signal that's
splayed out over time.
And if this is acting as a filter, this whole thing
transforms this whole thing for some sort of other output.
You don't think of it as what's happening instant by
instant as the state of these things.
And somehow you think of this box as a whole thing, not as
little pieces sending messages of state to each other at
particular instants.
Well, today we're going to look at another way to
decompose systems that's more like the signal processing
engineer's view of the world than it is like thinking about
objects that communicate sending messages.
That's called stream processing.
And we're going to start by showing how we can make our
programs more uniform and see a lot more commonality if we
throw out of these programs what you might say is an
inordinate concern with worrying about time.
Let me start by comparing two procedures.
The first one does this.
We imagine that there's a tree.
Say there's a tree of integers.
It's a binary tree.
So it looks like this.
And there's integers in each of the nodes.
And what we would like to compute is for each odd number
sitting here, we'd like to find the square and then sum
up all those squares.
Well, that should be a familiar kind of thing.
There's a recursive strategy for doing it.
We look at each leaf, and either it's going to
contribute the square of the number if it's odd
or 0 if it's even.
And then recursively, we can say at each tree, the sum of
all of them is the sum coming from the right branch and the
left branch, and recursively down through the nodes.
And that's a familiar way of thinking about programming.
Let's actually look at that on the slide.
We say to sum the odd squares in a tree, well, there's a
test. Either it's a leaf node, and we're going to check to
see if it's an integer, and then either it's odd, in which
case we take the square, or else it's 0.
And then the sum of the whole thing is the sum coming from
the left branch and the right branch.
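In Scheme, the procedure on that slide looks roughly like
this. It's a sketch: leaf-node?, left-branch, and
right-branch are assumed names for the tree selectors, and
square is a small helper defined here.

    (define (square x) (* x x))

    ;; Sum the squares of the odd leaves of a binary tree.
    (define (sum-odd-squares tree)
      (if (leaf-node? tree)
          ;; a leaf contributes its square if odd, else 0
          (if (odd? tree) (square tree) 0)
          ;; otherwise add the sums from the two branches
          (+ (sum-odd-squares (left-branch tree))
             (sum-odd-squares (right-branch tree)))))
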
OK, well, let me contrast that with a second problem.
Suppose I give you an integer n, and then some function to
compute for each integer from 1 through n.
And then I want to collect together in a list all those
function values that satisfy some property.
That's a general kind of thing.
Let's say to be specific, let's imagine that for each
integer, k, we're going to compute
the k-th Fibonacci number.
And then we'll see which of those are odd and assemble
those into a list.
So here's a procedure that does that.
Find the odd Fibonacci numbers among the first n.
And here is a standard loop the way we've been writing it.
This is a recursion.
It's a loop on k, and says if k is bigger than n, it's the
empty list. Otherwise we compute the k-th Fibonacci
number, call that f.
If it's odd, we CONS it on to the list starting
with the next one.
And otherwise, we just take the next one.
And this is the standard way we've been
writing iterative loops.
And we start off calling that loop with 1.
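Roughly, in Scheme, with fib assumed to compute the k-th
Fibonacci number, the loop described here is:

    ;; Collect the odd Fibonacci numbers among fib(1) ... fib(n).
    (define (odd-fibs n)
      (define (loop k)
        (if (> k n)
            '()                              ; past n: the empty list
            (let ((f (fib k)))               ; the k-th Fibonacci number
              (if (odd? f)
                  (cons f (loop (+ k 1)))    ; keep it and go on
                  (loop (+ k 1))))))         ; skip it and go on
      (loop 1))
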
OK, so there are two procedures.
Those procedures look very different.
They have very different structures.
Yet from a certain point of view, those procedures are
really doing very much the same thing.
So if I was talking like a signal processing engineer,
what I might say is that the first procedure enumerates the
leaves of a tree.
And then we can think of a signal coming out of that,
which is all the leaves.
We'll filter them to see which ones are odd, put them through
some kind of filter.
We'll then put them through a kind of transducer.
And for each one of those things, we'll take the square.
And then we'll accumulate all of those.
We'll accumulate them by sticking them together with
addition starting from 0.
That's the first program.
The second program, I can describe in a very, very
similar way.
I'll say, we'll enumerate the numbers on this interval, for
the interval 1 through n.
We'll, for each one, compute the Fibonacci number, put them
through a transducer.
We'll then take the result of that, and we'll
filter it for oddness.
And then we'll take those and put them into an accumulator.
This time we'll build up a list, so we'll accumulate with
CONS starting from the empty list.
So this way of looking at the program makes the two seem
very, very similar.
The problem is that that commonality is completely
obscured when we look at the procedures we wrote.
Let's go back and look at sum-odd-squares again, and say
things like, where's the enumerator?
Where's the enumerator in this program?
Well, it's not in one place.
It's a little bit in this leaf-node test,
which is going to stop.
It's a little bit in the recursive structure of the
thing itself.
Where's the accumulator?
The accumulator isn't in one place either.
It's partly in this 0 and partly in this plus.
It's not there as a thing that we can look at.
Similarly, if we look at odd Fibs, that's also, in some
sense, an enumerator and an accumulator, but
it looks very different.
Because partly, the enumerator is here in this greater than
sign in the test. And partly it's in this whole recursive
structure in the loop, and the way that we call it.
And then similarly, that's also mixed up in there with
the accumulator, which is partly over there and partly
over there.
So these very, very natural pieces, these very natural
boxes here don't appear in our programs. Because they're kind
of mixed up.
The programs don't chop things up in the right way.
Going back to this fundamental principle of computer science
that in order to control something, you need the name
of it, we don't really have control over thinking about
things this way because we don't have our hands in them
explicitly.
We don't have a good language for talking about them.
Well, let's invent an appropriate language in which
we can build these pieces.
The key to the language is these guys, these
things I called signals.
What are these things that are flying on the
arrows between the boxes?
Well, those things are going to be data structures called
streams. That's going to be the key to
inventing this language.
What's a stream?
Well, a stream is, like anything else, a data
abstraction.
So I should tell you what its selectors and
constructors are.
For a stream, we're going to have one constructor that's
called CONS-stream.
CONS-stream is going to put two things together to form a
thing called a stream.
And then to extract things from the stream, we're going
to have a selector called the head of the stream.
So if I have a stream, I can take its head or I
can take its tail.
And remember, I have to tell you George's contract here to
tell you what the axioms are that relate these.
And it's going to be for any x and y, if I form the
CONS-stream and take the head, the head of CONS-stream of x
and y is going to be x and the tail of CONS-stream of x and y
is going to be y.
So those are the constructor, two selectors for
streams, and an axiom.
There's something fishy here.
So you might notice that these are exactly the axioms for
CONS, CAR, and CDR. If instead of writing CONS-stream I wrote
CONS and I said head was the CAR and tail was the CDR,
those are exactly the axioms for pairs.
And in fact, there's another thing here.
We're going to have a thing called the-empty-stream, which
is like the-empty-list.
So why am I introducing this terminology?
Why don't I just keep talking about pairs and lists?
Well, we'll see.
For now, if you like, why don't you just pretend that
streams really are just a terminology for lists.
And we'll see in a little while why we want to keep this
extra abstraction layer and not just call them lists.
OK, now that we have streams, we can start constructing the
pieces of the language to operate on streams. And there
are a whole bunch of very useful things that we could
start making.
For instance, we'll make our map box to take a stream, s,
and a procedure, and to generate a new stream which
has as its elements the procedure applied to all the
successive elements of s.
In fact, we've seen this before.
This is the procedure map that we did with lists.
And you see it's exactly map, except we're testing for
empty-stream.
Oh, I forgot to mention that.
Empty-stream is like the null test. So if it's empty, we
generate the empty stream.
Otherwise, we form a new stream whose first element is
the procedure applied to the head of the stream, and whose
rest is gotten by mapping along with the procedure down
the tail of the stream.
So that looks exactly like the map procedure
we looked at before.
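A sketch of that map for streams, written with the lecture's
primitives cons-stream, head, tail, empty-stream?, and
the-empty-stream (the names vary between Scheme systems), and
called map-stream here so it doesn't shadow the ordinary map
on lists:

    ;; A new stream of (proc x) for each element x of the stream s.
    (define (map-stream proc s)
      (if (empty-stream? s)
          the-empty-stream
          (cons-stream (proc (head s))
                       (map-stream proc (tail s)))))
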
Here's another useful thing.
Filter, this is our filter box.
We're going to have a predicate and a stream.
We're going to make a new stream that consists of all
the elements of the original one
that satisfy the predicate.
That's case analysis.
When there's nothing in the stream, we
return the empty stream.
We test the predicate on the head of the stream.
And if it's true, we add the head of the stream onto the
result of filtering the tail of the stream.
And otherwise, if that predicate was false, we just
filter the tail of the stream.
Right, so there's filter.
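Again as a sketch, with the same assumed stream primitives:

    ;; A new stream of the elements of s that satisfy pred.
    (define (filter-stream pred s)
      (cond ((empty-stream? s) the-empty-stream)
            ((pred (head s))
             (cons-stream (head s)
                          (filter-stream pred (tail s))))
            (else (filter-stream pred (tail s)))))
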
Let me run through a couple more rather quickly.
They're all in the book and you can look at them.
Let me just flash through.
Here's accumulate.
Accumulate takes a way of combining things and an
initial value in a stream and sticks them all together.
If the stream's empty, it's just the initial value.
Otherwise, we combine the head of the stream with the result
of accumulating the tail of the stream starting from the
initial value.
So that's what I'd use to add up everything in the stream.
I'd accumulate with plus.
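A sketch of accumulate; for example, (accumulate + 0 s) adds
up everything in the stream s:

    ;; Combine all the elements of s, starting from initial.
    (define (accumulate combiner initial s)
      (if (empty-stream? s)
          initial
          (combiner (head s)
                    (accumulate combiner initial (tail s)))))
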
How would I enumerate the leaves of a tree?
Well, if the tree is just a leaf itself, I make something
which only has that node in it.
Otherwise, I append together the stuff of enumerating the
left branch and the right branch.
And then append here is like the ordinary append on lists.
You can look at that.
That's analogous to the ordinary procedure for
appending two lists.
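A sketch of the enumerator together with that stream version
of append; leaf-node?, left-branch, and right-branch are the
same assumed tree selectors as before:

    ;; The stream of all the leaves of a binary tree, left to right.
    (define (enumerate-tree tree)
      (if (leaf-node? tree)
          (cons-stream tree the-empty-stream)
          (append-streams (enumerate-tree (left-branch tree))
                          (enumerate-tree (right-branch tree)))))

    ;; Like append on lists, but for streams.
    (define (append-streams s1 s2)
      (if (empty-stream? s1)
          s2
          (cons-stream (head s1)
                       (append-streams (tail s1) s2))))
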
How would I enumerate an interval?
This will take two integers, low and high, and generate a
stream of the integers going from low to high.
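A sketch:

    ;; The stream of the integers low, low+1, ..., high.
    (define (enumerate-interval low high)
      (if (> low high)
          the-empty-stream
          (cons-stream low
                       (enumerate-interval (+ low 1) high))))
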
And we can make a whole bunch of pieces.
So that's a little language of talking about streams. Once we
have streams, we can build things for manipulating them.
Again, we're making a language.
And now we can start expressing
things in this language.
Here's our original procedure for summing the odd
squares in a tree.
And you'll notice it looks exactly now like the block
diagram, like the signal processing block diagram.
So to sum the odd squares in a tree, we enumerate the leaves
of the tree.
We filter that for oddness.
We map that for squareness.
And we accumulate the result of that using addition,
starting from 0.
So we can see the pieces that we wanted.
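Written with the pieces sketched above, the procedure reads
just like the block diagram:

    (define (sum-odd-squares tree)
      (accumulate +
                  0
                  (map-stream square
                              (filter-stream odd?
                                             (enumerate-tree tree)))))
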
Similarly, the Fibonacci one, how do we get the odd Fibs?
Well, we enumerate the interval from 1 to n, we map
along that, computing the Fibonacci of each one.
We filter the result of those for oddness.
And we accumulate all of that stuff using CONS starting from
the empty-list.
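And the second one, again assuming fib:

    (define (odd-fibs n)
      (accumulate cons
                  '()
                  (filter-stream odd?
                                 (map-stream fib
                                             (enumerate-interval 1 n)))))
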
OK, what's the advantage of this?
Well, for one thing, we now have pieces that we can start
mixing and matching.
So for instance, if I wanted to change this, if I wanted to
compute the squares of the integers and then filter them,
all I need to do is pick up a standard piece like this in
that square and put it in.
Or if we wanted to do this whole Fibonacci computation on
the leaves of a tree rather than a sequence, all I need to
do is replace this enumerator with that one.
See, the advantage of this stream processing is that
we're establishing--
this is one of the big themes of the course--
we're establishing conventional interfaces that
allow us to glue things together.
Things like map and filter are a standard set of components
that we can start using for pasting together programs in
all sorts of ways.
It allows us to see the commonality of programs.
I just ought to mention, I've only showed you two
procedures.
But let me emphasize that this way of putting things together
with maps, filters, and accumulators
is very, very general.
It's the generate and test paradigm for programs. And as
an example of that, Richard Waters, who was at MIT when he
was a graduate student, as part of his thesis research
went and analyzed a large chunk of the IBM scientific
subroutine library, and discovered that about 60% of
the programs in it could be expressed exactly in these
terms, using no more than what we've put here--
map, filter, and accumulate.
All right, let's take a break.
Questions?
AUDIENCE: It seems like the essence of this whole thing is
just that you have a very uniform, simple data structure
to work with, the stream.
PROFESSOR: Right.
The essence is that you, again, it's this sense of
conventional interfaces.
So you can start putting a lot of things together.
And the stream is as you say, the uniform data structure
that supports that.
This is very much like APL, by the way.
APL is very much the same idea, except in APL, instead
of this stream, you have arrays and vectors.
And a lot of the power of APL comes from exactly the same
place as the power of this.
OK, thank you.
Let's take a break.
All right.
We've been looking at ways of organizing computations using
streams. What I want to do now is just show you two somewhat
more complicated examples of that.
Let's start by thinking about the following kind of utility
procedure that will come in useful.
Suppose I've got a stream.
And the elements of this stream are themselves streams.
So the first thing might be 1, 2, 3.
So I've got a stream.
And each element of the stream is itself a stream.
And what I'd like to do is build a stream that collects
together all of the elements, pulls all of the elements out
of these sub-streams and strings them all
together in one thing.
So just to show you the use of this language, how easy it is,
call that flatten.
And I can define to flatten this stream of streams. Well,
what is that?
That's just an accumulation.
I want to accumulate using append, by
successively appending.
So I accumulate using append streams, starting with
the-empty-stream down that stream of streams.
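As a sketch, with the append-streams from before:

    ;; Flatten a stream of streams into one stream of their elements.
    (define (flatten stream-of-streams)
      (accumulate append-streams
                  the-empty-stream
                  stream-of-streams))
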
OK, so there's an example of how you can start using these
higher order things to do some interesting operations.
In fact, there's another useful thing
that I want to do.
I want to define a procedure called flat-map, flat map of
some function and a stream.
And what this is going to do is: s will
be a stream of elements.
f is going to be a function that for each element in the
stream produces another stream.
And what I want to do is take all of the elements and all of
those streams and combine them together.
So that's just going to be the flatten of map f down s.
Each time I apply f to an element of s, I get a stream.
If I map it all the way down, I get a stream of streams, and
I'll flatten that.
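A sketch:

    ;; Apply f (which returns a stream) to each element of s and
    ;; string all the resulting streams together.
    (define (flatmap f s)
      (flatten (map-stream f s)))
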
Well, I want to use that to show you a new way to do a
familiar kind of problem.
The problem's going to be like a lot of problems you've seen,
although maybe not this particular one.
I'm going to give you an integer, n.
And the problem is going to be to find all pairs of integers
i and j, with j less than i and i up to n, such
that i plus j is prime.
So for example, if n equals 6, let's make a little table
here, i and j and i plus j.
So for, say, i equals 2 and j equals 1, I'd get 3.
And for i equals 3, I could have j equals 2, and that
would be 5.
And 4 and 1 would be 5 and so on, up until i goes to 6.
And what I'd like to return is to produce a stream of all the
triples like this, let's say i, j, and i plus j.
So for each n, I want to generate this stream.
OK, well, that's easy.
Let's build it up.
We start like this.
We're going to say for each i, we're going
to generate a stream.
For each i in the interval 1 through n, we're going to
generate a stream.
What's that stream going to be?
We're going to start by generating all the pairs.
So for each i, we're going to generate, for each j in the
interval 1 to i minus 1, we'll generate the pair, or the list
with two elements i and j.
So we map along the interval, generating the pairs.
And for each i, that generates a stream of pairs.
And we flatmap it.
Now we have all the pairs i and j, such that j
is less than i.
So that builds that.
Now we've got to test them.
Well, we take that thing we just built, the flatmap, and
we filter it to see whether the i--
see, we had an i and a j.
i was the first thing in the list, j was the second thing
in the list. So we have a predicate which says in that
list of two elements is the sum of the
CAR and the CADR prime.
And we filter that collection of pairs we just built.
So those are the pairs we want.
Now we go ahead and we take the result of that filter and
we map along it, generating the list i and j and i plus j.
And that's our procedure prime-sum-pairs.
And then just to flash it up, here's the whole procedure.
A map, a filter, a flatmap.
There's the whole thing, even though this isn't
particularly readable.
It's just expanding that flatmap.
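Spelled out as a sketch, assuming a prime? predicate and the
stream pieces sketched before:

    (define (prime-sum-pairs n)
      (map-stream
       (lambda (p)                               ; p is the list (i j)
         (list (car p) (cadr p) (+ (car p) (cadr p))))
       (filter-stream
        (lambda (p)                              ; keep pairs whose sum is prime
          (prime? (+ (car p) (cadr p))))
        (flatmap
         (lambda (i)                             ; for each i in 1..n ...
           (map-stream (lambda (j) (list i j))   ; ... pair it with each j in 1..i-1
                       (enumerate-interval 1 (- i 1))))
         (enumerate-interval 1 n)))))
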
So there's an example which illustrates the general point
that nested loops in this procedure start looking like
compositions of flatmaps of flatmaps of flatmaps of maps
and things.
So not only can we enumerate individual things, but by
using flatmaps, we can do what would correspond to nested
loops in most other languages.
Of course, it's pretty awful to keep writing these flatmaps
of flatmaps of flatmaps.
Prime-sum-pairs you saw looked fairly complicated, even
though the individual pieces were easy.
So what you can do, if you like, is introduce some
syntactic sugar that's called collect.
And collect is just an abbreviation for that nest of
flatmaps and filters arranged in that particular way.
Here's prime-sum-pairs again, written using collect.
It says to find all those pairs, I'm going to collect
together a result, which is the list i, j, and i plus j,
that's going to be generated as i runs through the interval
from 1 to n and as j runs through the interval from 1 to
i minus 1, such that i plus j is prime.
So I'm not going to say what collect does in general.
You can look at that by looking at it in the book.
But pretty much, you can see that the pieces of this are
the pieces of that original procedure I wrote.
And this collect is just some syntactic sugar for
automatically generating that nest of flatmaps and filters.
OK, well, let me do one more example that shows you the
same kind of thing.
Here's a very famous problem that's used to illustrate a
lot of so-called backtracking computer algorithms. This is
the eight queens problem.
This is a chess board.
And the eight queens problem says, find a way to put down
eight queens on a chess board so that no two are attacking
each other.
And here's a particular solution to the
eight queens problem.
So I have to make sure to put down queens so that no two are
in the same row or the same column or sit
along the same diagonal.
Now, there's sort of a standard way of doing that.
Well, the first thing we need to do is below the
surface, at George's level.
We have to find some way to represent a board, and
represent positions.
And we'll not worry about that.
But let's assume that there's a predicate called safe.
And what safe is going to do is going to say given that I
have a bunch of queens down on the chess board, is it OK to
put a queen in this particular spot?
So safe is going to take a row and a column.
That's going to be a place where I'm going to try and put
down the next queen, and the rest of positions.
And what safe will say is given that I already have
queens down in these positions, is it safe to put
another queen down in that row and that column?
And let's not worry about that.
That's George's problem, and it's not hard to write.
You just have to check whether this thing contains any things
on that row or that column or in that diagonal.
Now, how would you organize the program given that?
And there's sort of a traditional way to organize it
called backtracking.
And it says, well, let's think about all the ways of putting
the first queen down in the first column.
There are eight ways.
Well, let's say try the first column.
Try column 1, row 1.
These branches are going to represent the possibilities at
each level.
So I'll try and put a queen down in the first column.
And now given that it's in the first column, I'll try and put
the next queen down in the first column.
I'll try and put the first queen, the one in the first
column, down in the first row.
I'm sorry.
And then given that, we'll put the next queen down
in the first row.
And that's no good.
So I'll back up to here.
And I'll say, oh, can I put the first queen down in the
second row?
Well, that's no good.
Oh, can I put it down in the third row?
Well, that's good.
Well, now can I put the next queen down
in the first column?
Well, I can't visualize this chess board anymore, but I
think that's right.
And I try the next one.
And at each place, I go as far down this tree as I can.
And I back up.
If I get down to here and find no possibilities below there,
I back all the way up to here, and now start again generating
this sub-tree.
And I sort of walk around.
And finally, if I ever manage to get all the way down, I've
found a solution.
So that's a typical sort of paradigm that's used a lot in
AI programming.
It's called backtracking search.
And it's really unnecessary.
You saw me get confused when I was visualizing this thing.
And you see the complication.
This is a complicated thing to say.
Why is it complicated?
It's because somehow this program is too inordinately
concerned with time.
It's too much--
I try this one, and I try this one, and I go back to the last
possibility.
And that's a complicated thing.
If I stop worrying about time so much, then there's a much
simpler way to describe this.
It says, let's imagine that I have in my hands the tree down
to k minus 1 levels.
See, suppose I had in my hands all possible ways to put down
queens in the first k minus 1 columns.
Suppose I just had that.
Let's not worry about how we get it.
Well, then, how do I extend that?
How do I find all possible ways to put down queens in the
next column?
It's really easy.
For each of these positions I have, I think about putting
down a queen in each row to make the next thing.
And then for each one I put down, I filter those by the
ones that are safe.
So instead of thinking about this tree as generated step by
step, suppose I had it all there.
And to extend it from level k minus 1 to level k, I just
need to extend each thing in all possible ways and only
keep the ones that are safe.
And that will give me the tree to level k.
And that's a recursive strategy for solving the eight
queens problem.
All right, well, let's look at it.
To solve the eight queens problem on a board of some
specified size, we write a sub-procedure called
fill-columns.
Fill-columns is going to put down queens up
through column k.
And here's the pattern of the recursion.
I'm going to call fill-columns with the size eventually.
So fill-columns says how to put down queens safely in the
first k columns of this chess board with a size number of
rows in it.
If k is equal to 0, well, then I don't have to
put anything down.
So my solution is just an empty chess board.
Otherwise, I'm going to do some stuff.
And I'm going to use collect.
And here's the collect.
I find all ways to put down queens in the
first k minus 1 columns.
And this is just what I said before.
Imagine I have this tree down to k minus 1 levels.
And then I find all ways of trying a row, that's just each
of the possible rows.
There are size rows, so that's enumerate-interval.
And now what I do is I collect together the new row I'm going
to try and column k with the rest of the queens.
I adjoin a position.
This is George's problem.
Adjoin-position is like safe.
It's a thing that takes a row and a column and the rest of
the positions and makes a new position collection.
So I adjoin a position of a new row and a new column to
the rest of the queens, where the rest of the queens runs
through all possible ways of solving the problem
in k minus 1 columns.
And the new row runs through all possible rows such that it
was safe to put one there.
And that's the whole program.
There's the whole procedure.
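Written without the collect sugar, as a sketch in terms of
the flatmap, map, and filter pieces from before; empty-board,
adjoin-position, and safe? belong to George's
board-representation layer and are left unspecified here,
just as in the lecture:

    (define (queens board-size)
      ;; The stream of all ways to put queens safely in the
      ;; first k columns of a board-size by board-size board.
      (define (fill-columns k)
        (if (= k 0)
            (cons-stream empty-board the-empty-stream) ; one way: the empty board
            (flatmap
             (lambda (rest-of-queens)          ; each solution for k-1 columns ...
               (map-stream
                (lambda (new-row)              ; ... extended by a queen in column k
                  (adjoin-position new-row k rest-of-queens))
                (filter-stream
                 (lambda (new-row)             ; keeping only the rows that are safe
                   (safe? new-row k rest-of-queens))
                 (enumerate-interval 1 board-size))))
             (fill-columns (- k 1)))))
      (fill-columns board-size))
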
Not only that, that doesn't just solve the eight queens
problem, it gives you all solutions to the
eight queens problem.
When you're done, you have a stream.
And the elements of that stream are all possible ways
of solving that problem.
Why is that simpler?
Well, we threw away the whole idea that this is some process
that happens in time with state.
And we just said it's a whole collection of stuff.
And that's why it's simpler.
We've changed our view.
Remember, that's where we started today.
We've changed our view of what it is we're trying to model.
We stop modeling things that evolve in time and have steps
and have state.
And instead, we're trying to model this global thing like
the whole flight of the chalk, rather than its
state at each instant.
Any questions?
AUDIENCE: It looks to me like backtracking would be
searching for the first solution it can find, whereas
this recursive search would be looking for all solutions.
And it seems that if you have a large enough area to search,
that the second is going to become impossible.
PROFESSOR: OK, the answer to that question is the whole
rest of this lecture.
It's exactly the right question.
And without trying to anticipate the lecture too
much, you should start being suspicious at this point, and
exactly those kinds of suspicions.
It's wonderful, but isn't it so terribly inefficient?
That's where we're going.
So I won't answer now, but I'll answer later.
OK, let's take a break.
Well, by now you should be starting to get suspicious.
See, I've shown you this simple, elegant way of putting
programs together, very unlike these other traditional
programs that sum the odd squares or compute the odd
Fibonacci numbers.
Very unlike these programs that mix up the enumerator and
the filter and the accumulator.
And by mixing it up, we don't have all of these wonderful
conceptual advantages of these streams pieces, these
wonderful mix and match components for putting
together lots and lots of programs.
On the other hand, most of the programs you've seen look like
these ugly ones.
Why's that?
Can it possibly be that computer scientists are so
obtuse that they don't notice that if you merely did this
thing, then you could get this great programming elegance?
There's got to be a catch.
And it's actually pretty easy to see what the catch is.
Let's think about the following problem.
Suppose I tell you to find the second prime between 10,000
and 1 million, or if your computer's larger, say between
10,000 and 100 billion, or something.
And you say, oh, that's easy.
I can do that with a stream.
All I do is I enumerate the interval
from 10,000 to 1 million.
So I get all those integers from 10,000 to 1 million.
I filter them for prime-ness, so test all of them and see if
they're prime.
And I take the second element.
That's the head of the tail.
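That is, the whole computation would be just this one
expression, assuming a prime? predicate and the stream pieces
sketched before:

    ;; The second prime between 10,000 and 1,000,000: the head
    ;; of the tail of the filtered enumeration.
    (head (tail (filter-stream prime?
                               (enumerate-interval 10000 1000000))))
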
Well, that's clearly pretty ridiculous.
We'd not even have room in the machine to store the integers
in the first place, much less to test them.
And then I only want the second one.
See, the power of this traditional programming style
is exactly its weakness, that we're mixing up the
enumerating and the testing and the accumulating.
So we don't do it all.
So the very thing that makes it conceptually ugly is the
very thing that makes it efficient.
It's this mixing up.
So it seems that all I've done this morning so far is just
confuse you.
I showed you this wonderful way that programming might
work, except that it doesn't.
Well, here's where the wonderful thing happens.
It turns out in this game that we really can have our cake
and eat it too.
And what I mean by that is that we really can write
stream programs exactly like the ones I wrote and arrange
things so that when the machine actually runs, it's as
efficient as running this traditional programming style
that mixes up the generation and the test.
Well, that sounds pretty magic.
The key to this is that streams are not lists.
We'll see this carefully in a second, but for now, let's
take a look at that slide again.
The image you should have here of this signal processing
system is that what's going to happen is there's this box
that has the integers sitting in it.
And there's this filter that's connected to it and it's
tugging on them.
And then there's someone who's tugging on this stuff saying
what comes out of the filter.
And the image you should have is that someone says, well,
what's the first prime, and tugs on this filter.
And the filter tugs on the integers.
And you look only at that much, and then say, oh, I
really wanted the second one.
What's the second prime?
And that no computation gets done except when you tug on
these things.
Let me try that again.
This is a little device.
This is a little stream machine invented by Eric
Grimson who's been teaching this course at MIT.
And the image is here's a stream of stuff, like a whole
bunch of the integers.
And here's some processing elements.
And if, say, it's filter of filter of map, or something.
And if I really tried to implement that with streams as
lists, what I'd say is, well, I've got this list of things,
and now I do the first filter.
So do all this processing.
And I take this and I process and I process and I process
and I process.
And now I've got this new stream.
Now I take that result in my hand someplace.
And I put that through the second one.
And I process the whole thing.
And there's this new stream.
And then I take the result and I put it all the way through
this one the same way.
That's what would happen to these stream programs if
streams were just lists.
But in fact, streams aren't lists, they're streams. And
the image you should have is something a little
bit more like this.
I've got these gadgets connected up by this data
that's flowing out of them.
And here's my original source of the streams. It might be
starting to generate the integers.
And now, what happens if I want a result?
I tug on the end here.
And this element says, gee, I need some more data.
So this one comes here and tugs on that one.
And it says, gee, I need some more data.
And this one tugs on this thing, which might be a
filter, and says, gee, I need some more data.
And only as much of this thing at the end here gets generated
as I tugged.
And only as much of this stuff goes through the processing
units as I'm pulling on the end I need.
That's the image you should have of the difference between
implementing what we're actually going to do and if
streams were lists.
Well, how do we make this thing?
I hope you have the image.
The trick is how to make it.
We want to arrange for a stream to be a data structure
that computes itself incrementally, an on-demand
data structure.
And the basic idea is, again, one of the very basic ideas
that we're seeing throughout the whole course.
And that is that there's not a firm distinction between
programs and data.
So what a stream is going to be is simultaneously this data
structure that you think of, like the stream of the leaves
of this tree.
But at the same time, it's going to be a very clever
procedure that has the method of computing in it.
Well, let me try this.
It's going to turn out that we don't need any more mechanism.
We already have everything we need simply from the fact that
we know how to handle procedures
as first-class objects.
Well, let's go back to the key.
The key is, remember, we had these operations.
CONS-stream and head and tail.
When I started, I said you can think about this as CONS and
think about this as CAR and think about that as
CDR, but it's not.
Now, let's look at what they really are.
Well, CONS-stream of x and y is going to be an abbreviation
for the following thing.
CONS form a pair, ordinary CONS, of x to a thing called
delay of y.
And before I explain that, let me go and write the rest. The
head of a stream is going to be just the CAR.
And the tail of a stream is going to be a thing called
force the CDR of the stream.
Now let me explain this.
Delay is going to be a special magic thing.
What delay does is take an expression and produce a
promise to compute that expression
when you ask for it.
It doesn't do any computation here.
It just gives you a rain check.
It produces a promise.
And CONS-stream says I'm going to put together in a pair x
and a promise to compute y.
Now, if I want the head, that's just the CAR that I put
in the pair.
And the key is that the tail is going to be--
force calls in that promise.
Tail says, well, take that promise and now
call in that promise.
And then we compute that thing.
That's how this is going to work.
That's what CONS-stream, head, and tail really are.
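As a sketch: cons-stream can't be an ordinary procedure,
because it must not evaluate its second argument yet, so it's
written here as a macro; delay and force are as described
next (most Schemes in fact provide them built in).

    ;; cons-stream pairs x with a promise to compute y later.
    (define-syntax cons-stream
      (syntax-rules ()
        ((cons-stream x y) (cons x (delay y)))))

    (define (head s) (car s))           ; the value we already have
    (define (tail s) (force (cdr s)))   ; call in the promise
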
Now, let's see how this works.
And we'll go through this fairly carefully.
We're going to see how this works in this example of
computing the second prime between 10,000 and a million.
OK, so we start off and we have this expression.
The second prime-- the head of the tail of the result of
filtering for primality the integers between
10,000 and 1 million.
Now, what is that?
What that is, that interval between 10,000 and 1 million,
well, if you trace through enumerate-interval, it
builds a CONS-stream.
And the CONS-stream is the CONS of 10,000 to a promise to
compute the integers between 10,001 and 1 million.
So that's what this expression is.
Here I'm using the substitution model.
And we can use the substitution model because we
don't have side effects and state.
So I have CONS of 10,000 to a promise to compute the rest of
the integers.
So only one integer, so far, got enumerated.
Well, I'm going to filter that thing for primality.
Again, you go back and look at the filter code.
What the filter will first do is test the head.
So in this case, the filter will test 10,000 and say, oh,
10,000's not prime.
Therefore, what I have to do recursively
is filter the tail.
And what's the tail of it, well, that's the tail of this
pair with a promise in it.
Tail now comes in and says, well, I'm going to force that.
I'm going to force that promise, which means now I'm
going to compute the integers between 10,001 and 1 million.
OK, so this filter now is looking at that.
That enumerate itself, well, now we're back in the original
enumerate situation.
The enumerate is the CONS of the first thing, 10,001, onto
a promise to compute the rest.
So now the primality filter is going to go look at 10,001.
It's going to decide if it likes that or not.
It turns out 10,001 isn't prime.
So it'll force it again and again and again.
And finally, I think the first prime it hits is 10,009.
And at that point, it'll stop.
And that will be the first prime, and then eventually,
it'll need the second prime.
So at that point, it will go again.
So you see what happens is that no more gets generated
than you actually need.
That enumerator is not going to generate any more integers
than the filter asks it for as it's pulling in things to
check for primality.
And the filter is not going to generate any more stuff than
you ask it for, which is the head of the tail.
You see, what's happened is we've put that mixing of
generation and test into what actually happens in the
computer, even though that's not apparently what's
happening from looking at our programs.
OK, well, that seemed easy.
All of this mechanism got put into this magic delay.
So you're saying, gee, that must be where the magic is.
But see there's no magic there either.
You know what delay is.
Delay on some expression is just an abbreviation for--
well, what's a promise to compute an expression?
Lambda of nil, procedure of no arguments, which is that
expression.
That's what a procedure is.
It says I'm going to compute an expression.
What's force?
How do I take up a promise?
Well, force of some procedure, a promise, is just run it.
Done.
So there's no magic there at all.
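So, as a sketch, delay and force amount to no more than this.
They're written here with the hypothetical names my-delay and
my-force, since standard Scheme already supplies delay and
force; like cons-stream, delay has to be a special form so
that the expression isn't evaluated yet.

    ;; A promise is just a procedure of no arguments ...
    (define-syntax my-delay
      (syntax-rules ()
        ((my-delay expr) (lambda () expr))))

    ;; ... and forcing it is just calling it.
    (define (my-force promise) (promise))
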
Well, what have we done?
We said the old style, traditional style of
programming is more efficient.
And the stream thing is more perspicuous.
And we managed to make the stream procedures run like the
other procedures by using delay.
And the thing that delay did for us was to de-couple the
apparent order of events in our programs from the actual
order of events that happened in the machine.
That's really what delay is doing.
That's exactly the whole point.
We've given up the idea that our procedures, as they run,
or as we look at them, mirror some clear notion of time.
And by giving that up, we give delay the freedom to arrange
the order of events in the computation the way it likes.
That's the whole idea.
We de-couple the apparent order of events in our
programs from the actual order of events in the computer.
OK, well there's one more detail.
It's just a technical detail, but it's actually
an important one.
As you run through these recursive programs unwinding,
you'll see a lot of things that look like tail of the
tail of the tail.
That's the kind of thing that would happen as I go CDRing
down a stream all the way.
And if each time I'm doing that, each time to compute a
tail, I evaluate a procedure which then has to go
re-compute its tail, and re-compute its tail and
recompute its tail each time, you can see that's very
inefficient compared to just having a list where the
elements are all there, and I don't have to re-compute each
tail every time I get the next tail.
So there's one little hack to slightly change what delay is,
and make it a thing which is--
I'll write it this way.
In the actual implementation, delay is an abbreviation for
this thing, memo-proc of a procedure.
Memo-proc is a special thing that transforms a procedure.
What it does is it takes a procedure of no arguments and
it transforms it into a procedure that'll only have to
do its computation once.
And what I mean by that is, you give it a procedure.
The result of memo-proc will be a new procedure, which the
first time you call it, will run the original procedure,
remember what result it got, and then from then on,
when you call it, it just won't have to do the
computation.
It will have cached that result someplace.
And here's an implementation of memo-proc.
Once you have the idea, it's easy to implement.
Memo-proc is this little thing that has two
little flags in there.
It says, have I already been run?
And initially it says, no, I haven't already been run.
And what was the result I got the last time I was run?
So memo-proc takes a procedure called proc, and it returns a
new procedure of no arguments.
Proc is supposed to be a procedure of no arguments.
And it says, oh, if I'm not already run, then I'm going to
do a sequence of things.
I'm going to compute proc, I'm going to save that.
I'm going to stash that in the variable result.
I'm going to make a note to myself that I've already been
run, and then I'll return the result.
So that's what it does if it's not already run.
If you call it and it's already been run, it just
returns the result.
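Here's that implementation as a sketch in Scheme; with it,
delay of an expression becomes an abbreviation for memo-proc
of (lambda () expression):

    ;; Turn a procedure of no arguments into one that does its
    ;; computation at most once and caches the result.
    (define (memo-proc proc)
      (let ((already-run? #f)
            (result #f))
        (lambda ()
          (if (not already-run?)
              (begin (set! result (proc))     ; first call: run it and remember
                     (set! already-run? #t)
                     result)
              result))))                      ; later calls: just return the cache
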
So that's a little clever hack called memoization.
And in this case, it short circuits having to re-compute
the tail of the tail of the tail of the tail of the tail.
So there isn't even that kind of inefficiency.
And in fact, the streams will run with pretty much the same
efficiency as the other programs.
And remember, again, the whole idea of this is that we've
used the fact that there's no really good dividing line
between procedures and data.
We've written data structures that, in fact, are sort of
like procedures.
And what that's allowed us to do is take an example of a
common control structure, in this case iteration.
And we've built a data structure which, since itself
is a procedure, kind of has this iteration control
structure in it.
And that's really what streams are.
OK, questions?
AUDIENCE: Your description of tail-tail-tail, if I
understand it correctly, force is actually execution of a
procedure, if it's done without this memo-proc thing.
And you implied that memo-proc gets around that problem.
Doesn't it only get around it if tail-tail-tail is always
executing exactly the same--
PROFESSOR: Oh, that's--
sure.
AUDIENCE: I guess I missed that point.
PROFESSOR: Oh, sure.
I mean the point is--
yeah.
I mean I have to do a computation to get the answer.
But the point is, once I've found the tail of the stream,
to get the tail of the tail, I shouldn't have had to
re-compute the first tail.
See, and if I didn't use memo-proc, that re-computation
would have been done.
AUDIENCE: I understand now.
AUDIENCE: In one of your examples, you mentioned that
we were able to use the substitution model because
there are no side effects.
What if we had a single processing unit--
if we had a side effect, if we had a state?
Could we still practically build the stream model?
PROFESSOR: Maybe.
That's a hard question.
I'm going to talk a little bit later about the places where
substitution and side effects don't really mix very well.
But in general, I think the answer is unless you're very
careful, any amount of side effect is going to mess up
everything.
AUDIENCE: Sorry, I didn't quite understand
the memo-proc operation.
When do you execute the lambda?
In other words, when memo-proc is executed, just this lambda
expression is being generated.
But it's not clear to me when it's executed.
PROFESSOR: Right.
What memo-proc does-- remember, the thing that's
going into memo-proc, the thing proc, is a procedure of
no arguments.
And someday, you're going to call it.
Memo-proc translates that procedure into another
procedure of no arguments, which someday
you're going to call.
That's that lambda.
So here, I initially would have built, as my tail of the
stream, say, this procedure of no arguments, which
someday I'll call.
Instead, I'm going to have the tail of the stream be
memo-proc of it, which someday I'll call.
So that lambda of nil, that gets called when you call the
memo-proc, when you call the result of that memo-proc,
which would be ordinarily when you would have called the
original thing that you set up.
AUDIENCE: OK, the reason I ask is I had a feeling that when
you call memo-proc, you just return this lambda.
PROFESSOR: That's right.
When you call memo-proc, you return the lambda.
You never evaluate the expression at all, until the
first time that you would have evaluated it.
AUDIENCE: Do I understand it right that you actually have
to build the list up, but the elements of the
list don't get evaluated?
The expressions don't get evaluated?
But at each stage, you actually are building a list.
PROFESSOR: That's--
I really should have said this.
That's a really good point.
No, it's not quite right.
Because what happens is this.
Let me draw this as pairs.
Suppose I'm going to make a big stream, like enumerate
interval, 1 through 1 billion.
What that is, is a pair with a 1 and a promise.
That's exactly what it is.
Nothing got built up.
When I go and force this, and say, what happens?
Well, this thing is now also recursively a CONS.
So that this promise now is the next thing, which is a 2
and a promise to do more.
And so on and so on and so on.
So nothing gets built up until you walk down the stream.
Because what's sitting here is not the list, but a promise to
generate the list. And by promise,
technically I mean procedure.
So it doesn't get built up.
Yeah, I should have said that before this point.
OK.
Thank you.
Let's take a break.