
  • The following content is provided under a Creative

  • Commons license.

  • Your support will help MIT OpenCourseWare

  • continue to offer high quality, educational resources for free.

  • To make a donation or to view additional materials

  • from hundreds of MIT courses, visit MIT OpenCourseWare

  • at ocw.mit.edu.

  • JULIAN SHUN: So welcome to the second lecture of 6.172,

  • performance engineering of software systems.

  • Today, we're going to be talking about Bentley rules

  • for optimizing work.

  • All right, so work, does anyone know what work means?

  • You're all at MIT, so you should know.

  • So in terms of computer programming,

  • there's actually a formal definition of work.

  • The work of a program on a particular input

  • is defined to be the sum total of all the operations executed

  • by the program.

  • So it's basically a gross measure

  • of how much stuff the program needs to do.

  • And the idea of optimizing work is

  • to reduce the amount of stuff that the program needs

  • to do in order to improve the running time of your program,

  • improve its performance.

  • So algorithm design can produce dramatic reductions

  • in the work of a program.

  • For example, if you want to sort an array of elements,

  • you can use an n log n time quicksort.

  • Or you can use an n squared time sort, like insertion sort.

  • So you've probably seen this before

  • in your algorithm courses.

  • And for large enough values of n,

  • an n log n time sort is going to be much

  • faster than an n squared sort.

  • So today, I'm not going to be talking about algorithm design.

  • You'll see more of this in other courses here at MIT.

  • And we'll also talk a little bit about algorithm design

  • later on in this semester.

  • We will be talking about many other cool tricks for reducing

  • the work of a program.

  • But I do want to point out that reducing

  • the work of a program doesn't automatically translate

  • to a reduction in running time.

  • And this is because of the complex nature

  • of computer hardware.

  • So there's a lot of things going on that aren't captured

  • by this definition of work.

  • There's instruction level parallelism, caching,

  • vectorization, speculation and branch prediction, and so on.

  • And we'll learn about some of these things

  • throughout this semester.

  • But reducing the work of our program

  • does serve as a good heuristic for reducing

  • the overall running time of a program, at least

  • to a first order.

  • So today, we'll be learning about many ways

  • to reduce the work of your program.

  • So rules we'll be looking at, we call

  • them Bentley optimization rules, in honor of Jon Louis Bentley.

  • So Jon Louis Bentley wrote a nice little book back

  • in 1982 called Writing Efficient Programs.

  • And inside this book there are various techniques

  • for reducing the work of a computer program.

  • So if you haven't seen this book before, it's very good.

  • So I highly encourage you to read it.

  • Many of the original rules in Bentley's book

  • had to deal with the vagaries of computer architecture

  • three and a half decades ago.

  • So today, we've created a new set

  • of Bentley rules just dealing with the work of a program.

  • We'll be talking about architecture-specific

  • optimizations later on in the semester.

  • But today, we won't be talking about this.

  • One cool fact is that Jon Louis Bentley is actually

  • my academic great grandfather.

  • So Jon Bentley was one of Charles Leiserson's

  • academic advisors.

  • Charles Leiserson was Guy Blelloch's academic advisor.

  • And Guy Blelloch, who's a professor at Carnegie Mellon,

  • was my advisor when I was a graduate student at CMU.

  • So it's a nice little fact.

  • And I had the honor of meeting Jon Bentley a couple of years

  • ago at a conference.

  • And he told me that he was my academic great grandfather.

  • [LAUGHING]

  • Yeah, and Charles is my academic grandfather.

  • And all of Charles's students are my academic aunts

  • and uncles--

  • [LAUGHING]

  • --including your T.A. Helen.

  • OK, so here's a list of all the work optimizations

  • that we'll be looking at today.

  • So they're grouped into four categories, data structures,

  • loops, logic, and functions.

  • So there's a list of 22 rules on this slide today.

  • In fact, we'll actually be able to look at all of them today.

  • So today's lecture is going to be structured

  • as a series of mini lectures.

  • And I'm going to be spending one to three slides on each one

  • of these optimizations.

  • All right, so let's start with optimizations

  • for data structures.

  • So first optimization is packing and encoding

  • your data structure.

  • And the idea of packing is to store more than one data value

  • in a machine word.

  • And the related idea of encoding is

  • to convert data values into a representation that

  • requires fewer bits.

  • So does anyone know why this could possibly reduce

  • the running time of a program?

  • Yes?

  • AUDIENCE: Need less memory fetches.

  • JULIAN SHUN: Right, so good answer.

  • The answer was, it might need less memory fetches.

  • And it turns out that that's correct,

  • because a computer program spends a lot of time moving

  • stuff around in memory.

  • And if you reduce the number of things

  • that you have to move around in memory,

  • then that's a good heuristic for reducing the running

  • time of your program.

  • So let's look at an example.

  • Let's say we wanted to encode dates.

  • So let's say we wanted to encode this string, September 11,

  • 2018.

  • You can store this using 18 bytes.

  • So you can use one byte per character here.

  • And this would require more than two double words,

  • because each double word is eight bytes or 64 bits.

  • And you have 18 bytes.

  • You need more than two double words.

  • And you have to move around these words

  • every time you want to manipulate the date.

  • But turns out that you can actually

  • do better than using 18 bytes.

  • So let's assume that we only want to store years

  • between 4096 BCE and 4096 CE.

  • So there are about 365.25 times 8,192 dates

  • in this range, which is three million approximately.

  • And you can use log base two of three million bits

  • to represent all the dates within this range.

  • So the notation lg here means log base two.

  • That's going to be the notation I'll be using in this class.

  • And L-O-G will mean log base 10.

  • So we take the ceiling of log base two of three million,

  • and that gives us 22 bits.

  • So a good way to remember how to compute the log

  • base two of something, you can remember that the log base

  • two of one million is 20, log base two of 1,000 is 10.

  • And then you can factor this out and then add in log base

  • two of three, rounded up, which is two.

  • So that gives you 22 bits.

  • And that easily fits within one 32-bit word.

  • Now, you only need one word instead of three words,

  • as you did in the original representation.

  • But with this modified representation,

  • now determining the month of a particular date

  • will take more work, because now you're not explicitly

  • storing the month in your representation.

  • Whereas, with the string representation,

  • you are explicitly storing it at the beginning of the string.

  • So this does take more work, but it requires less space.

  • So any questions so far?

  • OK, so it turns out that there's another way

  • to store this, which also makes it easy

  • for you to fetch the month, the year, or the day

  • for a particular date.

  • So here, we're going to use the bit fields facilities in C.

  • So we're going to create a struct called date underscore t

  • with three fields, the year, the month, and the date.

  • And the integer after the semicolon

  • specifies how many bits I want to assign

  • to this particular field in the struct.

  • So this says, I need 13 bits for the year,

  • four bits for the month, and five bits for the day.

  • So the 13 bits for the year is, because I

  • have 8,192 possible years.

  • So I need 13 bits to store that.

  • For the month, I have 12 possible months.

  • So I need log base two of 12 rounded up, which is four.

  • And then finally, for the day, I need

  • log base two of 31 rounded up, which is five.

  • So in total, this still takes 22 bits.

  • But now the individual fields can

  • be accessed much more quickly, than if we had just

  • encoded the three million dates using sequential integers,

  • because now you can just extract a month just by saying whatever

  • you named your struct.

  • You can just say that struct dot month.

  • And that gives you the month.

  • Yes?

  • AUDIENCE: Does C actually store it like that,

  • because I know C++ pads it out.

  • So then you end up taking more space.

  • JULIAN SHUN: Yeah, so this will actually

  • pad the struct a little bit at the end, yeah.

  • So you actually do require a little bit more than 22 bits.

  • That's a good question.

  • But this representation is much easier to access,

  • than if you just had encoded the dates

  • as sequential integers.

  • Another point is that sometimes unpacking and decoding

  • are the optimization, because sometimes it

  • takes a lot of work to encode the values and to extract them.

  • So sometimes you want to actually unpack the values

  • so that they take more space, but they're faster to access.

  • So it depends on your particular application.

  • You might want to do one thing or the other.

  • And the way to figure this out is just to experiment with it.

  • OK, so any other questions?

  • All right, so the second optimization

  • is data structure augmentation.

  • And the idea here is to add information to a data structure

  • to make common operations do less work,

  • so that they're faster.

  • And let's look at an example.

  • Let's say we had two singly linked lists

  • and we wanted to append them together.

  • And let's say we only stored the head pointer to the list,

  • and then each element in the list

  • has a pointer to the next element in the list.

  • Now, if you want to append one list to another list,

  • well, that's going to require you walking down the first list

  • to find the last element, so that you

  • can change the pointer of the last element

  • to point to the beginning of the next list.

  • And this might be very slow if the first list is very long.

  • So does anyone see a way to augment this data structure so

  • that appending two lists can be done much more efficiently?

  • Yes?

  • AUDIENCE: Store a pointer to the last value.

  • JULIAN SHUN: Yeah, so the answer is

  • to store a pointer to the last value.

  • And we call that the tail pointer.

  • So now we have two pointers, both the head and the tail.

  • The head points to the beginning of the list.

  • The tail points to the end of the list.

  • And now you can just append two lists in constant time,

  • because you can access the last element in the list

  • by following the tail pointer.

  • And then now you just change the successor pointer

  • of the last element to point to the head of the second list.

  • And then now you also have to update the tail

  • to point to the end of the second list.

  • OK, so that's the idea of data structure augmentation.

  • We added a little bit of extra information

  • to the data structure, such that now appending

  • two lists is much more efficient than in the original method,

  • where we only had a head pointer.

  • Questions?

  • OK, so the next optimization is precomputation.

  • The idea of precomputation is to perform some calculations

  • in advance so as to avoid doing these computations