字幕表 動画を再生する
When alex did a video on linked list there's a lot of comments on it it sort of thing
I want to use a linked list arrays a factor
What's the point of reading this and so I was sort of thinking about is that well actually "Are arrays
faster than the linked list or are the linked lists faster than an array, how do we go about finding out?"
Which one's faster, so let's have a shoot out arrays versus linked list
First question to start is
I should just remind ourselves what the difference is between a linked list and
An array so let's start with the array so probably the simplest
To understand so an array is just a way of storing
several
values
in memory so we start off with the first value and
for the purposes of this demonstration, I'm going to use a
Structure containing multiple values. We did talk about objects a while ago structures are what you had before
We saw this object which is a collection of values in memories and store to integer values
So this is array element zero and it's got two numbers. We then got array element one
And it's got two numbers
Looks like I'm playing break out got a great element
To and it's got two numbers and so we can refer to any of these
by
reference to the Index starts at 0 3 4 5 6
7 10 so you can have an array of any size we have to know how big it is if you wanted to be when
You create so you allocate the memory for all the things and then?
You can store some values so we can write 42 in this one. We can put 21 in here
I'm going to put 3 in that one and we can store them or we can put different values in here as
Well as we need them so that's how an array works. Which is one big block of memory each thing stored one after the other
linked list works slightly different we start off at the beginning of
the linked list and it points at nothing but then as we add things to the list we
Allocate the memory for it. We're going to store two things so the first thing we'll label these things inside it p
And Q so it's a lot p here
And we've got q but we also have an extra thing that tells us where the next they will be and that's empty, too
So it's all put 42 and 16 so we just allocate space for that one
Structure that one thing that we're going to store and we connect it up
So we point this at the first thing so we often call this something like first
And we see Alex videos to see how that have done
So when we want to store the next thing we allocate explai for another one
And we can store the value, so we've still got 8
21 in it now the difference between this could be anywhere else in memory, and we connect to the next pointer
to point in
To the next thing so we have some known starting point
which points to the first thing and the first thing 22 the second thing and the second thing would point to the third thing and
So on and that would point down and these could be all over
anywhere
In the computer's memory and then at the end you normally point it at some known value
Which is often referred to as null and is often the number zero?
So you can then walk through your list
And when you find at the next point of zero or null you know you've come to the end of it
There are two different ways of storing data the best or the same amount of data now
He's got some differences this one as I says it's fixed inside, and you create it you can store
How many things you know I've actually filled in but as soon as you get more than eleven since final top array?
You can't go past eleven if you can see yeah the numbers all go to
11 on this however if we wanted to put something else thing we can just get rid of this thing
And update this to point
To a new thing so we can make this grow
Quite easily, and we can also be really clever
We can add things in the middle and the beginning very very simply, but we're going to go into that today
So creating them. We're probably only going to do once but which one is actually faster
Well, let's think about what actually to do. Let's define a very very simple problem
We've created the structure here, and it's got two things in it
It's got an integer called p and an integer called Q so saving the linked list and in
The array let's consider an operation where we're just going to go through all the elements in our array
Or all the elements in our linked list and sum the value of p are going to add up all the value
So we're going to add up. What's in array
Element Zeroes P in array of them at once P and so on so we're going to effectively do
Something's going to add them all up. We'll do the same with the linked list as well
So we're going to set what the first thing is to point to p
Now that one. No. There's no time out of its value of p the next one and so on and we'll add them all up
Why are we using missiles they want to the unique thing about this is
That it's going to visit every single element in either the array or every single element in the linked list
so we're going to visit every single element if you only have to visit one element then the very nature of
Random access into array means as we fast
There's no way you can make any difference unless it's the first element if you only one to ever access the first element
Then look at the same speed we fuel directly to random element
Then the array or win hands down you can just think about how it works that will be the case. We're going to consider
We want to access every single element and do something to it and in this case
You're just going to sum up the values, but it could be if things were representing say windows on the screen
We want to move them around we want to redraw them
Whatever it is will then do something to every single wall so we sort of set out the problem?
I'm going to go over an array of values and a linked list of values
I'm going to in sum in the visits all and we do actually add up all the numbers
Stored in the element P. Which is part of the structure so what I've done is I've written
To see poems which are more or less identical
I'm going to create
125,000 elements in my array
or mailing list and in the linked list I've got a structure which contains an integer p and an integer Q and
in the array version
I've got a structure which contains an integer p and interested Q in the array item
The only difference is that the linked list has one extra item which is the next pointer to the next thing?
So very listen see so hopefully we'll go as fast as we possibly can most of the rest of the program
is
identical
so I have my main Routine here which creates a
List or an array of random number generator?
125,000 random numbers which is a slowest part of the program you'll see and then store them
So allocate memory for each element in the linked list and then we set the value
Everything else will just leave with 0 and we do the same with the array just as we did on the paper
We then run the program
100 times and all going to here is time
How long it takes to run a function which adds afforda numbers either in the list or in the array?
I will do that 100 times
And then we can work out the average time it takes to calculate one and we'll print out the values as we go
So we're just using the built-in clock function in C. Which should be accurate accurate enough for what we want to do
so the real meat the body of the program is this function here some list or
Some array
Let's start with the array one because possibly the simplest we said a variable to be zero that's going to be asked some
You have another variable
I which is going to be our index through the array and we go from I equals zero
I first element until we get to the end the number of elements in there
and we just say this sum equals the value that was already in sum plus the value in the ice element in the array a
P element
Remember we said the array had a p and a Q in it we want the value stored in the p space within the structure
in our array, so I'm just going to add get them together the list version very very similar we set the someone to be 0
We say while p does not equal null
So why we haven't reached the end of the list and we set p initially to be the first thing in the linked list?
So pointing out that first thing sum equals sum plus
P inside p. We using P inside p
Confusing variable names, but then I'm asleep programmer, and then we set p ir pointer to point to the next thing
So we follow the link to the next thing in the linked list and we keep doing that until we come to the end
There are two programs as simple as you get in terms of implementing a linked list or on array
so we're going to do is run them and
We should get some values output
But we're not going to run them on the I max we're not going to run them on the raspberry Pi
we're actually going to run them on the atari behind me which means I need a very high Peak high-tech piece of
Equipment, so I need a floppy disk to transfer other programs from the iMac
Go topping them off the iMac, and then we can run them on the AtAri
so go over to the Atari and
Assurance that it's cameras and spot. We can see it. So what I'm gonna do at first is they got to run
the list first also going to generate
125,000
linked list values to let it Run
Right while it loads off the floppy disk
that's what it used to be like back in the 80s and 90s you had to wait one you crave them ran a
While a
long While
And this is just a simple program
So there is good
So it's not
Initializing the link list it's allocating all the memory for the different elements and then putting a random number
In the P value of each one, so we'll let that run
In the words of the old Hobbit computer game time passes or in the words of every apple of this
secret short
How long does this take?
A while lots of oil, I don't know another time this bit
125,000 times however long it takes to allocate a random number and allocate some memory
On the clock speed of what this is actually an 8 MegAhertz cPU
So it's a takes a while the computer loaded off all the code for the program and off the floppy disk into memory
And it's now running it to generate the data set that we need then Gonna sum all the values
So everything is happening now will happen off memory
It's just taking a while to process it
But we are recording it in 4k which is a slight overkill. I hope you're getting into that
extra bit
So the programs going through now and we'll let it run 400 times
We'll get an average, but actually looking at the values of the coming up
I think we can safely assume that the average is going to end up being
166 clock ticks now when I say qwop takes each of the machines. We're going to look at are going to have
Different clocks perms that they use the time things so we can't compare the values directly what we could come read with the seconds
But we can certainly see the relative patterns between it will call that
166 clock ticks over to run the linked list version of the program it takes
166 clock ticks
So let's now run the array version of the program
So now we're going to do exactly the same thing
we're going to allocate an array every time tWenty-five thousand elements and
populate them with random values because of the way
I've written the program
It will actually be the same pSeudo random number sequence that the sum should be the same and then we're going to fill
each of them up and time how long that takes and we'll they'll see whether an array actually is faster than the linked list or
Whether a linked list will be b. Array
So that to me looks like that little spinny thing is going faster
Yeah, well so and I haven't just upgraded the processor while they're off camera
What's actually happening is?
When we allocate the space for the linked list we have to allocate space for each element and then for the gym there
We allocate space for the next element. We could do that in a different way and speed things up, but we did it
That's the classic way with the array. We allocate the whole lot in one go, so we all take one
Huge block of memory under 25,000 times eight is about one meg's worth of memory
So allocate all that in one go, and then we're just going through it filling in all the values setting all the values
So that should be quicker
and it seems and spinning wheel is going slightly faster what why would that Spin wheel very fast can you just learn how that
You can read it if you've heard this thing you it yeah?
So to give some nice sort of visual Feedback while this is running
Well, I'm just a plain white screen what I'm doing is I'm printing every hundred element ideals
I update that thing and just printing either a dash a
Vertical line or one of the two slashes to make it look like it's getting round and those are printing it it creates a little
Effect which is see that something's happened and the computer hasn't crashed
So this should be faster, and we should get a value so is now working and the immediate thing we see is
That the array is taking about
179 or
178 ticks it's the same quad kick so we can compare these two values
the Array is
slower than a linked list I
Know down, okay? All right now. So you can't argue in - okay, so the numbers have come in the computers around
170 angular call is to that 170 8.5. Always for the final average to come in but for that. So they're roughly comparable
Basically there's not much it sort of thing, but there's something about 200 qwop ticks per second on this machine
It's is probably a noticeable. It's about 10 percent slower
Okay, so that's it. I mean we can stop here come a that's it. We know. What's what arrays are slow
You wrinkly yep, so arrays are slower than ling Li
Gotta get my vendor video yeah and a video you can now bring up the slash computer, so
virtually
There are two ways you can walk over the Ray
We went up the array from the smaller numbers to the larger numbers
But we could also start at the larger indexes and move back to the smaller indexes, and I thought well
Let's try it both ways let's not have anyone posting some comments saying no, but if you went the other way
So let's run it the other way, so I wrote a version which walks down and because we're adding the numbers together
We'll get exactly the same answer a plus b equals b. Plus a and all that you can watch numberphile to find out
More online. I'm sure
You think this would take exactly the same amount of time, so just update our table
To do this and we're going to have the reverse array here
I'll just wait while it does that
So going backwards through the array the only thing can we do the same thing with the linked list
Well the way we've designed the linked list we
Have a pointer to the next thing in each thing, but we don't have a pointer back to the previous thing
It's a singly linked list so we can only move forwards along it
We could design it to be a doubly linked list and have a backwards and forwards pointer
but if you think about it if we started at the end
And move backwards that's going to be exactly the same
Operations as if we start at the beginning and before so there's actually no point in testing
It'd be exactly the same code. We just be using different offsets into memory so taking exactly the same amount of time
While we wait for that to do its thing let's run the same programs on
The Imac and see how much faster it is so let's compile them up. I am turning on as I did before all the optimizations
For the array and things that'll make it go as fast as poss but some of the comparison to the array
Test and I going to compile with the speed test
Done, and I'm going to do the same thing
for the link
Test I'm going to say something really obvious now
But it's honestly a heck of a lot quicker this machine just to get compiled up and ready. Yeah
Those machines much faster because while I'm compiling everything on here and then transferring it across
Rather than trying to get my old C compiler going on the AtAri
And also the chances are this will produce better code, so it'll take make the most benefit of the speed
We can pile them up, so let's run the linked list test as we did before
boom
It's done
Everything now you notice that took a lot quicker?
The numbers are still roughly in the same order still about 100 and around 200 but remember this is a different
Clock we cannot compare the ticks from that one - it's on the different numbers, but we get the average here two hundred and nine
Point five four I'm acting this is 166 atari ticks
They're much slower chicks than the I like ticks we can't compare that to that we can compare
horizontally
let's do the same now with the array test and
Wow
When we did the linked list test on the atari it was faster than the array test
Roughly take you about ten percent faster
Nothing really in it you could say look at the difference on
The Imac the IMac takes forty three point four four clock ticks to do the array
209 that's five times
Faster for the Array was on the Atari the linked list was faster
So a reverse array is now going on
the Atari
114 and
That's quicker. That's quicker than the original array, and that's quicker than the linked list, so if we do everything backwards
Doogie where it's very good if you do everything forwards
It's much slower. I'm a bit confused
Is there any possibility that's just a problem the code or something always are you going to reveal something to this year?
We need to delve a bit deeper here to see what's really going on
Remember we wrote these programs in C
So we wrote these in the high-level language and then they got compiled down
To the instructions that the machine can execute and what's actually happening here. Is that the research shows that the machine can execute?
Favor walking backwards down something compared with going further upwards now
Let's run that same array backwards
Program on the IMac for completing this so let me compile that one up and again. I'm
Optimizing everything to the hilt
Test and so we'll run speed tests back
And we'll run this one so before the arrays now much faster
According to this a reversal rate should be faster than the forward array. So what would you expect the value to be shown for?
Run on the reverse away about fifteen or twenty directly if it was the same as that did
All right
Would you like to stick with that answer or would you like to change that oh?
nervous excited actually
Because the Atari was faster to do a linked list in an array and the iMac was the other way around
I'm going to say it's going to be slower to do it in reverse. So you reckon firm supposed to do it in reverse
Oh, I do like an indecisive person there. We go average time forty seven
point four six
So marginally slow marginally slower about 10% slower
but still an order of magnitude
faster than this we could do the same on the raspberry Pi
So again, we'll compile all of these up, so we'll do it array test
Compile it up noticeably slower to compile will do the LL test. This is the list version
And we'll do the reverse test
So compile them all up on the last repair, so let's run these and get the numbers we've run the array test
we now get nine 165 point seven five as an average for the array will run the link list and
We get one eight five eight
Point six one for the linked list test and now we run the reverse test
and
We get 101
9.5 five so we've run the test now. So we've we've assumed nothing
We've ran some tests or to get some data so we can see which is faster an array or liquids with this operation
We've been trying
And we've got some pretty interesting results
So we're running on the atari the linked list was faster than the array
unless he ran the array backwards in which case the array was faster than the linked list and
The array going forward we'll come back to that
We ran it on the raspberry Pi and here the array was about twice as fast as the linked list
Baby rarely backwards the Array was slower, and if you do it on the IMac view range height, I'm faster than the linked list
Whichever way you went so what's going on here?
Well, let's ignore the iMac for the minute the apple Haters will love that bit
but let's ignore the Imac for though and let's just have a look at the raspBerry Pi and
the attali so we've got
the Atari and the Raspberry Pi and we'll just go with
The array speed and the link to this so we've got about one hundred and seventy nine
For that and one hundred and sixty-six clock ticks that now when we can't compare the clock ticks between the different machines because they're different
different clocks used in the machines
but we can compare them relatively between the same thing on the same machine for the raspberry Pi it was
966 and
that was a
1859 Kish now what's going on here?
Well the thing we need to remember
if you look at the machine code
It's roughly the same length the same number of operation now on the atari some of the instructions will take slightly longer than to execute
But that's not what gets going on here
We need to think back to the video on
Cashiers that we did the difference between the raspberry Pi CPU would have been much faster and more modern and the Ataris
Is that the AtAris?
Doesn't have a cache
So every instructionally needs to fetch every bit of data that needs to fetch has to be fetched from memory each time
No, it's not cached
broadly speaking we believe something going on there, but broadly speaking with no cache is about half the
Prefetch buffer if you want to get into the details, but we can see there's no cash
So if we think about the cPU in the atari, then it's having to access
memory
For everything so everything that the cPU needs to fetch on the Atari the instructions
The data from the array or a linked list and of course the next pointer from the linked list has to come from memory
So it takes the same amount of time we get are the two weather fetching data or fetching instructions on the raspBerry Pi however?
We still got the main body the cPU which is going to execute things and it has memory
as well
but in between there we have a cache in fact we actually probably have two caches one for Data and
one for instructions as
I want to see if you activate as we looked at in the previous video is
Access to it via the caches, and then they if they haven't got it to get it
From memory, so why does this make a difference?
Surely things will still wake up so well on the atari the fastest
thing in the theory is basically the memory the memory is much faster than the cPU that's about twice as fast and
So the memory can provide a data exactly when the cPU needs it. There's no real need for cash
Move ahead to the raspberry Pi and the are make of course then the cPU is much faster than the memory
So when cPU ask for something it has to wait while the memory?
provides it
Now let's figure out how what happens when we run our program?
with the array
on the raspberry Pi
Every instruction after the first time will be accessed will be cached in the instruction cache
So the first time we go through the loop all the instructions are going to be used in that loop will have been cached in
The instruction cap so you can get these immediately the data
particularly with the array would also collapse a bit with a linked list we
Don't get just one or two bytes in each time
I'll get what we call a cache line pickups 128 bytes in a go, so we'll get some of the data that we already need
Into the cache as well, so some of the data will already be there in the array more so with your right
so the reason of the array runs faster on the raspberry Pi is that all the instructions are coming straight out of the
Institute basically giving us a fast lane for those instructions, so they get there immediately
There's only the data that needs to get from main memory, which will get cached as well
So the same happens with the route list program
Except for one crucial difference with the linked list poem we have to make one memory access
to get the data value that we're going to sum and
Then we have to make another memory access which has to go through the cat into main memory
for the address of the next thing
we have to make to
Memory request which may get satisfied by main Memory here
Where's our the array one?
We only have to make one for the value that we're interested in and because of the cache
We get sort of a fast passing instructions, and so actually not having to do that second memory access to get the next pointer here
Means that this the arraylist of the program ends are working about twice as fast on the atari because everything is coming from
memory it doesn't matter whether you're reading an instruction or reading a
bit of Data
It's still going to come from memory each time so actually the front of the value is already pre-calculated in memory means that actually runs
Slightly faster, I mean, we're talking perhaps 10% faster. Not really a billion so we get a slight speed benefit
Now to show that this is the case
I took the same version of the program on the atari and ran it on the atari router now the difference for the AtAri
St. We use there and the Falcon is that the falcon has a slightly later version of the 68000 processor
which has a cache
In it both instructions and data cache and when I ran that on there the times that came out
So the array was 46 clock ticks is much faster processor anyway, and the link list
was
58.5
Clock ticks so on the Falcon because you've got the cache. They're just like on the raspberry Pi
The array version ends up being traffic exactly the same program exactly the same machine code
Because you've got the cache there the array version ends up being faster because the instructions of their
Professional and that sort of fat packs from the cache rather having to go to main memory each time
mean that the
irradiation becomes faster than the linked list version
Now tell them other things we haven't talked about yet. Why is the reverse array faster on the AtAri?
Simple answer to that it
Just uses slightly different instructions their instructions on the six 8000 that allow you to do a decrement
Testing with zero branch which isn't all in one instruction
So you can actually make the code slightly more compact and run slightly faster again. It's a small enough time
Why then is the imac significantly faster? We're getting sort of five times faster?
Well that's because I use a slightly different compiler that I use the clang C
Compiler on the IMac rather than GCC for the other two and it cheats it
It squats that you've got an array axises in the loop, and it says well
Okay, rather than doing a loop around one array access
I will do a loop around eight array accesses all in one go
So it actually removes some of the tests for the loop and it makes the program much faster for the compilers being clever
And optimizing the program
So it runs faster the answer to whether the raise faster than the link list or not?
Very much depends on how you cpu in the machine that you've built is configured to actually execute the code
So you can't make assumptions?
About how a cpu will execute code you really need to do the test to see what's happening even within the same
architecture family the difference between the six 8006 8030 in the Falcon mean that
Assumptions we make based on what they charity did where the Rally was slower than the link list aren't true on
The Falcon and when you move it on to a completely different architecture like the raspberry Pi or the X86 chip in the iMac
Then again you get different effects, so the best thing is when you're faced with a question question like this, and you're not sure
Come up with some test and collect real data, and you'll be able to see what's going on
could you