Okay, so endianness. It's a simple problem, really.
Computers these days arrange their memory in terms of bytes - so 8-bit values: eight binary digits, 8 bits -
arranged in the computer's memory, and we have an individual address for each of those bytes.
Now, computers don't always deal with bytes. Sometimes they use 16-bit values,
sometimes they use 32-bit values, and so you have an interesting question: say you've got
a 32-bit value - let's just stick with 32-bit values for now -
and you need to assign it into memory.
So you've got 8 bits per memory location, per memory address, and you've got 32 bits, so you're going to have to split it into 4
bytes' worth of things, four individual pieces, and then assign each of those individual pieces
into one memory location. Let's pick a 32-bit value, and we'll do it in hexadecimal, just because it makes the numbers easier.
So the 0x means it's hexadecimal and we're gonna go for 0 0 C 0
F F E E. So this is going to be
our 32-bit value that we're going to want to assign into
four different memory locations.
So these would be addresses 0, 1, 2, 3, and then 4, and it would go on like that: memory locations.
So each of those addresses is going to represent a byte.
That's a number between 0 and 255, which is equivalent to two hexadecimal digits. Each hexadecimal digit represents one nibble, four bits,
so two of them is a byte's worth and eight of them is 32 bits' worth. So we need to assign
these bytes into
the memory locations. So how do we do it? What would your suggestion be, Shawn?
Shawn: "To me, it looks like you were just kind of translate that down and have
the 0 0 in 0 and just carry on like that."
So you want me to put the 0 0 there and then I put C 0 in there?
I put FF in there and then E E in there?
Shawn: "Yeah, but I do feel like I'm walking into a trap."
No, obviously you like to eat your hard-boiled eggs from the big end.
Shawn: "Right."
Ok.
There is another way you could do it, though. You could start
from the little end, and there is a reason why I'm talking about a hard-boiled egg. I haven't completely flipped in this Computerphile video.
We'll come back to that in a minute.
Let's draw out another set of four memory locations.
0 1 2 3 & 4. We could also have started from this end
and put the EE in there, the F F in there
the C 0 in there, and then the two zeros in there and that would be another way of doing it.
In actual fact, as long as you're consistent in the way you do it,
and you build a computer knowing that if it's going to read a 32-bit value,
the bytes are going to be in this order or that order or whatever order, and it's consistent, then your computer system will work.
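Here's a minimal C sketch of what he's describing - not code from the video, just an assumed demo - that stores the example value 0x00C0FFEE into four byte-sized memory locations and prints them, so you can see which order your own machine uses:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint32_t value = 0x00C0FFEE;   /* the 32-bit value from the example */
    uint8_t bytes[4];

    /* Copy the value into four individual byte-sized memory locations. */
    memcpy(bytes, &value, sizeof value);

    /* A big-endian machine prints 00 C0 FF EE; a little-endian one prints EE FF C0 00. */
    for (int i = 0; i < 4; i++)
        printf("address %d: %02X\n", i, bytes[i]);
    return 0;
}
```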
What we've done here is we've got two different ways of writing these things out
and this is basically the issue around endianness: how does your computer store values that are bigger than 1 byte
in memory, when memory is made up of 8-bit locations?
So how do we map, say, a 32-bit value, a 64-bit value, a 16-bit value
into those 8-bit locations? And this is where we come back to our friend the egg.
There's a book published in the 1700s by Jonathan Swift called Gulliver's Travels.
It's a novel, it's a satire of society. In this novel, Gulliver goes on his travels.
The first place he goes to is a town called Lilliput.
In Lilliput, everyone's very tiny, but they like to argue about things, and apparently - I haven't read the book -
apparently at one point civil war breaks out
over which way do you eat an egg?
Do you start from the top, the little end, because it's pointy or do you start from the bottom, the big end?
Half of Lilliput was little-endian.
They would start from the pointy end, and the other half were big-endian. They would start from the other end.
So they would sort of smack it down like that
and start peeling their eggs, or hitting it with probably a teaspoon, and serving it
and dipping their yolk in there.
And we've got here the two main types that are used.
This one is called big-endian and this one is called little-endian.
And the reason why it's called that is because of what happens if we were to write this out as a binary number.
If you've got a hexadecimal number, you can convert each of the hexadecimal digits into four binary digits,
so it's relatively easy to write it out.
So we get 1 1 1 0 for the first E, followed by 1 1 1 0,
going backwards, for the second E.
Then we get 1 1 ... 0 0, and that should be 32 bits there.
Now each of these bits has a number associated with it.
So this would be considered bit 0 and this would be considered bit 31.
And then we can count down, so this is then bit 24, that's bit 23,
bit 16 and 15,
and then that would be bit 8 and that's bit 7.
And so this byte, the E E, is what we call the least significant byte,
because it's got the bits with the lowest numbers on them, the smaller bits.
And this is the most significant byte, because it's got the bits with the higher numbers on: 24-to-31 as opposed to 0-to-7.
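If you want to pull those bytes out by their bit numbers, shifting and masking does it without caring how the bytes happen to sit in memory - a small C sketch, again not from the video:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x00C0FFEE;

    uint8_t lsb = (value >> 0)  & 0xFF;  /* bits 0-7:   EE, the least significant byte */
    uint8_t b1  = (value >> 8)  & 0xFF;  /* bits 8-15:  FF */
    uint8_t b2  = (value >> 16) & 0xFF;  /* bits 16-23: C0 */
    uint8_t msb = (value >> 24) & 0xFF;  /* bits 24-31: 00, the most significant byte */

    printf("%02X %02X %02X %02X\n", msb, b2, b1, lsb);  /* prints 00 C0 FF EE */
    return 0;
}
```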
Someone had the big idea that the way to name these things was to reference the egg wars of Gulliver's Travels
and to refer to
systems that started, the sensible way in my opinion, by putting the 0 0, then C 0, then F F, then E E
like that in memory - they would be big-endian systems.
People that started by putting E E at the bottom
and then F F, C 0, 0 0 would be called little-endian systems.
So that's why we call it endianness. It all traces back to eggs of Lilliput in Gulliver's Travels.
Now you might ask why have two systems at all, why not just standardize on doing it one way or the other?
Well, as I said, it doesn't make any difference as long as your computer system's consistent
the people who are writing the software know how it's done, the hardware designers know how it's done
everything's lined up in the right place and it isn't a problem.
But there are some advantages to doing it one way over the other.
So, for example, with the big-endian system,
it's what you naturally went for - you naturally went for a big-endian system.
And so the people who designed some of the IBM mainframes, the PowerPC architecture,
the 68000 chip, and things like the original Macintosh and the Atari ST -
they're all big-endian systems. So when they've got a 32-bit value, they start at the first address,
they put the most significant byte, and then they go down towards the least significant byte.
On the other hand, the 6502 chip, the ARM chip by default - it can work the other way -
the Intel x86 and the AMD x86 chips, they're all little-endian systems; the Z80 was as well.
They will put the least significant byte first in memory, and there is an advantage to that,
because when you're reading it and building the hardware
it doesn't matter whether you've got a
16-bit value or a 32-bit value. If we had a 16-bit value,
let us have A B C D - that would be big-endian. And you could also write that
as C D A B, and then that would be little-endian.
If it's a little-endian system, the first byte always goes in bits nought-to-7, the second byte always goes into bits 8-to-15,
regardless of whether it's a 2-byte, 16-bit number, or a 32-bit number, or a 64-bit number.
So your hardware's simpler to design. On the other hand, if you're reading the memory in a debugger or something
it becomes harder and you have to manually rearrange the bytes in your own head.
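That address-invariance is easy to demonstrate - a little C sketch (an assumed demo, not from the video) that reads the low 16 bits of a 32-bit value straight out of the same address:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint32_t wide = 0x0000ABCD;   /* a 32-bit value whose low 16 bits are ABCD */
    uint16_t narrow;

    /* Read the first two bytes at the same address as a 16-bit value.
       Little-endian stores the least significant byte first, so this
       recovers 0xABCD; a big-endian machine would give 0x0000 instead. */
    memcpy(&narrow, &wide, sizeof narrow);
    printf("0x%04X\n", (unsigned)narrow);
    return 0;
}
```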
There's also another system, which is sometimes referred to as PDP-11 ordering, or mixed ordering,
which is when you just sort of really mix it up and start from the middle and go out.
You can get really weird ordering, but we'll ignore that for now.
So generally, on one system that's not talking to anything else,
it doesn't matter which endianness you use, as long as you know what it is.
The problem comes when you have one computer
communicating with another whether that's over a network
or whether that's by putting data onto a floppy disk, a USB stick, or something.
You've then got bytes laid out in something by one machine
which is being read by another machine and when you do that
you need to make sure that both machines agree on how the bytes are laid out.
So for example...
Networks, when they're transferring data across, are going to need to agree: what order do the bits come in?
What order do the bytes come in to represent a 32-bit number?
They have to agree on a standard - the Internet, for example, has agreed on everything being big-endian, a sensible choice.
If they don't agree,
one machine will send it big-endian,
the other machine will read it little-endian and get completely the wrong number out.
So the only time it really matters is when you're transferring data between machines of different types, in which case you have to make sure
that you agree on what standard you're using to transfer them.
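A small C sketch of exactly that mix-up (an assumed demo, not from the video): four bytes laid out big-endian on the wire, read back naively on a little-endian machine:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* The four bytes of 0x00C0FFEE as a big-endian sender lays them out. */
    uint8_t wire[4] = { 0x00, 0xC0, 0xFF, 0xEE };
    uint32_t received;

    /* A little-endian receiver that forgets to convert reads the bytes
       back to front: this prints 0xEEFFC000 instead of 0x00C0FFEE. */
    memcpy(&received, wire, sizeof received);
    printf("0x%08X\n", (unsigned)received);
    return 0;
}
```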
Shawn: "Where's that translation happen?"
So that's a good question. Normally it will happen in the software. Say, for example,
when you write software to communicate over a network using IP,
there are various functions that you will call to take the number - say, for example, your TCP port number,
so, like, if you're trying to connect to a web server, that's port 80, or port 443 if you've got encryption.
Rather than just setting the value directly in memory, you run it through a function
which is called network-to-host ordering
or host-to-network ordering, depending on which way you're doing it.
So if you're setting the port number you'd use this one, if you're reading it from a network packet you'd use that one
and that will do the conversion for you, if needed.
So that thing will be defined on, say, an Intel system to convert from little-endian to big-endian.
But on a Motorola system using a 68000, which is natively big-endian, it will just do nothing and copy the values.
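The functions he's describing are, on POSIX systems at least, the classic htons/ntohs pair (and htonl/ntohl for 32-bit values) from <arpa/inet.h> - a minimal sketch of setting and reading a port number:

```c
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons, ntohs, htonl, ntohl */

int main(void) {
    uint16_t port = 443;   /* the encrypted web port from the example */

    /* Host-to-network before putting the value in a packet or socket
       address; network-to-host when reading it back out. On a big-endian
       machine both calls do nothing and just copy the value. */
    uint16_t wire = htons(port);
    printf("on the wire: 0x%04X, back again: %u\n",
           (unsigned)wire, (unsigned)ntohs(wire));
    return 0;
}
```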
Shawn: "Does it slow things down?"
Um, yes, a bit.
So, for example, you have to
read the bytes individually and then shuffle them around in memory.
In actual fact, modern CPUs - modern ARM chips, modern Intel chips - have
instructions that can move big-endian numbers even though they're natively little-endian, and at that point it's done as fast as possible.
These days, with the clock speeds you're dealing with, the slowdown won't be noticeable because you're not doing it that often.
You set one value, the port number, when you create the socket.
The rest of the transmission is probably in ASCII anyway,
so you never need to convert anything, so it's not going to make that much of a difference.
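For what it's worth, GCC and Clang expose those byte-swap instructions through a builtin - a minimal sketch (compiler-specific, so treat it as an assumption about your toolchain):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x00C0FFEE;

    /* GCC and Clang compile this builtin down to a single byte-swap
       instruction where the CPU has one (BSWAP on x86, REV on ARM). */
    uint32_t swapped = __builtin_bswap32(value);

    printf("0x%08X -> 0x%08X\n", (unsigned)value, (unsigned)swapped);  /* 0x00C0FFEE -> 0xEEFFC000 */
    return 0;
}
```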
If I write down 0 0 1 0,
that represents a 2 in its simplest form. That is what
binary coded decimal is, and you just use them in 4-bit nibbles. Now, we all know a nibble is half a byte.
A byte equals eight...