
Let's just see how we would add together two floating point numbers. If we've got 42, in floating point representation that would be 1.01010 times 2 to the 5 (the point moves 1, 2, 3, 4, 5 places). Let's add on 6: 110 is 6, and that's 1.10 times 2 to the 2. So we now need to add these two numbers together.
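Written as a worked equation, the two operands in normalized binary form (a leading 1, the binary point, then the remaining bits) are:

```latex
\begin{aligned}
42 &= 101010_2 = 1.01010_2 \times 2^{5} \\
 6 &= 110_2 = 1.10_2 \times 2^{2}
\end{aligned}
```

The exponents differ by 5 - 2 = 3, which is the three-place shift discussed next.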

Before, with integers, we would just add them together bit by bit: that bit plus that bit, that bit plus that bit, and that's it. We can't do that with floating point numbers, because the bit patterns for these two numbers look like very, very different things.

What we have to do first of all is line them up so the bits are in the same place. We need to shift this one down so that the bit here, which represents 4, is in the same position as the bit that represents 4 here. The number of places we need to shift this one right is the difference between the big exponent and the little one: in this case it's three places (1, 2, 3), so we shift it 3 places to the right.

So rather than just adding them together, we have an extra step. First we've got to expand them out of their bit representations, because remember that 42 would actually be stored as a sign bit of 0 and the significand 1.01010, which is encoded as just 0 1 0 1 0 because the leading 1 is implicit. The 8-bit exponent is going to be 127 plus 5, which is 132, or 128 plus 4, so that's going to be 1 0 0 0 0 1 0 0.

So that's how 42 is represented, and 6 is going to be similar. It's represented by a sign bit of zero, then eight exponent bits (one, two, three, four, five, six, seven, eight). The leading one is already encoded implicitly, and the fraction is 1 followed by zeros, which we'll ignore for now. The exponent we store is 127 plus 2, which is 129: 1 0 0 0 0 0 0 1. So the numbers we've actually got in memory in our computer are represented like this.
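To see these fields for real, the sketch below uses Python's standard `struct` module to pack each value as an IEEE 754 single-precision (32-bit) float and then reads the same four bytes back as an unsigned integer. The helper name `fp32_fields` is an illustrative choice, not a standard API.

```python
import struct

def fp32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to IEEE 754 single precision, into its three bit fields."""
    # '>f' packs a big-endian 32-bit float; '>I' reads the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 stored bits; the leading 1 is implicit
    return sign, exponent, fraction

for value in (42.0, 6.0):
    s, e, f = fp32_fields(value)
    print(f"{value}: sign={s} exponent={e:08b} ({e} = 127 + {e - 127}) fraction={f:023b}")
```

For 42.0 this reports the exponent field `10000100` (132 = 127 + 5) and a fraction starting `01010`; for 6.0, `10000001` (129 = 127 + 2) and a fraction starting `1`.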

So the first thing we have to do is get them into a form we can work with, because we can't just add these two bit patterns together anymore. We can see that simply by looking over here: the top bits of the two exponents are both 1, and if we add 1 + 1 we get 1 0, so the carry would put a 1 here, in the sign bit. That would take us from adding two positive numbers to getting a negative number, which is definitely wrong. So we need to unpack this representation into a form that we can add together.
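A minimal demonstration of why this fails, again using Python's `struct` module to get at the raw 32-bit patterns (the helper names here are hypothetical): adding the two bit patterns as plain integers lets the carry out of the exponent field land in the sign bit.

```python
import struct

def f32_to_bits(x: float) -> int:
    """Raw IEEE 754 single-precision bit pattern of x, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_f32(b: int) -> float:
    """Interpret a 32-bit unsigned integer as a single-precision float."""
    return struct.unpack(">f", b.to_bytes(4, "big"))[0]

raw = f32_to_bits(42.0) + f32_to_bits(6.0)   # 0x42280000 + 0x40C00000 as integers
print(hex(raw))            # sign bit is now set by the exponent carry
print(bits_to_f32(raw))    # a tiny negative number, nowhere near 48
```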

Now, one way we could do that is just to work out how many bits we would need, assign the bits into the right places and do the sum, but we can actually use some tricks. We know, for example, that if we're adding two numbers with a certain number of bits, in this case 24 bits, the biggest result we could get is roughly 2 to the 25, so the sum needs at most one extra bit.
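As a quick check of that bound, treat each 24-bit significand (the implicit 1 plus the 23 stored fraction bits) as an unsigned integer. The largest possible sum of two of them is:

```latex
(2^{24} - 1) + (2^{24} - 1) = 2^{25} - 2 < 2^{25}
```

so one extra bit is always enough to hold the sum of two significands.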

The other thing we know is that one of these numbers is going to have a greater exponent than the other. So what we can do is say: okay, let's keep that one where it is and shift this one, dividing it by two each time so that its exponent goes up by one, until the exponents are the same. If we move the binary point one place to the left, we end up with 0.110 times 2 to the 3; another place and it would be 0.0110 times 2 to the 4; and so on, until we end up with it lined up there, and that becomes times 2 to the 5. Then we have 0.00110 times 2 to the 5.
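The alignment step can be sketched with integers, assuming positive, normal single-precision inputs: restore the implicit leading 1 to get a 24-bit significand, then shift the significand of the operand with the smaller exponent right by the exponent difference. `unpack_fp32` is a hypothetical helper written for this example.

```python
import struct

def unpack_fp32(x: float) -> tuple[int, int]:
    """Return (unbiased exponent, 24-bit significand with the implicit 1 restored).
    Assumes x is a positive, normal single-precision value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = ((bits >> 23) & 0xFF) - 127
    significand = (bits & 0x7FFFFF) | (1 << 23)   # put the hidden leading 1 back
    return exponent, significand

e_big, m_big = unpack_fp32(42.0)      # 1.01010 x 2^5
e_small, m_small = unpack_fp32(6.0)   # 1.10000 x 2^2
shift = e_big - e_small               # 3 places
m_aligned = m_small >> shift          # now 0.00110 x 2^5; any bits shifted out are dropped
print(shift, f"{m_big:024b}", f"{m_aligned:024b}")
```

Bits shifted off the right end are simply lost in this sketch; real hardware keeps a few of them so it can round the result correctly.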

So we did the first step: we needed to unpack them from their representations into forms that we can add together. Then we needed to shift this one so that the exponents are the same: we take the smaller one and shift it so the exponents line up. Now we can add those numbers together.

We can add these with an ordinary addition, knowing that the result can be at most one bit bigger than the inputs (one plus one is two, for example). So: 0 plus 0 is 0; 1 plus 1 is 0, carry 1; 0 plus 1 plus 1 is 0, carry 1; 1 plus 1 is 0, carry 1; 0 plus 0 plus 1 is 1; and then 1. So we end up with 1.10000 times 2 to the 5, which is 6 added on to 42.
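Laid out as a column addition, with both operands written against the shared exponent 2^5, the sum is:

```latex
\begin{array}{r@{}l}
      1.&01010_2 \times 2^{5} \quad (42) \\
 +\;  0.&00110_2 \times 2^{5} \quad (6) \\
\hline
      1.&10000_2 \times 2^{5} \quad (48)
\end{array}
```

Checking in decimal: 1.1 (binary) is 1.5, and 1.5 × 32 = 48.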

And we've got 48 as the result. So we've done the maths and I could write that back now, but potentially we could have ended up with a 2 here: if we added up 1 and 1, for example, we would get 2. So we need to do a final step once we've done the addition, which is to normalize the result back, if necessary, into the normal form, which in this case is 1.10000 times 2 to the 5.

So the reason that floating point numbers take much longer to process is that, as well as doing the addition (which you can do in exactly the same way as for integers), you also have to take the bits, unpack them from the representation, shift them along so they match up, then do the addition, and then potentially shift them back to get the result back into the normalized form, the standard scientific representation.

The other problem you get is that even though we can pack all these numbers into 32 bits, when we slide them along we may end up needing more than 32 bits, as many as 48, to represent things. If we have to slide this one along to the point here when we're doing the maths, we actually need 48 bits to do the calculation.
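Putting the whole sequence together (unpack, align, add, renormalize), a minimal sketch of single-precision addition might look like this. It assumes positive, normal inputs whose sum doesn't overflow, and it truncates bits shifted out of the smaller operand, whereas IEEE 754 hardware keeps extra bits so it can round correctly. `fp32_add_positive` is a hypothetical name, not a library function.

```python
import struct

def fp32_add_positive(a: float, b: float) -> float:
    """Add two positive, normal float32 values the long way.
    Sketch assumptions: no signs, zeros, subnormals, infinities, NaNs or
    overflow; shifted-out bits are truncated rather than rounded."""
    def unpack(x: float) -> tuple[int, int]:
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
        biased_exp = (bits >> 23) & 0xFF
        significand = (bits & 0x7FFFFF) | (1 << 23)   # restore the implicit leading 1
        return biased_exp, significand

    (ea, ma), (eb, mb) = unpack(a), unpack(b)          # step 1: unpack
    if ea < eb:                                        # make 'a' the larger-exponent operand
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)

    mb >>= ea - eb                                     # step 2: align the smaller operand
    total = ma + mb                                    # step 3: integer add (at most 25 bits)

    exp = ea
    if total >> 24:                                    # step 4: carry out means 1x.xxx...
        total >>= 1                                    # shift back into 1.xxx form
        exp += 1                                       # and compensate in the exponent

    bits = (exp << 23) | (total & 0x7FFFFF)            # drop the implicit 1 when repacking
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

print(fp32_add_positive(42.0, 6.0))   # no renormalization needed
print(fp32_add_positive(1.5, 1.5))    # carry case: 1.1 + 1.1 = 11.0, renormalized to 1.1 x 2^1
```

The second call exercises the final normalization step: the significand sum overflows into a 25th bit, so it is shifted right once and the exponent is incremented.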

Of course, that means that on a 32-bit CPU you'd have to do two additions, one for that half and then one for that half, carrying the value over from one to the other, which again would slow things down.

In hardware, you can build your representations to take care of this. If you've got 64-bit doubles, you know that you perhaps don't need more than a certain number of bits to represent them, so you can build the hardware to handle all of this, and it ends up being much faster.

That must be quite fiddly to do with standard hardware. So is that why we end up with this custom hardware, this floating-point unit?

It's not that much more fiddly. I mean, most computers preserve the carry when they add two values together, so if you add two 32-bit numbers and that produces a value greater than 32 bits, they preserve that bit and let you add it on, so you can use multiple registers to do it. But you then have to do two add operations, one after the other. If you know the operations are going to do this, you can build your hardware to do it in one go, so we could build hardware that would add these together directly.
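The multi-register approach described above can be sketched as follows. Python integers have no fixed width, so 32-bit registers are simulated with a mask, and the function name is hypothetical. Real CPUs expose the carry for exactly this purpose; x86, for example, has an add-with-carry instruction, `ADC`.

```python
MASK32 = 0xFFFFFFFF

def add_48_with_32bit_words(x: int, y: int) -> int:
    """Add two 48-bit unsigned values using only 32-bit word additions,
    passing the carry from the low word into the high word."""
    x_lo, x_hi = x & MASK32, x >> 32
    y_lo, y_hi = y & MASK32, y >> 32

    lo_sum = x_lo + y_lo                  # first addition
    carry = lo_sum >> 32                  # 1 if the low word overflowed 32 bits
    lo = lo_sum & MASK32

    hi = (x_hi + y_hi + carry) & MASK32   # second addition, consuming the carry
    return (hi << 32) | lo

a = 0x0000_FFFF_FFFF_FFFF   # largest 48-bit value, so the low word overflows
b = 0x0000_0000_0000_0001
print(hex(add_48_with_32bit_words(a, b)))
```

Each call costs two additions instead of one, which is the slowdown described in the transcript; dedicated hardware can make the datapath wide enough to do it in a single step.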

There are lots of things you can spot where you could take an early out. For example, if the exponents are such that these two numbers end up so far apart that adding this one onto this one, with all the zero bits assumed along here, isn't going to make any difference, you can say: well, actually, I don't need to do that, I'll just ignore it. If you know a number is zero, you can ignore it, and so on. So there are ways you can speed things up when writing the software, and I suspect the hardware does some of these things too.
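One such early out can be sketched for adding two positive, normal single-precision values with round-to-nearest: if the exponents differ by more than the 24-bit significand width, the smaller operand is less than half a unit in the last place of the larger one, so the result is just the larger operand. The function name and threshold belong to this sketch (hypothetical), not to any particular hardware or library; `to_f32` uses Python's `struct` module to round a double to single precision so the claim can be checked.

```python
import struct

SIGNIFICAND_BITS = 24   # implicit 1 + 23 stored fraction bits

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def smaller_operand_is_negligible(exp_big: int, exp_small: int) -> bool:
    """Hypothetical early-out test for adding two positive normal float32 values:
    with an exponent gap wider than the significand, the smaller value is below
    half a unit in the last place of the larger and cannot change a
    round-to-nearest result."""
    return exp_big - exp_small > SIGNIFICAND_BITS

big, small = 2.0 ** 30, 1.0                   # exponents 30 and 0
print(smaller_operand_is_negligible(30, 0))   # gap of 30 > 24, so skip the addition
print(to_f32(big + small) == big)             # float32 can't distinguish 2^30 + 1 from 2^30
```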

The interesting thing, if you think about the way the mathematics works, is this. With integers, multiplying is trickier than addition, because you end up having to do lots of shifts and adds. Multiplying two floating point numbers, on the other hand, is relatively straightforward compared to addition, because we just have to multiply the two mantissas, adding the implicit extra bit back in if it's there, and then add the exponents together. So with floating point numbers, multiplication actually becomes much simpler than addition, because addition requires us to unpack everything and push the bits around to get things in the right place.
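Following the same conventions as the addition sketch, a single-precision multiplication under stated assumptions (positive, normal inputs and result; truncation instead of rounding) needs no alignment step at all: multiply the 24-bit significands, add the biased exponents and subtract one copy of the bias, then renormalize by at most one place. For signed inputs the sign bits would also be combined with XOR, which is omitted here. `fp32_mul_positive` is a hypothetical name.

```python
import struct

def fp32_mul_positive(a: float, b: float) -> float:
    """Multiply two positive, normal float32 values: multiply the significands
    (implicit 1 restored) and add the exponents.
    Sketch assumptions: positive normal inputs and result; truncation, no rounding."""
    def unpack(x: float) -> tuple[int, int]:
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
        return (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | (1 << 23)

    (ea, ma), (eb, mb) = unpack(a), unpack(b)
    exp = ea + eb - 127          # both exponents carry the +127 bias; remove one copy
    product = ma * mb            # 24 x 24 bits gives a 47- or 48-bit product
    if product >> 47:            # significand product in [2, 4): renormalize
        product >>= 24
        exp += 1
    else:                        # significand product already in [1, 2)
        product >>= 23
    bits = (exp << 23) | (product & 0x7FFFFF)
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

print(fp32_mul_positive(42.0, 6.0))
print(fp32_mul_positive(1.5, 1.5))   # exercises the renormalization branch
```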

Now I've got the token, so I can load a value in, add the value from a register to it, store it back and hand the token on. Now I've got the token again: I can load something into my register, add something onto it, store it back and pass the token on. And now I've got it, so I can load the value in, add the value from a register and store it back.


Floating Point Numbers (Part 2: FP Addition) - Computerphile

Published by 林宜悉 on 14 January 2021