# IEEE 754

IEEE 754 is a standardised way of storing floating point numbers. Each number has three components:

• A sign bit
• A biased exponent
• A normalised mantissa

| Type | Sign | Exponent | Mantissa | Bias |
| --- | --- | --- | --- | --- |
| Single Precision (32 bit) | 1 (bit 31) | 8 (bits 30 - 23) | 23 (bits 22 - 0) | 127 |
| Double Precision (64 bit) | 1 (bit 63) | 11 (bits 62 - 52) | 52 (bits 51 - 0) | 1023 |

The examples below all refer to 32 bit numbers, but the principles apply to 64 bit.

• The exponent is an 8 bit unsigned number in biased form
• To get the true exponent, subtract 127 from the binary value
• The mantissa is a binary fraction, with the first bit representing $2^{-1}$, the second bit $2^{-2}$, etc.
• The mantissa has an implicit leading 1, so 1 must always be added to the stored fraction
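
The field layout described above can be sketched in Python using the standard `struct` module. This is a minimal illustration rather than a reference implementation; the helper name `split_float32` is hypothetical, and the final line assumes a normalised number (the implicit leading 1 applies).

```python
import struct

def split_float32(value):
    """Return (sign, biased_exponent, mantissa_bits) for a 32 bit float.

    Hypothetical helper for illustration only.
    """
    # Pack as a big-endian float32, then reinterpret the same 4 bytes
    # as an unsigned 32 bit integer so the fields can be masked out.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30 - 23, still biased
    mantissa = bits & 0x7FFFFF       # bits 22 - 0, the stored fraction
    return sign, exponent, mantissa

sign, exponent, mantissa = split_float32(-0.75)
true_exponent = exponent - 127        # remove the bias
significand = 1 + mantissa / 2**23    # add the implicit leading 1

print(sign, exponent, mantissa)       # 1 126 4194304
print(true_exponent, significand)     # -1 1.5
```

Here -0.75 is stored as $-1.5 \times 2^{-1}$: the sign bit is 1, the biased exponent is 126, and only the first mantissa bit ($2^{-1}$) is set.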

## Decimal to Float

The number is converted to a binary fractional format, then adjusted to fit into the form we need. Take 12.375 for example:

• Integer part: $12 = 1100_2$
• Fraction part: $0.375 = 0.011_2$

Combining the two parts yields $1100.011_2$. However, the standard requires that the mantissa have an implicit leading 1, so the number must be shifted to the right (the binary point moves left) until it is normalised (ie has only 1 as its integer part). This yields $1.100011_2$. As it has been shifted three places, the value is actually $1.100011_2 \times 2^3$. The exponent is therefore 3, but this has to be biased (+127) to yield $130 = 10000010_2$. The number is positive (sign bit zero), so this yields:

| Sign | Biased Exponent | Normalised Mantissa |
| --- | --- | --- |
| 0 | 1000 0010 | 100011 |
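
The conversion steps can be checked with a short Python sketch. The function `encode_float32` is a hypothetical helper written for this example: it assumes a positive value in the normal range and truncates any bits beyond the 23 bit mantissa instead of rounding, so it is not a general encoder. The result is compared against what the standard `struct` module produces.

```python
import struct

def encode_float32(value):
    """Hand-encode a positive, normal-range number as float32 bits.

    Hypothetical helper: no negative numbers, zero, denormals or rounding.
    """
    sign = 0
    exponent = 0
    significand = value
    # Normalise so that 1 <= significand < 2, counting the shifts.
    while significand >= 2:
        significand /= 2
        exponent += 1
    while significand < 1:
        significand *= 2
        exponent -= 1
    biased = exponent + 127                    # apply the bias
    mantissa = int((significand - 1) * 2**23)  # drop the implicit 1, keep 23 bits
    return (sign << 31) | (biased << 23) | mantissa

bits = encode_float32(12.375)
print(f"{bits:032b}")   # 01000001010001100000000000000000
print(f"{bits:#010x}")  # 0x41460000

# Cross-check against the encoding produced by the struct module.
assert struct.pack(">I", bits) == struct.pack(">f", 12.375)
```

The mantissa field `100011` from the table is padded with zeros on the right to fill all 23 bits.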

## Float to Decimal

Starting with the value 0x41C80000 = 01000001110010000000000000000000:

| Sign | Biased Exponent | Normalised Mantissa |
| --- | --- | --- |
| 0 | 1000 0011 | 1001 |

• The biased exponent is 131; removing the bias (-127) gives 4
• The mantissa is 0.5625; adding the implicit leading 1 gives 1.5625
• $1.5625 \times 2^4$ gives 25
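
The same decoding can be written as a small Python sketch. The name `decode_float32` is hypothetical, and the function assumes a normalised number, so it does not handle zero, denormalised values, infinity or NaN.

```python
import struct

def decode_float32(bits):
    """Decode a normalised float32 bit pattern by hand (hypothetical helper)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # biased exponent
    mantissa = bits & 0x7FFFFF           # stored fraction bits
    significand = 1 + mantissa / 2**23   # add the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

value = decode_float32(0x41C80000)
print(value)  # 25.0

# Cross-check by letting struct reinterpret the same 4 bytes as a float.
(expected,) = struct.unpack(">f", struct.pack(">I", 0x41C80000))
assert value == expected
```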

## Special Values

• Zero
  • When both exponent and mantissa are zero, the number is zero
  • Can be either positive or negative zero, depending on the sign bit
• Infinity
  • Exponent is all 1s, mantissa is zero
  • Can be either positive or negative
• Denormalised
  • If the exponent is all zeros but the mantissa is non-zero, then the value is a denormalised number
  • The mantissa does not have an assumed leading one
• NaN (Not a Number)
  • Exponent is all 1s, mantissa is non-zero
  • Represents the result of an invalid operation, such as 0/0

| Exponent | Mantissa | Value |
| --- | --- | --- |
| 0 | 0 | ±0 |
| 255 | 0 | ±∞ |
| 0 | not 0 | denormalised |
| 255 | not 0 | NaN |
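
The table maps directly onto a classification routine. The sketch below is illustrative only: `classify_float32` and the label strings it returns are names chosen for this example, and it works on raw 32 bit patterns.

```python
def classify_float32(bits):
    """Classify a float32 bit pattern using the exponent/mantissa table.

    Hypothetical helper; the returned labels are chosen for this example.
    """
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        if mantissa == 0:
            return sign + "zero"
        return "denormalised"
    if exponent == 255:
        if mantissa == 0:
            return sign + "infinity"
        return "NaN"
    return "normal"

for pattern in (0x00000000, 0x80000000, 0x7F800000, 0x00000001, 0x7FC00000, 0x41C80000):
    print(f"{pattern:#010x}", classify_float32(pattern))
```

Any exponent other than 0 or 255 denotes an ordinary normalised number, which is why the final branch needs no mantissa check.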