## What are float, double and long double?

To store numbers in a computer, an encoding must be used. The __C standard does not fully specify__ the encoding used to store numbers, whether rational, such as `1/2`, integer, such as `5`, or irrational, such as `pi`.

__What it does specify__ are the names of the numerical data types, such as `int` or `float`. It also specifies their meaning: for example, `int` is used to store signed integers, like `-1` or `1`, and `float` is used to store approximations of real numbers, such as `1.2` or `-12.5`. Finally, it specifies their minimum range: for example, the `int` type must be able to hold at least every value between `-32767` and `+32767`. The encoding actually used is chosen by the implementation, usually following the computer manufacturer's hardware.

The __real floating types in C__ are `float`, `double`, and `long double`. The C standard defines the model of real numbers that these types must encode. This model is called the floating point model, and a number $x$ in it has the following form:

$$x = s\, b^{e} \sum_{k=1}^{p} f_k\, b^{-k}, \qquad e_{\min} \le e \le e_{\max}$$

where $s$ is the sign ($\pm 1$), $b$ is the base or radix (an integer greater than 1), $e$ is the exponent, $p$ is the precision (the number of base-$b$ digits in the significand), and each $f_k$ is a significand digit, a nonnegative integer less than $b$.

Multiple encodings of this model exist; the most commonly used one is the IEEE 754 floating point format.

On computers __that use the IEEE__ floating point format, the `float` type maps to the IEEE single precision format (32 bits), and the `double` type maps to the IEEE double precision format (64 bits). The `long double` type maps either to the IEEE quadruple precision format (128 bits) or to the 80-bit extended precision format of x87 processors; on some implementations, such as Microsoft Visual C++, it is simply the same as `double`. The __ranges of__ the C real floating types, when using the IEEE floating point format, are as follows.

Floating point type | Number of bits | Min value | Max value | Smallest normalized magnitude |
---|---|---|---|---|
float | 32 bits | -3.40282E+38 | +3.40282E+38 | ±1.17549E-38 |
double | 64 bits | -1.79769E+308 | +1.79769E+308 | ±2.22507E-308 |
long double (80-bit extended format) | 80 bits | -1.18973E+4932 | +1.18973E+4932 | ±3.36210E-4932 |

The `float.h` header __contains information related to the floating point implementation__, such as the largest finite value of each floating type, so that the range of the type is [-max, +max], and its smallest positive normalized value.

```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* Largest finite value of each floating type;
       the range of the type is [-max, +max]. */
    printf("float max : %e\n", FLT_MAX);
    printf("double max : %e\n", DBL_MAX);
    printf("long double max : %Le\n", LDBL_MAX);

    /* Smallest positive normalized value of each floating type. */
    printf("float min normalized : %e\n", FLT_MIN);
    printf("double min normalized : %e\n", DBL_MIN);
    printf("long double min normalized : %Le\n", LDBL_MIN);
    return 0;
}

/* Output, on an x86-64 system where long double is the 80-bit extended format:
float max : 3.402823e+38
double max : 1.797693e+308
long double max : 1.189731e+4932
float min normalized : 1.175494e-38
double min normalized : 2.225074e-308
long double min normalized : 3.362103e-4932
*/
```

The __range and precision in which floating point arithmetic__ operations are evaluated is indicated by the macro `FLT_EVAL_METHOD`, defined in the header `float.h`. If `FLT_EVAL_METHOD` is `2`, all operations and constants are evaluated in the range and precision of the `long double` type, whatever the types of the operands. If `FLT_EVAL_METHOD` is `1`, operations and constants of type `float` and `double` are evaluated in the range and precision of the `double` type, even if both operands are of the `float` type, while operations and constants of type `long double` are evaluated as `long double`. If `FLT_EVAL_METHOD` is `0`, operations are evaluated just in the range and precision of their own type, which, after the usual arithmetic conversions, is the type of the widest operand. If `FLT_EVAL_METHOD` is `-1`, the evaluation method is indeterminable; other negative values denote implementation-defined behavior.

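```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* Report how this implementation evaluates floating point operations. */
    printf("FLT_EVAL_METHOD : %d\n", FLT_EVAL_METHOD);
    return 0;
}

/* Output, for example on a typical x86-64 system (the value depends on the implementation):
FLT_EVAL_METHOD : 0
*/
```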

## Floating point literal

A floating point literal in C __can be written in decimal__, in one of the following formats: `d+.d*[X]`, `d*.d+[X]`, or `d+X`, where `d` is any digit between `0` and `9`, `+` means one or more, `*` means zero or more, and what is between `[]` is optional. `X` is an exponent part: the letter `e`, case insensitive, followed by an optional sign (`+` or `-`) and one or more decimal digits; it means that the number is multiplied by 10 raised to that power. As an example:

```c
double x;
x = 1.;    // 1.0
x = .1;    // 0.1
x = 1.0;   // 1.0
x = 1e1;   // 10.0
x = 1.E1;  // 10.0
```

By default, the __type of a floating point literal__ in C is `double`, unless it is suffixed with `f` (case insensitive), in which case it is of type `float`, or suffixed with `l` (case insensitive), in which case it is of type `long double`. As an example:

```c
float aFloat = 1.0f;
double aDouble = 1.0;
long double alongDouble = 1.0L;
```

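A floating point literal can also be written __in hexadecimal notation__, in one of the following formats: `0xh+[.]h*Pd+` or `0xh*.h+Pd+`, where `0x` is case insensitive and stands for hexadecimal, `h` is a hexadecimal digit (`0-9`, `a-f` or `A-F`), `d` is a decimal digit between `0` and `9`, `+` means one or more, `*` means zero or more, and what is between `[]` is optional. `P` is case insensitive and introduces the binary exponent: the hexadecimal value before it is multiplied by `2` raised to the power given by the decimal digits after it, which may be preceded by a sign (`+` or `-`). Unlike in the decimal notation, the exponent part is mandatory. As an example: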

```c
double x;
x = 0xfP0;    // 15.0
x = 0Xf.P0;   // 15.0
x = 0xf.0P0;  // 15.0
x = 0X.1P0;   // 1/16 = 0.062500
x = 0x.1p1;   // (1/16) * 2 = 0.125000
```

As with decimal floating point constants, hexadecimal floating point constants *have a default type of* `double`. To give a hexadecimal floating point constant the type `float`, use the suffix `f`, case insensitive, and to give it the type `long double`, use the suffix `l`, case insensitive. As an example:

```c
float aFloat = 0x1P2f;              // 4.0f
double aDouble = 0x.1p3;            // 0.5
long double alongDouble = 0X.3p2L;  // 0.75L
```