What are float, double, and long double?
To store numbers in a computer, an encoding algorithm must be used. The C standard does not specify the algorithm, or the encoding, used to store any kind of number, be it a rational such as 1/2, an integer such as 5, or an irrational such as pi.
It only specifies the names of the numerical data types, such as int or float; their meaning, for example int stores signed integers such as -1 or 1, and float stores approximations of real numbers such as 1.2 or -12.5; and their minimum range, for example the int type must be able to hold at least the values between -32767 and +32767. The algorithms used to encode numbers are specified by the computer manufacturer.
The real floating types in C are float, double, and long double. The C standard defines the model of real numbers that must be encoded. This model is called the floating point model, and it describes a number by a sign, a base, an exponent, and a fixed number of significand digits.
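Written out, the floating point model of the C standard (section 5.2.4.2.2 in C99) has the form below. The symbol names follow the standard's convention: s is the sign (+1 or -1), b is the base, e is the exponent, p is the precision (the number of base-b digits), and f_k are the significand digits, each between 0 and b-1.

```latex
x = s \, b^{e} \sum_{k=1}^{p} f_k \, b^{-k}, \qquad e_{\min} \le e \le e_{\max}
```

For the IEEE single precision format, for instance, b = 2 and p = 24.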
Multiple algorithms exist for encoding floating point numbers; the most commonly used one is the IEEE floating point format (IEEE 754).
On computers that use the IEEE floating point format, the float type maps to the IEEE single precision format, and the double type maps to the IEEE double precision format. The long double type maps either to the IEEE quadruple precision format or to the IEEE 80 bit extended precision format, depending on the platform.
The ranges of the C real floating types, when using the IEEE floating point format, are as follows.
Floating point type | Number of bits | Min value | Max value | Closest normalized value to 0 |
---|---|---|---|---|
float | 32 bits | -3.40282E+38 | +3.40282E+38 | ±1.17549E-38 |
double | 64 bits | -1.79769E+308 | +1.79769E+308 | ±2.22507E-308 |
long double | 80 bits | -1.18973E+4932 | +1.18973E+4932 | ±3.36210E-4932 |
The float.h header contains information about the floating point implementation, such as the largest finite absolute value of each floating type, so that the range is [-max, +max], and the closest normalized value to 0.
```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* Print the largest finite absolute value of each floating type. */
    printf("float absolute value of range : %e\n", FLT_MAX);
    printf("double absolute value of range : %e\n", DBL_MAX);
    printf("long double absolute value of range : %Le\n", LDBL_MAX);

    /* Print the closest absolute value to 0 for each of the floating types. */
    printf("closest to 0 absolute value , float : %e\n", FLT_MIN);
    printf("closest to 0 absolute value , double : %e\n", DBL_MIN);
    printf("closest to 0 absolute value , long double : %Le\n", LDBL_MIN);
    return 0;
}
/* Output :
float absolute value of range : 3.402823e+38
double absolute value of range : 1.797693e+308
long double absolute value of range : 1.189731e+4932
closest to 0 absolute value , float : 1.175494e-38
closest to 0 absolute value , double : 2.225074e-308
closest to 0 absolute value , long double : 3.362103e-4932
*/
```
The type in which floating point arithmetic operations are performed is given by the macro FLT_EVAL_METHOD, defined in the header float.h.

If FLT_EVAL_METHOD is 2, arithmetic operations are performed by promoting the operands to the long double type. If FLT_EVAL_METHOD is 1, the operands are promoted to long double if any operand is of the long double type; otherwise they are promoted to the double type, even if both operands are of the float type. If FLT_EVAL_METHOD is 0, arithmetic operations are done in the type of the widest operand. If FLT_EVAL_METHOD is -1, the evaluation type is indeterminable.
```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("FLT_EVAL_METHOD : %d\n", FLT_EVAL_METHOD);
    return 0;
}
/* Output :
FLT_EVAL_METHOD : 0
*/
```
Floating point literal
A floating point literal in C can be written in decimal, in one of the following formats:

```
d+.d*[e[±]d+]
d*.d+[e[±]d+]
d+e[±]d+
```

where d is any digit between 0 and 9, + means one or more, * means zero or more, what is between [] is optional, ± stands for a plus or a minus sign, and e is case insensitive and means multiplied by 10 raised to the exponent that follows. As an example:
```c
double x;
x = 1.;
x = .1;
x = 1.0;
x = 1e1;  // 10.0
x = 1.E1; // 10.0
```
By default, the type of a floating point literal in C is double. If the literal is suffixed with f, case insensitive, it is of the float type, and if it is suffixed with l, case insensitive, it is of the long double type. As an example:
```c
float aFloat = 1.0f;
double aDouble = 1.0;
long double alongDouble = 1.0L;
```
A floating point literal can also be written in hexadecimal notation, in one of the following formats:

```
0xh+[.]h*P[±]d+
0xh*.h+P[±]d+
```

where 0x is case insensitive and stands for hexadecimal, h is a hexadecimal digit between 0 and F, case insensitive, + means one or more, * means zero or more, what is between [] is optional, ± stands for a plus or a minus sign, P is case insensitive and means multiplied by 2 raised to the exponent that follows, and d is a decimal digit between 0 and 9. Unlike in decimal notation, the exponent part is mandatory. As an example:
```c
double x;
x = 0xfP0;   // 15.0
x = 0Xf.P0;  // 15.0
x = 0xf.0P0; // 15.0
x = 0X.1P0;  // 1/16 = 0.062500
x = 0x.1p1;  // (1/16) * 2 = 0.125000
```
As with decimal floating point constants, a hexadecimal floating point constant has a default type of double. To give a hexadecimal floating point constant the type float, use the suffix f, case insensitive, and to give it the type long double, use the suffix l, case insensitive. As an example:
```c
float aFloat = 0x1P2f;             // 4.0f
double aDouble = 0x.1p3;           // 0.5
long double alongDouble = 0X.3p2L; // 0.75L
```