What is a float, double, long double and a floating point literal in C?

 

What are float, double and long double?

To store numbers in a computer, an encoding must be used. The C standard largely leaves the encoding of numbers to the implementation, whether they are rational such as 1/2, integers such as 5, or irrational such as pi (which can only be approximated).

It only specifies the names of the numerical data types, such as int or float. It specifies their meaning: for example, int stores signed integers like -1 or 1, and float stores approximations of real numbers such as 1.2 or -12.5. It also specifies their minimum ranges: for example, an int must be able to hold at least every value between -32767 and +32767. The encodings themselves are chosen by the implementation, typically following the computer manufacturer's hardware formats.

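The real floating types in C are float, double, and long double. The C standard describes the characteristics of these types with a model of real numbers, called the floating point model, in which a floating point number x has the following format:

$$x = s \, b^{e} \sum_{k=1}^{p} f_k \, b^{-k}, \qquad e_{\min} \le e \le e_{\max}$$

where s is the sign (+1 or -1), b is the base or radix (an integer greater than 1), e is the exponent, p is the precision (the number of base-b digits in the significand), and each f_k is a digit, a nonnegative integer less than b.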

Several encodings exist for floating point numbers; the most widely used is the IEEE 754 floating point format (also published as IEC 60559).

On computers that use the IEEE floating point format, float maps to the IEEE single precision format and double maps to the IEEE double precision format. long double varies between platforms: it maps to the IEEE quadruple precision format, to the x86 80-bit extended precision format, or, as with Microsoft Visual C++, simply to the same format as double.

When the IEEE floating point formats are used, the ranges of the C real floating types are as follows:

| Floating point type | Number of bits | Min value | Max value | Closest normalized value to 0 |
|---|---|---|---|---|
| float | 32 | -3.4E+38 | +3.4E+38 | ±1.17549E-38 |
| double | 64 | -1.79769E+308 | +1.79769E+308 | ±2.22507E-308 |
| long double (x86 80-bit extended) | 80 | -1.18E+4932 | +1.18E+4932 | ±3.36E-4932 |

The float.h header contains information about the floating point implementation, such as the largest finite value of each floating type, whose range is [-max, +max], and the smallest positive normalized value of each type, which is the closest value to 0 that still has full precision.

#include <stdio.h>
#include <float.h>

int main(void) {
  /* Print the largest finite value of each floating
     type; the range of each type is [-max, +max]. */
  printf("float absolute value of range : %e\n", FLT_MAX);
  printf("double absolute value of range : %e\n", DBL_MAX);
  printf("long double absolute value of range : %Le\n", LDBL_MAX);
  printf("\n");

  /* Print the smallest positive normalized value of
     each floating type (its closest normalized value to 0). */
  printf("closest to 0 absolute value, float : %e\n", FLT_MIN);
  printf("closest to 0 absolute value, double : %e\n", DBL_MIN);
  printf("closest to 0 absolute value, long double : %Le\n", LDBL_MIN);
  return 0;
}

/* Output (x86-64, long double in the 80-bit extended format):
float absolute value of range : 3.402823e+38
double absolute value of range : 1.797693e+308
long double absolute value of range : 1.189731e+4932

closest to 0 absolute value, float : 1.175494e-38
closest to 0 absolute value, double : 2.225074e-308
closest to 0 absolute value, long double : 3.362103e-4932 */

The format in which floating point operations are evaluated is given by the macro FLT_EVAL_METHOD, defined in the header float.h. Its values have the following meaning:

2 : all operations and constants are evaluated with the range and precision of long double, even when the operands are float or double.
1 : operations and constants of type float and double are evaluated with the range and precision of double, even if both operands are float; operations and constants of type long double are evaluated as long double.
0 : operations and constants are evaluated with the range and precision of their own type, which, after the usual arithmetic conversions, is the type of the widest operand.
-1 : the evaluation format is indeterminable.

The evaluation format only affects the precision of intermediate results; the type of the expression does not change. For example, x86-64 compilers using SSE instructions typically report 0, while 32-bit x86 compilers using x87 instructions typically report 2.

#include<stdio.h>
#include<float.h>

int main(void) {
  printf("FLT_EVAL_METHOD : %d\n", FLT_EVAL_METHOD);
  return 0;
}

/* Output : 
FLT_EVAL_METHOD : 0 */

Floating point literal

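A floating point literal in C can be written in decimal, in one of the following formats:

d+.d*[e[s]d+]
d*.d+[e[s]d+]
d+e[s]d+

where d is any digit from 0 to 9, + means one or more, * means zero or more, what is between [] is optional, s is a sign (+ or -), and e is case insensitive and means "multiplied by 10 to the power of" the digits that follow it. A literal never includes a leading sign: -1.5 is the unary minus operator applied to the literal 1.5. As an example: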

double x;
x = 1.;      // 1.0
x = .1;      // 0.1
x = 1.0;     // 1.0
x = 1e1;     // 10.0
x = 1.E1;    // 10.0
x = 1.5e-3;  // 0.0015

By default, a floating point literal in C has type double. If it is suffixed with f (case insensitive), it has type float; if it is suffixed with l (case insensitive), it has type long double. As an example:

float aFloat = 1.0f ;
double aDouble = 1.0 ;
long double alongDouble = 1.0L ;

Since C99, a floating point literal can also be written in hexadecimal notation, in one of the following formats:

0xh+[.]h*P[s]d+
0xh*.h+P[s]d+

where 0x is case insensitive and introduces the hexadecimal notation, h is a hexadecimal digit from 0 to F (case insensitive), + means one or more, * means zero or more, what is between [] is optional, s is a sign (+ or -), P is case insensitive and means "multiplied by 2 to the power of" the exponent that follows, and d is a decimal digit from 0 to 9. Unlike the decimal exponent, the binary exponent is mandatory. As an example:

double x ;
x = 0xfP0; // 15.0
x = 0Xf.P0; // 15.0
x = 0xf.0P0; // 15.0
x = 0X.1P0; // 1/16 = 0.062500
x = 0x.1p1; // (1/16) * 2 = 0.125000

As with decimal floating point constants, hexadecimal floating point constants have type double by default. To give a hexadecimal floating point constant the type float, use the suffix f (case insensitive); to give it the type long double, use the suffix l (case insensitive). As an example:

float aFloat = 0x1P2f;// 4.0f
double aDouble = 0x.1p3 ;// 0.5
long double alongDouble = 0X.3p2L ; // 0.75L