Compiling a C source file into an executable program involves multiple steps. They are as follows:
Preparation of the source files
The first step in compiling a C source file is preparing it for preprocessing.
First, the characters of the physical source file are mapped to the source character set, so that multibyte or other encodings are converted into it.
Next, trigraphs are replaced by the characters they represent. A trigraph is formed of two question marks followed by a third character, and is used as a replacement for a certain character. For example, ??( can be used as a replacement for [.
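The trigraph replacement described above can be sketched as a small C function. This is only an illustration of the idea, not how any particular compiler implements it; the names trigraph_replacement and replace_trigraphs are hypothetical.

```c
/* Map the third character of a trigraph to its replacement, or
 * return 0 if the sequence is not one of the nine standard trigraphs. */
static char trigraph_replacement(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical helper: copy `in` to `out`, replacing every trigraph.
 * `out` must have room for strlen(in) + 1 bytes, since the output is
 * never longer than the input. */
void replace_trigraphs(const char *in, char *out)
{
    while (*in != '\0') {
        char r = 0;
        /* in[2] is only read when in[0] and in[1] are both '?',
         * so the read never goes past the terminating null. */
        if (in[0] == '?' && in[1] == '?')
            r = trigraph_replacement(in[2]);
        if (r != 0) {
            *out++ = r;     /* three input characters become one */
            in += 3;
        } else {
            *out++ = *in++; /* ordinary character, copied as is */
        }
    }
    *out = '\0';
}
```

When calling this helper with a string literal, the question marks can be written as ?\? so that the compiler itself does not replace the trigraph first. Note that GCC only replaces trigraphs when asked to, with -trigraphs or a strict ISO mode such as -std=c99, and that C23 removed trigraphs from the language.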
Finally, any backslash immediately followed by a new line is deleted, joining the two physical lines into one logical line. This makes it possible to write a preprocessor directive, such as #define, across multiple lines.
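As a small illustration of line splicing, here is a sketch of a macro written across several physical lines. The names CLAMP and clamp_percent are made up for this example.

```c
/* One logical #define written across several physical lines.
 * Each backslash and the new line after it are deleted before
 * preprocessing, so the directive is seen as a single line. */
#define CLAMP(value, low, high) \
    ((value) < (low)  ? (low)   \
   : (value) > (high) ? (high)  \
   : (value))

/* Hypothetical helper that uses the macro. */
int clamp_percent(int value)
{
    return CLAMP(value, 0, 100);
}
```

The backslash has to be the very last character on its line; a space after it would stop the two lines from being joined.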
Preprocessing
The source file is now a sequence of characters and whitespace, which is broken up into preprocessing tokens, comments, and whitespace.
Next, each comment is replaced by a single space.
After that, preprocessing directives are executed and macros are expanded. Conditional directives such as #ifdef select which lines are kept, and a macro defined by #define x 1 causes later occurrences of x to be replaced by 1. Each #include directive causes the referenced header or source file to be prepared for preprocessing as in the first step, and then preprocessed as in this step.
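The directives mentioned above can be seen together in the following sketch. All the macro and function names are invented for illustration, and VERBOSE stands for a macro that might be defined on the command line, for example with -DVERBOSE.

```c
/* Object-like macro: every later use of BUFFER_SIZE is replaced by 64. */
#define BUFFER_SIZE 64

/* Function-like macro: expanded with its argument substituted. */
#define SQUARE(n) ((n) * (n))

/* Conditional inclusion: only one of the two definitions below
 * survives preprocessing, depending on whether VERBOSE is defined. */
#ifdef VERBOSE
#define LOG_LEVEL 2
#else
#define LOG_LEVEL 0
#endif

int buffer_area(void)
{
    /* After expansion this reads: return ((64) * (64)); */
    return SQUARE(BUFFER_SIZE);
}

int log_level(void)
{
    return LOG_LEVEL;
}
```

Running only the preprocessor on such a file is a convenient way to check what the compiler proper will actually see.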
Once preprocessing is done, the preprocessing directives are deleted.
The preprocessing step can be performed on its own by issuing one of the following commands:
$ gcc -E source.c > name_of_preprocessed_file.i   # If using the gcc compiler.
$ cc -E source.c > name_of_preprocessed_file.i    # If using the cc compiler.
$ cpp source.c > name_of_preprocessed_file.i      # If using the C preprocessor.
As an example, this is a C source file:
/* This is a comment */
#define x 0
int y = 1,/* Comments are replaced by a single space*/y;
int z = x
And this is the output of preprocessing this file, with the line markers that gcc adds (lines starting with #) omitted:
$ gcc -E source.c
int y = 1, y;
int z = 0
The command gcc -E source.c preprocesses the source.c file and prints the result. Comments are replaced by a single space, and preprocessor directives are executed. No C syntax checking is performed, which is why the missing semicolon at the end of the file is not reported.
Getting ready for the execution environment
The third step is to get ready for the execution environment. Character constants and string literals, including any escape sequences such as \n, are translated from the source character set into the execution character set.
Adjacent string literals, such as "a" "b", are concatenated into one.
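Both effects of this step, escape sequence translation and string literal concatenation, can be observed from inside a program. The following is a minimal sketch; the names message, message_length and last_character are made up for the example.

```c
#include <string.h>

/* Three adjacent string literals, joined into a single literal
 * before the tokens are analyzed by the compiler proper. */
static const char message[] = "Hello, " "world" "\n";

/* Number of characters in the joined literal, excluding the
 * terminating null character. */
size_t message_length(void)
{
    return strlen(message);
}

/* The escape sequence \n has become a single character of the
 * execution character set, not a backslash followed by an n. */
char last_character(void)
{
    return message[sizeof message - 2];
}
```

Here message_length() returns 13: seven characters for "Hello, ", five for "world", and one for the new line.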
The resulting file from this step is called a translation unit.
Translating into assembly
The translation unit resulting from the first three steps is formed of tokens and whitespace.
The tokens are syntactically and semantically analyzed according to the C standard, and the high-level C code is translated into low-level assembly language.
Each CPU architecture can have its own assembly language, for example x86-64 assembly or Arm assembly.
As such, a target architecture can be specified when compiling.
Compiling for an architecture different from the one on which the compiler is running is called cross compiling.
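A program can find out which architecture it is being compiled for by testing macros that the compiler predefines. The sketch below assumes the macro names predefined by GCC and Clang; other compilers may use different names, and the function name target_architecture is hypothetical.

```c
/* Return a name for the architecture this translation unit is being
 * compiled for. The macros tested here are the ones predefined by
 * GCC and Clang (an assumption of this sketch), hence the fallback. */
const char *target_architecture(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#elif defined(__aarch64__)
    return "AArch64 (64-bit Arm)";
#elif defined(__arm__)
    return "Arm (32-bit)";
#elif defined(__riscv)
    return "RISC-V";
#else
    return "unknown";
#endif
}
```

When cross compiling, the returned name describes the target architecture, not the machine on which the compiler was run.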
The translation into assembly can be performed by issuing one of the following commands:
$ gcc -S source.c -o name_of_assembly_file.s   # If using the gcc compiler.
$ cc -S source.c -o name_of_assembly_file.s    # If using the cc compiler.
As an example, the following source file:
int main(void){
    int x = 0;
}
is converted to assembly:
$ cc -S source.c   # Translate source.c into source.s
$ cat source.s     # Output the content of source.s
    .section    __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 12
    .globl  _main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## BB#0:
    pushq   %rbp
Lcfi0:
    .cfi_def_cfa_offset 16
Lcfi1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Lcfi2:
    .cfi_def_cfa_register %rbp
    xorl    %eax, %eax
    movl    $0, -4(%rbp)
    popq    %rbp
    retq
    .cfi_endproc
.subsections_via_symbols
Assembling
In this step, the generated assembly language is translated into machine language. Machine language consists only of binary-encoded instructions and data, so the program is now represented as sequences of 0s and 1s.
The file resulting from this step is known as object code, or an object file. Object code is not yet executable.
The assembling step can be performed by issuing one of the following commands:
$ as source.s -o source.o       # If using as, assemble an assembly file into an object file.
$ gcc -c source.c -o source.o   # If using gcc, translate a source.c file into object code.
$ cc -c source.c -o source.o    # If using cc, translate a source.c file into object code.
Linking
In this step, an executable file is created from object code files. Multiple object code files are combined, parts of static libraries are merged, and external references are resolved. Each operating system has its own executable file format.
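To make the idea of an external reference more concrete, here is a sketch of a function that is declared in one place and defined in another. Both parts are shown in a single listing so that it compiles as is; the file names area.c and square.c, and the function names, are hypothetical.

```c
/* ---- would live in a hypothetical area.c ---- */

/* Declaration only: it tells the compiler that square() exists.
 * If square() is defined in another file, the object code for
 * area.c contains an unresolved reference to it. */
int square(int n);

int square_area(int side)
{
    return square(side);
}

/* ---- would live in a hypothetical square.c ---- */

/* Definition: the linker matches the reference above to this symbol. */
int square(int n)
{
    return n * n;
}
```

If the two parts were split into separate files, each could be translated into object code on its own, and the reference to square() would only be resolved when the object files are linked together with a file that defines main.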
Linking can be performed by using the ld command, or by passing options to gcc or cc. For example, the following source file:
/* source.c file */
#include <math.h>
int main(void){
    double number = sqrt(2.9);
}
can be converted to object code using:
$ gcc -c source.c
The object code can then be linked against the C math library and made into an executable file by issuing the command:
$ gcc source.o -lm -o executable_file_name
Final notes
A compiler can perform all of these steps at once. For example, issuing gcc source.c or cc source.c translates the source file into an executable file. Multiple source files can be passed to gcc or cc.

