Compiling a C source file into an executable program involves multiple steps, which are described below.
Preparation of the source files
The first step in compiling a C source file is preparing it for preprocessing.
First, the characters of the physical source file are mapped to the source character set, so multibyte or other encodings are converted to the source character set.
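Next, trigraphs are replaced by the characters they represent. A trigraph consists of two question marks followed by a third character, and it stands in for a character that may be missing from some keyboards or character sets. For example, ??( can be used as a replacement for [. Note that GCC only replaces trigraphs when asked to, for example with the -trigraphs option or a strict mode such as -std=c11, and that C23 removes trigraphs from the language.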
Finally, every backslash immediately followed by a newline is deleted together with that newline, which joins the two physical lines into one logical line. This makes it possible to write a preprocessor directive, such as #define, on multiple lines.
Preprocessing
The source file is now decomposed into preprocessing tokens and sequences of whitespace characters, which include comments.
Each comment is then replaced by a single space character.
After that, the preprocessing directives are executed and macros are expanded. Conditional directives such as #ifdef select which lines are kept, and a macro defined with #define x 1 causes every later occurrence of x to be replaced by 1. The #include directives are processed as well: each referenced header or source file is first prepared for preprocessing as in the first step, and then preprocessed as in the second step.
Once preprocessing is done, all preprocessing directives are deleted.
The preprocessing step can be performed on its own by issuing one of the following commands:
$ gcc -E source.c > name_of_preprocessed_file.i   # If using the gcc compiler.
$ cc -E source.c > name_of_preprocessed_file.i    # If using the cc compiler.
$ cpp -E source.c > name_of_preprocessed_file.i   # If using the C preprocessor.
As an example, this is a C source file:
/* This is a comment */
#define x 0
int y = 1,/* Comments are replaced by a single space*/y;
int z = x
And this is the output of preprocessing this file:
$ gcc -E source.c
int y = 1, y;
int z = 0
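The command gcc -E source.c preprocesses the source.c file and writes the result to standard output. gcc also prints linemarker lines beginning with #, which are omitted here. Comments are replaced by a single space, and preprocessor directives are executed. No C syntax checking is performed, which is why the missing semicolon on the last line goes unnoticed.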
Getting ready for the execution environment
The third step is to get ready for the execution environment. Character constants and string literals, including any escape sequences such as \n, are translated from the source character set into the execution character set.
Adjacent string literals, such as "a" "b", are concatenated into one.
The file resulting from this step is called a translation unit.
Translating into assembly
The translation unit produced by the first three steps consists of tokens and whitespace.
The tokens are analyzed syntactically and semantically according to the C standard, and the high-level C code is translated into low-level assembly language.
Each CPU architecture can have its own assembly language, for example x64 assembly or ARM assembly.
As such, a target architecture can be specified when compiling.
Compiling for an architecture different from the one on which the compiler is running is called cross compiling.
The translation into assembly can be performed by issuing one of the following commands:
$ gcc -S source.c -o name_of_assembly_file.s   # If using the gcc compiler.
$ cc -S source.c -o name_of_assembly_file.s    # If using the cc compiler.
As an example, the following source file:
int main(void){
    int x = 0;
}
is converted to assembly:
$ cc -S source.c   # Translate source.c into source.s
$ cat source.s     # Output the content of source.s
    .section    __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 12
    .globl  _main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## BB#0:
    pushq   %rbp
Lcfi0:
    .cfi_def_cfa_offset 16
Lcfi1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Lcfi2:
    .cfi_def_cfa_register %rbp
    xorl    %eax, %eax
    movl    $0, -4(%rbp)
    popq    %rbp
    retq
    .cfi_endproc

.subsections_via_symbols
Assembling
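In this step, the generated assembly language is translated into machine language. Machine language consists of binary-encoded instructions, that is, sequences of 0 and 1, so the source file has now been translated into a form that the target processor can decode.
The file resulting from this step is known as object code. Object code is not yet executable, because references to functions and variables defined elsewhere are still unresolved.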
The assembling step can be performed by issuing one of the following commands:
$ as source.s -o source.o       # If using as, assemble an assembly file into an object file.
$ gcc -c source.c -o source.o   # If using gcc, translate the source.c file into object code.
$ cc -c source.c -o source.o    # If using cc, translate the source.c file into object code.
Linking
In this step, an executable file is created from object code files. Multiple object code files are combined, the needed parts of static libraries are merged in, and external references are resolved. Each operating system has its own executable file format.
Linking can be performed by using the ld command, or by providing options to gcc or cc. For example, the following source file:
/* source.c file */
#include <math.h>
int main(void){
    double number = sqrt(2.9);
}
can be converted to object code using :
$ gcc -c source.c
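The object code can then be linked against the C math library and made into an executable file by issuing the command below. By default gcc links against the shared version of a library when one is available; the -static option requests static linking.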
$ gcc source.o -lm -o executable_file_name
Final notes
A compiler can perform all these steps at once. For example, issuing gcc source.c or cc source.c translates the source file directly into an executable file. Multiple source files can be passed to gcc or cc.