Arm assembly language tutorial

An arm processor can switch between, and operate on, multiple instruction sets. Some of these instruction sets are thumb, which is a 16 bit instruction set, thumb2, which combines 16 and 32 bit instructions, arm32, also known as A32 or simply arm (and sometimes loosely called aarch32), which is a 32 bit instruction set, and finally arm64, also known as aarch64, whose instructions are also 32 bits wide, but which targets 64 bit arm processors.

This tutorial discusses the arm32 instruction set.

Arm instruction format

Example

Some examples of arm instructions are:

add r0, r1, r2 
# Performs r1 + r2, and stores the
# result in r0 . 

adds r0, r1, r2
# Performs the addition of r1, and 
# r2, and stores the result in r0. 
# Additionally, the condition flags
# are updated from the result: for
# example, if r1 + r2 overflows, or
# if the result is zero, the flag bit
# which corresponds to that condition
# is set.


addEQ r0,r1,r2 
# If the zero condition flag is set, 
# r1 is added to r2, and the result
# is stored in r0.


addsEQ r0,r1,r2 
# If the zero condition flag is set, 
# r1 is added to r2, and the
# result is stored in r0. 
# Additionally, the condition flags
# are updated from the result, for
# example if a carry or an overflow
# occurs.

The format

Arm data processing instructions, such as add, have the following format:

Mnemonic{s}{Condition} Rd, Rn, <Operand2>

Mnemonic is the name of an instruction, for example add.

Curly brackets mean that what is between them is optional. For example, the add instruction can optionally be suffixed with an s, with a condition such as EQ, or with both, as shown in the previous section.

When a mnemonic is suffixed with an s, the instruction updates the condition flags, and when a mnemonic is suffixed with a condition, such as EQ, the condition flags are checked, and the instruction only executes if the condition holds.

Rd is the destination register, Rn is the register holding the first operand, and <Operand2> can be one of: any register Rm, an immediate value as in #2, or any register Rm shifted by an immediate value like #2, or by another register.

add R0, R1, R2
# Add the value stored in the
# register R1, to the value 
# stored in the register R2, 
# and store the result in R0.
# R0 = R1 + R2


add R0, R1, #1
# Add the value stored in
# the register R1, to the
# immediate value #1, and store
# the result in the register
# R0. 
# R0 = R1 + 1 


add R0, R1, R2, LSL #1
# Perform a logical shift left,
# on the register R2, by 1, which
# is the immediate value #1.
# The gotten result, is added
# to R1, and the resultant value
# is stored in R0.
# R0 = R1 + R2<<1


add R0, R1, R2, LSR R3
# Perform a logical shift 
# right, on the content of 
# the register R2, by the 
# value stored in register
# R3. 
# Next, the gotten shifted
# result, is added to the
# content of register R1,
# and the resultant value
# is stored in R0.
# R0 = R1 + R2>>R3

Immediate values can be written, for example, in decimal as in #1, in hexadecimal as in #0xFFFFFFFF, or in binary as in #0b1.

An arm32 instruction is 32 bits wide. In data processing instructions, 12 of these 32 bits are reserved to express an immediate value.

These 12 bits are partitioned into 4 bits, which represent a right rotation, and 8 bits, which represent a value in the range 0 to 0xFF.

The assembler is in charge of converting the immediate value that you provide into these 12 bits. It looks for an 8 bit value which, when rotated to the right by twice the value of the rotation bits, gives the immediate value. The rotation amount is hence always even, and between 0 and 30. If no such encoding exists, an error is generated, unless the assembler can substitute an equivalent instruction, as when mov with #0xFFFFFFFF is assembled as mvn with #0.

mov    r0, #0x83FFFF1F
# This will cause an error, which
# states something like invalid immediate 
# operand value, and that an immediate value,
# must be creatable, by rotating an 8 bit 
# number right, using the rotation bits.

mov    r0, #0x80000000
# 0x80000000, can be obtained, for example
# by rotating the binary value 10, to the 
# right by 2, as such the assembler will 
# not complain, and the value 
# 0x80000000, is stored into the register
# r0. 
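
The encoding rule described above can be sketched in Python. This is only an illustrative model under the assumptions stated in the text (an 8 bit value rotated right by an even amount), not the code of any real assembler, and the names ror32 and encode_immediate are made up for this example.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32 bit value to the right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def encode_immediate(value):
    """Return (rotation_bits, imm8) if value is encodable, else None.

    The encoded constant is imm8 rotated right by 2 * rotation_bits.
    """
    value &= MASK32
    for rotation_bits in range(16):
        # Rotating right by (32 - n) undoes a right rotation by n.
        imm8 = ror32(value, (32 - 2 * rotation_bits) % 32)
        if imm8 <= 0xFF:
            return rotation_bits, imm8
    return None

print(encode_immediate(0x80000000))  # (1, 2): binary 10 rotated right by 2
print(encode_immediate(0x83FFFF1F))  # None: too many scattered bits
```

A real assembler may still accept some values for which this check returns None, for example by replacing mov with mvn and the inverted value.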

The condition flags

So what are these condition flags? They are N, Z, C, and V, and these are just bits which are stored in the current program status register (CPSR).

Instructions of the format Mnemonic{S}.., as in adds, can affect some, or all, of the condition flags.

For example, arithmetic instructions, such as adds.. or subs.., affect all the condition flags, whereas logical, shift, and move instructions, such as ands.., lsrs.. and movs.., affect only some of them: N and Z, and in some cases C, but never V.

Comparison instructions do not contain an s in their mnemonic, but they always update the flags: cmp and cmn affect all the condition flags, whereas tst and teq affect N and Z, and in some cases C, but leave V unchanged.

The C flag

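C is the carry condition flag. For additions and subtractions, it records whether the result, viewed as an unsigned number, fits into 32 bits.

In the case of addition, when the result does not fit into 32 bits, the carry flag is set to 1. In the case of subtraction, which is performed by adding the two's complement of the second operand, the carry flag is set to 1 when this addition carries out of bit 31. This happens when no borrow is needed, so when the first operand is greater than or equal to the second one, both viewed as unsigned numbers. Otherwise the carry flag is set to 0.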

mov   r0, #0xffffffff
# move the value 0xffffffff 
# into the register r0.
mov   r1, #0x1
# move the value 0x1 into 
# r1.
adds  r2, r0, r1
# Add r0, and r1, and store 
# the result in r2.
# This amounts to performing
# the addition of:
# 11111111111111111111111111111111
# 00000000000000000000000000000001
# which results into
#100000000000000000000000000000000
# which is 0, and the carry flag is 
# set to 1. 


mov     r0, #0x2
mov     r1, #0x1
subs    r2, r0, r1
# subtracting 0x1 from 0x2, 
# will cause a carry flag to
# occur. 
# What happens is that the values
# 00000000000000000000000000000010
# 11111111111111111111111111111111
# will be added, since this is 
# two's complement subtraction, 
# which amounts to performing
# 2 + (-1), and the 
# result is 
#100000000000000000000000000000001
# which is 1.
# So what you end up with, is the
# carry flag being set to 1. 
# Negating a number in two's complement,
# is just inverting its bits, and adding
# 1.


mov     r0, #0x0
mov     r1, #0x0
subs    r2, r0, r1
# r2 = r0 - r1, so this is subtracting
# 0 from 0.
# The carry flag is also set to 1, since 
# you can  think that this amounts to the 
# addition of: 
# 00000000000000000000000000000000
#100000000000000000000000000000000
# which results in
#100000000000000000000000000000000
# which is zero, and the carry 
# flag is set to 1. 

mov    r0, #0x2
mov    r1, #0x0
subs   r2, r0, r1
# As in the previous operation,
# 0 is subtracted, so no borrow
# occurs, and the carry flag
# is set to 1.
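
The carry behaviour of adds and subs shown above can be modelled with a short Python sketch. It is an illustration only: the function names are hypothetical, and the subtraction is modelled as a + NOT(b) + 1, which is the two's complement addition described in the comments.

```python
MASK32 = 0xFFFFFFFF

def adds_carry(a, b):
    """Return (32 bit result, carry flag) for a + b."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32

def subs_carry(a, b):
    """Return (32 bit result, carry flag) for a - b.

    The subtraction is performed as a + NOT(b) + 1, so the
    carry flag is 1 when no borrow is needed.
    """
    total = (a & MASK32) + (~b & MASK32) + 1
    return total & MASK32, total >> 32

print(adds_carry(0xFFFFFFFF, 0x1))  # (0, 1)
print(subs_carry(0x2, 0x1))         # (1, 1)
print(subs_carry(0x0, 0x0))         # (0, 1)
print(subs_carry(0x1, 0x2))         # (4294967295, 0)
```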

For shift operations with a nonzero shift amount, the carry flag is set to the value of the last bit shifted out, whatever the shift direction is.

mov    r0, #0b100
mov    r1, #1
lsrs   r2,r0,r1
# lsrs, is logical shift right.
# in this case, r0 which contains
# 100 in binary, is shifted by r1,
# which is 1, so r0 is shifted by 1
# bit to the right, and the result
# is stored in r2.
# r2, will have a value of 10 in
# binary, and since the last shifted 
# out bit is 0, then this means, that 
# the carry bit is 0. 


mov    r0, #0x40000000
mov    r1, #2
lsls   r2,r0,r1
# lsls, is logical shift left,
# r0 if viewed in binary, is actually: 
# 01000000000000000000000000000000
# lsls   r2,r0,r1, amounts to performing
# r2= r0 << r1,
# so r0 is shifted by 2 bits to the left,
# and the carry is set to 1, since the
# last bit shifted out, is 1. 

For the rotate right operation, ror, the last bit rotated out is also the most significant bit of the result, so carry is set to the most significant bit of the result.

mov    r0, #0b1
mov    r1, #1
rors   r2, r0, r1
# rors, is rotate right.
# r0 which is 0b1 is rotated right by 
# r1, which is 1, and the result is
# stored into r2, which value will be
# in binary:
# 10000000000000000000000000000000
# hence the result most significant 
# bit is 1, and the carry flag as
# such is set to 1. 
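
The carry rules for lsls, lsrs and rors can be sketched in Python as follows. This is a simplified model which assumes a shift amount between 1 and 31, and the function names simply mirror the mnemonics for readability.

```python
MASK32 = 0xFFFFFFFF

def lsls(value, amount):
    """Logical shift left: carry is the last bit shifted out of bit 31."""
    wide = (value & MASK32) << amount
    return wide & MASK32, (wide >> 32) & 1

def lsrs(value, amount):
    """Logical shift right: carry is the last bit shifted out of bit 0."""
    value &= MASK32
    return value >> amount, (value >> (amount - 1)) & 1

def rors(value, amount):
    """Rotate right: carry equals bit 31 of the result."""
    value &= MASK32
    result = ((value >> amount) | (value << (32 - amount))) & MASK32
    return result, result >> 31

print(lsrs(0b100, 1))        # (2, 0)
print(lsls(0x40000000, 2))   # (0, 1)
print(rors(0b1, 1))          # (2147483648, 1)
```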

A carry can also occur, as the result of an inline shift operation.

mov r0, #0x80000000
movs r1, r0, LSL #1
# First 0x80000000 is stored,
# into r0, and next r0 is 
# shifted by 1, and the result
# is stored into r1. 
# r1 will have a zero value, and
# since the last bit shifted out
# is 1, then the carry flag is 
# set to 1.

A carry can also be caused by comparison operations. For example, cmp actually performs the subtraction of its operands, so the carry resulting from cmp, is as described, when discussing the addition and subtraction carry.

mov    r0, #1
mov    r1, #2
cmp    r0, r1
# compare the two registers,
# r0 and r1, this amounts
# to performing r0 - r1,
# which is 1 - 2, or 1 + (-2),
# which means, this is adding 
# 00000000000000000000000000000001
# 11111111111111111111111111111110
# and the result is 
# 11111111111111111111111111111111
# , so the carry flag is set to 0. 

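If one of the instructions ANDS, ORRS, ORNS, EORS, BICS, TEQ, TST, MOVS, MVNS is used, and operand2 is an immediate value which is greater than 255, and which is hence encoded using a rotation, then the carry flag is set to the bit number 31 of this immediate value. If the immediate value is between 0 and 255, the carry flag is left unchanged.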

mov    r0, #0x1
orrs   r0, r0, #0x80000000
# 0x1 is moved into r0.
# orrs is the bitwise OR instruction. Its
# operand two is an immediate value, which
# is greater than 255, and whose bit 31
# is 1, hence the carry flag is set to 1.
# 0x80000000 is in binary
# 10000000000000000000000000000000

The V flag

V is the overflow condition flag. It is only meaningful for two's complement, so signed, addition and subtraction.

In the case of addition, this flag is set when adding two positive numbers gives a negative result, or when adding two negative numbers gives a positive result.

In the case of subtraction, this flag is set when subtracting a negative number from a positive number gives a negative result, or when subtracting a positive number from a negative number gives a positive result.

In all these cases, the true result does not fit into a signed 32 bit value, so it is truncated, and it wraps around.

# Examples of adding positive numbers
# to positive numbers, and negative 
# numbers to negative numbers, that 
# causes an overflow to happen.

mov    r0, #0x40000000
adds   r1, r0, #0x40000000
# This operation actually amounts
# to the addition of 
# 01000000000000000000000000000000
# 01000000000000000000000000000000
# which results in:
# 10000000000000000000000000000000
# So adding two positive numbers,
# resulted in a negative number,
# this is overflow, so the overflow
# bit is set to 1.
# Additionally, and since the result
# fits into 32 bits, the carry
# flag is set to 0.

mov    r0, #0x80000000
mov    r1, #0x80000000
adds   r2, r1, r0
# The binary representation of 
# 0x80000000, is 
# 10000000000000000000000000000000
# In two's complement 32 bit, this
# is a negative number. 
# 0x80000000 is being added to itself
# as in 
# 10000000000000000000000000000000
# 10000000000000000000000000000000
# and the result is
#100000000000000000000000000000000
# Adding two negative numbers, resulted
# in a positive one, because of truncation,
# this is overflow, and the overflow bit
# is set to 1.
# Additionally the carry bit is set to 1, 
# since the result does not fit into 32 
# bits. 



# Examples of subtracting negative
# numbers, from positive ones, and
# positive numbers from negative ones,
# that lead to overflow.

mov    r0, #0x0
mov    r1, #0x80000000
subs   r2, r0, r1
# r1 is 0x80000000, which is a negative 
# number, and is being subtracted from 
# r0 which is 0, which is a positive
# number, or if you prefer a non negative
# one.
# When using two's complement subtraction, 
# this is as performing the addition, 
# of r0, with the two's complement of 
# r1, which amounts to:
# 00000000000000000000000000000000
# 10000000000000000000000000000000
# and which results into
# 10000000000000000000000000000000
# Subtracting a negative number from
# a positive number, must yield a 
# positive number, the result is 
# negative, as such this is overflow,
# and the overflow bit is set to 1.
# Additionally the carry flag is set 
# to 0, because the result does fit 
# 32 bits.



# Examples of subtracting negative
# numbers from negative ones, and
# positive numbers, from positive ones.
# In this case overflow does not apply.

mov    r0, #0x80000000
mov    r1, #0x80000000
subs   r2, r1, r0
# This sub operation, cannot
# cause an overflow to happen, 
# because two negative numbers
# are subtracted.
# r0 is 0x80000000, and is subtracted
# from r1 which is 0x80000000.
# The subtraction is performed by
# adding r1 with the two's complement
# of r0, which is the addition of:
# 10000000000000000000000000000000
# 10000000000000000000000000000000
# which results in:
#100000000000000000000000000000000
# This being said, the carry flag
# is set to 1. The overflow flag
# is not concerned with such operation
# so it is set to 0. 

mov    r0, #0x00000001
mov    r1, #0x00000002
subs   r2, r0, r1
# r0, and r1, contains both positive
# values, so to perform r2 = r0 - r1,
# r1 two's complement is calculated,
# and added to r0.
# The two's complement of r1 is in 
# binary
# 11111111111111111111111111111110
# so the subtraction amounts to the
# addition of:
# 00000000000000000000000000000001 
# 11111111111111111111111111111110
# which results in
# 11111111111111111111111111111111
# overflow does not apply to such
# operation, as such the overflow
# flag is set to 0.
# Additionally, the carry flag is set 
# to 0, since the result does fit 
# into 32 bits.

mov    r0, #0x80000000
mov    r1, #0x80000000
cmp    r0, r1
# cmp amounts, to r0 - r1. 
# The result of the subtraction is 
# not stored, but only the flags are 
# set.
# Subtraction is performed , by
# adding r0, to the two's complement
# of r1, which is the addition of:
# 10000000000000000000000000000000
# 10000000000000000000000000000000
# and which results in:
#100000000000000000000000000000000
# Overflow is not concerned with
# the subtraction of a negative 
# number from a negative one, as such
# it is set to 0, and the carry flag
# is set to 1, since the result does
# not fit into 32 bits.
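
The overflow rules can be expressed with sign bits only. The following Python sketch is one possible way to model them, and is not taken from any reference implementation: for an addition, overflow happens when both operands have the same sign and the result has a different sign, and for a subtraction, when the operands have different signs and the result sign differs from the first operand. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000

def add_overflow(a, b):
    """Return the V flag for the 32 bit addition a + b."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    # Same operand signs, and the result sign differs from a.
    return int(bool(~(a ^ b) & (a ^ result) & SIGN_BIT))

def sub_overflow(a, b):
    """Return the V flag for the 32 bit subtraction a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    # Different operand signs, and the result sign differs from a.
    return int(bool((a ^ b) & (a ^ result) & SIGN_BIT))

print(add_overflow(0x40000000, 0x40000000))  # 1
print(add_overflow(0x80000000, 0x80000000))  # 1
print(sub_overflow(0x0, 0x80000000))         # 1
print(sub_overflow(0x80000000, 0x80000000))  # 0
print(sub_overflow(0x1, 0x2))                # 0
```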

The N flag

N is the negative condition flag. It is set to the most significant bit of the result: if the most significant bit is 1, the N flag is set to 1, and if the most significant bit is 0, the N flag is set to 0.

Examples of instructions that can cause the N flag to be set, when they are suffixed with an s, are add, mul, mov, and, and lsr, in addition to cmp. These are arithmetic, data movement, bitwise, shift, and comparison instructions.

mov    r0, #0xC0000000
mov    r1, #0xC0000000
adds   r2, r0, r1
# Adding 0xC0000000 with itself,
# amounts to adding:
# 11000000000000000000000000000000
# 11000000000000000000000000000000
# which has a result of:
#110000000000000000000000000000000.
# The most significant bit being 1, means 
# that the negative flag is set to 1, and 
# since the result is larger than 32 bits, 
# it means that the carry flag is set to 1. 
# Two negative numbers were added, 
# the result is a negative number, so 
# overflow, does not apply, and the 
# overflow flag is set to 0 .


mov    r0, #0x40000000
mov    r1, #0x40000000
adds   r2, r0, r1
# Adding 0x40000000 to itself,
# amounts to adding:
# 01000000000000000000000000000000
# 01000000000000000000000000000000
# which has a result of:
# 10000000000000000000000000000000
# Since the most significant bit is 
# 1, the negative flag is set to 1.
# The carry flag is set to zero, since 
# the result fits 32 bits, and the 
# overflow flag is set to 1, since
# adding two positive numbers, 
# resulted in a negative one. 


movs    r0, #0x80000000
# 0x80000000 is 
# 10000000000000000000000000000000
# in binary, hence the most significant
# bit is 1, which means, that the
# negative flag is set to 1.


mov     r0, #0x0
orns    r1, r0, #0
# 0 is stored into r0. 
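# Note that orn is only available
# in the thumb2 instruction set,
# the arm32 instruction set has no
# orn instruction.
# orn amounts to applying
# a not on the immediate
# value, and oring the result
# with r0, and storing the
# result into r1.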
# Visually this is performing
# or on: 
# 00000000000000000000000000000000
# 11111111111111111111111111111111
# which amounts, to 
# 11111111111111111111111111111111
# Since the most significant bit of
# the result is 1, the negative flag
# is set to 1.
# The carry flag is left unchanged,
# since the immediate value #0 is
# not greater than 255.
# orns does not affect the overflow flag,
# as such the overflow flag keeps the
# value set by a preceding instruction.

The Z flag

Z is the zero condition flag. If the result of an instruction is zero, it is set to 1, and if the result of an instruction is not zero, it is set to 0.

Examples of instructions which affect the zero flag, when they are suffixed with an s, are arithmetic instructions such as add, logical instructions such as and, data movement instructions such as mov, and shift instructions such as lsr.

mov    r0, #0x1
mov    r1, #0x1
subs   r2, r0, r1
# subtracting 0x1 from 0x1, 
# amounts to adding :
# 00000000000000000000000000000001
# 11111111111111111111111111111111
# which results into:
#100000000000000000000000000000000
# the result being zero, the Zero flag
# is set to 1 , additionally the Carry
# flag is set to 1, since the result does
# not fit into 32 bits. 
# The negative flag is set to 0, since the
# most significant bit is 0.
# The overflow bit is set to zero, since
# we are subtracting two positive numbers,
# so overflow does not apply.

mov    r0, #0x1
mov    r1, #0xFFFFFFFE
cmn    r0, r1
# cmn stands for compare negative, 
# and this amounts to adding 
# r0 with r1. So this amounts 
# to adding
# 00000000000000000000000000000001
# 11111111111111111111111111111110
# which has a result of:
# 11111111111111111111111111111111
# This being said, the result is not
# zero, hence the zero flag is set to
# 0. 
# The leading bit of the result is 1
# as such, the negative bit is set to 
# 1. 
# The carry flag is set to 0, because
# the result fits into 32 bits.
# The overflow flag is set to 0, since
# we are adding a positive and a 
# negative number, so overflow does
# not apply.


mov    r0, #0x1
mov    r1, #0x1
teq    r0, r1
# teq will perform the xor 
# operation, on its operands. 
# This means xoring:
# 00000000000000000000000000000001
# 00000000000000000000000000000001
# which results into
# 00000000000000000000000000000000
# The result being zero, the zero
# flag is set to 1.
# The most significant bit of the 
# result being zero, the negative
# flag is set to zero. 
# teq with a register operand, which
# is not shifted, leaves the carry
# flag unchanged.
# teq does not affect the overflow flag
# either, as such the overflow flag keeps
# the value set by a preceding
# instruction.

The conditions

Having understood the condition flags, let us check the conditions associated with these condition flags.

EQ, NE

EQ means, that an instruction will proceed, if and only if, the Z flag is set to 1.

mov    r0, #0x1
cmp    r0, #0x1
addEQ  r1, r0, #0x1
# 0x1 is moved into r0, 
# cmp will perform  r0 - 0x1
# which is 0x1 - 0x1, which
# amounts to adding:
# 00000000000000000000000000000001
# 11111111111111111111111111111111
# which results into:
#100000000000000000000000000000000
# which means that the zero flag is 
# set to 1.
# add having the EQ condition,
# will execute, only if the Z flag 
# is set to 1, which is, hence in this 
# case, add does execute, and r1 will 
# have a value of 0x2, which is the 
# addition of r0 to 0x1.

NE means, only perform the instruction, if the Z flag bit, is set to 0.

mov    r0, #0x1
cmp    r0, #0x2
addNE  r1, r0, #0x1
# 0x1 is moved into r0, cmp
# will perform r0 - 0x2, which
# is 0x1 - 0x2, which amounts
# to the addition of:
# 00000000000000000000000000000001
# 11111111111111111111111111111110
# and which has a result of:  
# 11111111111111111111111111111111
# The result being not zero, the 
# zero flag is set to 0.
# addNE will execute, if and only if
# , the Z flag is set to 0, which is,
# so 0x1 is added to r0, which is the
# addition of 0x1 + 0x1, as such r1,
# will have a value of 0x2.

CS, CC

CS means that the instruction is executed if and only if the carry flag is set to 1.

mov    r0, #0x80000000
lsls   r1, r0, #1
addCS  r1, r1, #1 
# r0 will contain the value of 
# 0x80000000, which in binary is
# 10000000000000000000000000000000
# Next the instruction lsls, will
# shift left logically, r0 by 1, 
# and store the result into the 
# r1 register, which will have the
# value:
# 00000000000000000000000000000000
# The carry flag, will have the value
# of the last shifted out bit, which 
# is 1, as such addCS will execute,
# and r1, is added to the value #1, 
# and the result is stored into
# r1, which will have a value of
# 0x1.

CC means that an instruction is executed if and only if the carry flag is set to 0.

mov    r0, #0x1
mov    r1, #0x0
adds   r0, r0, #0x0
cmpCC  r1, #0 
# 0x1 is stored into r0, and 0x0
# is stored into r1.
# Next 0x0 is added to r0,
# which amounts to adding
# 00000000000000000000000000000001
# 00000000000000000000000000000000
# which is
# 00000000000000000000000000000001
# and the result is stored into r0.
# The carry flag is set to 0, since
# the result fits 32 bits, as such the 
# cmpCC instruction is executed.
# Note that ands with the immediate
# value #0x0 cannot be used here to 
# clear the carry flag, since it 
# leaves the carry flag unchanged.
# cmpCC is the compare instruction,
# which will operate by subtracting
# #0 from r1, which amounts to adding
# 00000000000000000000000000000000
#100000000000000000000000000000000
# which has a result of 0 .

VS, VC

An instruction containing VS is executed, when the overflow flag is set to 1, and an instruction containing VC is executed, when the overflow flag is cleared, so when it is set to 0.

mov    r0, #0x80000000
adds   r0, r0, r0
adcVS  r0, r0, #0
# First 0x80000000 is stored into
# r0, next r0 is added to itself,
# and the result is stored into
# r0.
# This amounts to adding: 
# 10000000000000000000000000000000
# 10000000000000000000000000000000
# which amounts to:
#100000000000000000000000000000000
# So this means, that adding two
# negative values, resulted into
# a positive one, so the overflow
# flag is set to 1, additionally the
# carry flag is set to 1, since the 
# result does not fit into 32 bits.
# The adcVS instruction, will execute when
# the overflow flag is set to 1, which is.
# adc is add with carry, which is going 
# to perform the addition of r0 with #0, 
# with the carry flag, and the result
# is stored in r0. 
# So this amounts to adding, 0 + 0 + 1 
# which results into 1, and 1 is stored 
# into r0. 


mov    r0, #0x1
subs   r0, r0, #0
adcVC  r0, r0, #0 
# 0x1 is stored into r0, 
# next #0 is subtracted from
# r0, and the result is stored 
# in r0.
# This amounts to adding
# 00000000000000000000000000000001
#100000000000000000000000000000000
# which is
#100000000000000000000000000000001
# The carry flag is set to 1, since
# the result does not fit into 32 bits,
# and the overflow flag is set to 0, since
# we are subtracting two positive values
# so overflow does not apply.
# adcVC, will execute if the overflow
# flag is cleared, so if it is set to
# 0, which is, so adc , which is add
# with carry will execute. 
# So r0 will have a value of:
# r0 = r0 + #0 + carry flag 
# which is 1 + 0 + 1 
# so r0 will be 2.

MI, PL

MI stands for minus, and PL stands for plus. An instruction containing MI is executed, if the N flag is set to 1, and an instruction containing PL is executed, if the N flag is set to 0.

movs    r0,  #0x80000000
lslMI	r0, r0,  #1
# 0x80000000 is moved into r0, 
# its binary representation is:
# 10000000000000000000000000000000
# Its most significant bit is 1, 
# hence the negative flag is set to 1. 
# lslMI, which is the logical shift left,
# will only execute, if the negative 
# flag is set to 1, which is, so in 
# this case r0 is shifted by 1 to 
# the left, which results into r0, 
# having a zero value.


movs    r0,  #0x1
rrxPL	r0, r0
# 0x1 which is in binary 
# 00000000000000000000000000000001
# is stored into r0, its most significant 
# bit being a 0, means that the 
# negative flag is set to 0.
# rrx stands for rotate right with
# extend, which rotates right by one
# bit through the carry flag. Since it 
# is suffixed with the condition PL, 
# the instruction will only execute
# if the negative flag is set to
# zero, which is the case. 
# rrx removes the least significant 
# bit of r0, so this means that there
# is one extra bit, which is the most
# significant bit, which needs a value.
# It is assigned the value of the 
# carry flag.
# movs with the immediate value #0x1
# leaves the carry flag unchanged, so
# assuming the carry flag was 0
# beforehand, r0 will have a stored
# value of:
# 00000000000000000000000000000000.

GE, LT

GE stands for greater or equal, and this condition applies, when N = V.

# GE means Greater or equal, and is
# executed when N = V, which means
# when both the negative and overflow
# flags are set to 1, or when they are 
# both set to 0 .
# As an example if both N and V are
# set to 0, this means that a result
# is positive, and that no overflow 
# occurred.

mov r0, #2
mov r1, #1
cmp r0, r1 
addGE r3, r0, r1
# 2 is moved into r0, 
# and 1 is moved into r1
# cmp will perform r0 - r1
# which is the addition of 
# r0 + (-r1), and which amounts
# to: 
# 00000000000000000000000000000010
# 11111111111111111111111111111111
# and which result is:
#100000000000000000000000000000001
# The negative flag is set to 0, 
# since the leading bit is not 1, 
# the carry flag is set to 1, since
# the result does not fit into 32 bits.
# overflow is set to 0, since subtracting
# two positive numbers, does not affect 
# overflow.
# addGE, will execute, since the negative
# flag is zero, and the overflow flag
# is zero, so this amounts to
# having r3 = r0 + r1
# which is r3 = 2 + 1 = 3.

LT stands for less than, and this condition is applicable, when N!=V.

# A less than instruction is executed,
# when N!=V, this happens,
#    When negative and no overflow, 
#      which is N set to 1, and V 
#      is set to 0.
#    When positive and overflow, 
#      which is N set to 0, and V
#      set to 1.

mov r0, #0x1
mov r1, #0x2
cmp r0, r1
eorLT r2, r0, r1
# 0x1 is moved into r0, 
# and 0x2 is moved into
# r1.
# cmp will perform r0 - r1
# which is r0 + (-r1), which
# is
# 00000000000000000000000000000001
# 11111111111111111111111111111110
# and the result is:
# 11111111111111111111111111111111
# The negative flag is set to 1, since
# the most significant bit is 1, and
# the overflow flag is set to 0, since
# we are subtracting two positive numbers
# so overflow does not apply.
# eorLT, is exclusive or, LT will make this
# exclusive or, executable, only if the less
# than condition is fulfilled, which is, 
# since N=1 , and V=0 , so N!=V.
# EOR is applied as such:
# 00000000000000000000000000000001
# 00000000000000000000000000000010
# and the result is:
# 00000000000000000000000000000011

GT, LE

GT means greater than, and this condition applies , if and only if Z=0 & N=V.

# GT means greater than, and is 
# executed when Z=0 and N=V. 
# so in other words, this means:
#    Non zero, positive, and no overflow.
#    Non zero, negative and overflow.

mov    r0, #0x7FFFFFFF
adds   r0, r0, r0
lsrGT  r0, r0, #1
# 0x7FFFFFFF is moved into r0, 
# next add is going to perform
# r0 = r0 + r0.
# which basically is adding
# 01111111111111111111111111111111
# 01111111111111111111111111111111 
# and the result is:
# 11111111111111111111111111111110
# The negative flag is set to 1, 
# the overflow flag is set to 1, 
# since adding two positive numbers,
# resulted into a negative one, the
# zero flag is set to 0, because the
# result is not a zero.
# Since the result is non zero, and
# the negative and overflow flags are
# both set to 1, the GT condition is 
# met, and as such lsrGT will execute.
# lsr is logical shift right, which 
# amounts to r0 = r0 >> 1 ,
# so this results into r0, having
# the value, 
# 01111111111111111111111111111111

LE stands for less than or equal, and this condition is applicable, if and only if Z=1 or N!=V.

# LE means less or equal, and this
# condition is executed, when the
#     zero flag is set to 1, 
#     so when the result is zero, 
# or it is executed when the 
#     negative flag is different from the 
# overflow flag, which means:
#    Negative and no overflow, 
#      negative being the negative flag 
#      is 1,
#      and no overflow being the overflow
#      flag is 0.
#    Non negative and overflow,
#      non negative being the negative flag,
#      is 0,
#      and overflow being the overflow flag,
#      is 1.

mov    r0, #0x80000000
subs   r0, r0, #1
lsrLE  r0, r0, #1
# move 0x80000000 into the register
# r0. Next apply the sub instruction
# which is r0 = r0 + (-1 ), 
# and which amounts to adding
# 10000000000000000000000000000000
# 11111111111111111111111111111111
# and which results into
#101111111111111111111111111111111
# The zero flag is set to 0, since
# the result is not zero, the negative 
# flag is set to 0, since the most 
# significant bit is 0, the overflow 
# flag is set to 1, since a positive
# number is being subtracted from a 
# negative one, and the result is positive,
# the carry flag is set to 1, since the 
# result does not fit into 32 bits. 
# So this means that the result is
# non zero, non negative, and overflow
# so the LE condition is met, and lsrLE
# is executed, which is logical shift
# right, so r0 = r0 >> 1, and the result
# is:
# 00111111111111111111111111111111
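
The link between the signed conditions and the N, Z and V flags can be checked with a small Python model. This is a sketch under the assumption that the flags were produced by cmp a, b, which is the usual case, and the helper names are invented for this example.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32 bit pattern as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def cmp_nzv(a, b):
    """Return the (N, Z, V) flags that cmp a, b would produce."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    v = ((a ^ b) & (a ^ result)) >> 31
    return n, z, v

def signed_conditions(a, b):
    """Evaluate GE, LT, GT and LE from the flags of cmp a, b."""
    n, z, v = cmp_nzv(a, b)
    return {
        "GE": n == v,
        "LT": n != v,
        "GT": z == 0 and n == v,
        "LE": z == 1 or n != v,
    }

# 0x80000000 is the most negative number, so it is less than 1,
# even though the subtraction overflows (N=0, V=1).
print(cmp_nzv(0x80000000, 0x1))                  # (0, 0, 1)
print(signed_conditions(0x80000000, 0x1)["LT"])  # True
```

After cmp, the four conditions agree with the usual signed comparisons of the two operands, including the cases where the subtraction overflows.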

HI, LS

HI means unsigned higher, and this condition is met only if, C=1 & Z=0.

# HI means unsigned higher, and this
# condition applies when 
#    carry and not zero. 
# After a subtraction, or a comparison,
# the carry flag is set to 1, when no
# borrow occurs, so when the first 
# operand is unsigned higher than, or
# the same as, the second one.
# Additionally, the result must
# not be a zero, so the zero flag
# is set to 0.

mov    r0, #0x7FFFFFFF
subs   r0, r0, #1
asrHI  r0, r0, #1
# 0x7FFFFFFF is moved into
# r0. Next the sub instruction
# is applied, which amounts to
# having r0 = r0 + ( -1 ) 
# which is adding
# 01111111111111111111111111111111
# 11111111111111111111111111111111
# and results into 
#101111111111111111111111111111110
# The carry flag is set to 1, and the
# result is non zero, so the condition
# HI is met, and the instruction asrHI
# is executed, which is arithmetic shift
# right, so the result is 
# 00111111111111111111111111111111

LS means unsigned lower or same, and the condition is applicable, if and only if, C=0 or Z=1.

# LS means unsigned lower or same.
# it applies when no carry or zero. 
# no carry means, the carry flag is
#     cleared, which means it is set to 0.
# Or zero means, the zero flag is set
# to 1.
# So zero means the same, 
# and no carry means unsigned
# lower. 

mov r0, #0x7FFFFFFF 
subs r0, r0, r0
asrLS r0, r0, #1
# move 0x7FFFFFFF into 
# r0, and next perform, 
# the sub operation, which is
# r0 = r0 + (-r0), which 
# amounts to the addition of
# 01111111111111111111111111111111
# 10000000000000000000000000000001
# and the result is:
#100000000000000000000000000000000
# In this case, the zero flag is set
# to 1, since the result is 0, and
# the carry flag is set to 1, since
# the result does not fit 32 bits. 
# Since the zero flag is set, the same 
# condition holds, as such LS holds,
# and asrLS which is arithmetic
# shift right, is executed, so what 
# applies is r0 = r0 >> 1, and
# r0 is equal to:
# 00000000000000000000000000000000

HS, LO

HS stands for unsigned higher or same, and this condition is applicable if and only if C=1. It is a synonym of CS.

# HS means either unsigned higher 
# or same.
# This condition holds, when the
# carry flag is set to 1, so when
# carry happens.

mov r0, #0x80000000
mov r1, #0x80000001
adds r0, r0, r1
lsrHS r0, r0, #1
# 0x80000000 is moved into r0, 
# and 0x80000001 is moved into r1,
# next the add operation is applied,
# which is r0 = r0 + r1 , which 
# amounts to adding:
# 10000000000000000000000000000000
# 10000000000000000000000000000001
# which results into:
#100000000000000000000000000000001
# The carry flag is set to 1, as such
# the higher or same condition applies,
# so the lsrHS instruction is executed, 
# which is logical shift right, and which
# amounts to doing: r0 = r0 >> 1,
# and the result is 
# 00000000000000000000000000000000

LO stands for unsigned lower, and this condition applies only if C=0. It is a synonym of CC.

# LO stands for unsigned lower, 
# the condition applies, when
# the carry flag is cleared, 
# so when it is set to 0. 

mov r0, #0x00000000
mov r1, #0x80000001
adds r0, r0, r1
asrLO r0, r1, #1
# 0x00000000 is moved into
# r0, and 0x80000001 is 
# moved into r1. Next the 
# add instruction is applied,
# which is r0 = r0 + r1, 
# and which amounts to adding
# 00000000000000000000000000000000
# 10000000000000000000000000000001
# and which results into
# 10000000000000000000000000000001
# so the carry flag is cleared, 
# and the LO condition holds, so
# the asrLO instruction is executed,
# in this case it is arithmetic shift 
# right by 1, and it results into
# 11000000000000000000000000000000

AL

AL stands for always, which means regardless of the condition flags, set into the CPSR register, the instruction is executed.

This is the default condition, and it is not necessary to append it, to the mnemonic of instructions.

movAL    r0, #0x00000000
# AL means always execute an instruction,
# so in other words, regardless of the flags, 
# set in the CPSR register, the instruction 
# is always executed.
# movAL and mov, amounts to doing the same 
# thing.

When are condition flags cleared?

The condition flags are N, Z, C, and V. An instruction only changes them, when it is suffixed with s, or when it is a comparison instruction. Some instructions affect all the condition flags, such as adds, whereas others affect only a few, such as ands, which leaves V unchanged, and some never affect them, such as sdiv, which is a signed division.

This being said, once a condition flag is set by an instruction, it keeps its value, until another flag setting instruction changes it.

mov    r0, #0x7FFFFFFF
adds   r0, r0, r0
eors   r0, r0, r0
# 0x7FFFFFFF is moved into r0, 
# and next the add instruction
# r0 = r0 + r0, is executed, which
# amounts to the addition of:
# 01111111111111111111111111111111
# 01111111111111111111111111111111
# and which results into:
# 11111111111111111111111111111110
# The negative flag is set to 1, since
# the most significant bit is 1,
# the overflow flag is set to 1, since 
# adding two positive numbers, resulted
# into a negative one, the carry flag
# is set to 0, since the result does
# fit into 32 bits, and the zero flag
# is set to zero, since the result is
# nonzero. 
# After that the eors instruction is 
# executed, it results into applying 
# an exclusive or on:
# 11111111111111111111111111111110
# 11111111111111111111111111111110
# which results into
# 00000000000000000000000000000000,
# so in this case, the zero flag is
# set to 1, since the result is zero,
# the carry flag keeps its old value,
# which is 0, since no shift is applied
# to the second operand,
# the negative flag is set to 0, since
# the result is non negative, and 
# the overflow flag, keeps its old value
# which is 1.
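
As a further sketch of the same rule, the following lines place an instruction without the s suffix, between a flag setting instruction and a conditional one. The register choices are arbitrary, and the comments use @, which is the comment character of the GNU assembler for arm.

```asm
mov    r0, #1
subs   r0, r0, #1
@ r0 = 1 - 1 = 0, so the zero
@ flag is set to 1.
mov    r1, #5
@ mov has no s suffix here, so
@ the flags are left unchanged.
addEQ  r1, r1, #1
@ The zero flag is still 1, so
@ addEQ is executed, and r1 = 6.
```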

Instruction set cheat sheet

Arithmetic operations

  • add.
    Does: Rd = Rn + Operand2.
    Operation: add{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C,V
  • add with carry.
    Does: Rd = Rn + Operand2 + Carry flag.
    Operation: adc{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C,V
  • saturating signed add.
    Does: Rd = saturating(Rn + Rm). If the result is larger than or equal to 2**31, or less than -(2**31), then it is set to (2**31) - 1 or -(2**31), and the Q flag is set. The Q flag is sticky, which means it must be explicitly cleared.
    Operation: qadd{cond} Rd, Rn, Rm
    Flags: Q
  • parallel saturating signed 8 bit add.
    Does: each of the four bytes of Rn, is added to the corresponding byte of Rm. If a byte result is larger than or equal to 2**7, or less than -(2**7), then it is set to (2**7) - 1 or -(2**7). The Q flag is not affected.
    Operation: qadd8{cond} Rd, Rn, Rm
    Flags: none
  • subtract.
    Does: Rd = Rn - Operand2.
    Operation: sub{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C,V
  • subtract with carry.
    Does: Rd = Rn - Operand2 - not(Carry flag).
    Operation: sbc{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C,V
  • reverse subtract.
    Does: Rd = Operand2 - Rn.
    Operation: rsb{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C,V
  • reverse subtract with carry.
    Does: Rd = Operand2 - Rn - not(Carry flag).
    Operation: rsc{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C,V
  • saturating signed subtract.
    Does: Rd = saturating(Rm - Rn). If the result is larger than or equal to 2**31, or less than -(2**31), then it is set to (2**31) - 1 or -(2**31), and the Q flag is set. The Q flag is sticky, which means it must be explicitly cleared.
    Operation: qsub{cond} Rd, Rm, Rn
    Flags: Q
  • Multiply.
    Does: Rd = Rn × Rm.
    Operation: mul{S}{cond} Rd, Rn, Rm
    Flags: N,Z
  • Multiply and add.
    Does: Rd = Ra + (Rn × Rm).
    Operation: mla{S}{cond} Rd, Rn, Rm, Ra
    Flags: N,Z
  • Multiply and subtract.
    Does: Rd = Ra - (Rn × Rm).
    Operation: mls{cond} Rd, Rn, Rm, Ra
    Flags: none
  • Signed 64 bit multiply.
    Does: RdHiLo = Rn × Rm. Rm is multiplied by Rn, and the upper 32 bits of the result, are stored into RdHi, and the lower 32 bits of the result, are stored into RdLo.
    Operation: smull{S}{cond} RdLo, RdHi, Rn, Rm
    Flags: N,Z
  • Unsigned 64 bit multiply.
    Does: RdHiLo = Rn × Rm. Rm is multiplied by Rn, and the upper 32 bits of the result, are stored into RdHi, and the lower 32 bits of the result, are stored into RdLo.
    Operation: umull{S}{cond} RdLo, RdHi, Rn, Rm
    Flags: N,Z
  • signed division.
    Does: Rd = Rn / Rm
    Operation: sdiv{cond} Rd, Rn, Rm
    Flags: none
  • unsigned division.
    Does: Rd = Rn / Rm
    Operation: udiv{cond} Rd, Rn, Rm
    Flags: none

# addition examples
add r0, r1, #1
# r0 = r1 + 1 
add r0, r1, r2
# r0 = r1 + r2
add r0, r1, r2, LSL #1
# r0 = r1 + ( r2 << 1 )
add r0, r1, r2, LSL r3
# r0 = r1 + ( r2 << r3 )

adc r0, r1, r2
# r0 = r1 + r2 + carry flag

qadd r0, r1, r2
# r0 = saturating( r1 + r2 )



# subtraction examples
sub r0, r1, r2
# r0 = r1 - r2

sbc r0, r1, r2
# r0 = r1 - r2 - not( carry flag )

rsb r0, r1, r2
# r0 = r2 - r1

rsc r0, r1, r2
# r0 = r2 - r1 - not( carry flag )



# Multiplication examples
mul r0, r1, r2
# r0 = r1 * r2
# 32 bits multiplication

mla r0, r1, r2, r3
# r0 = r3 + ( r1 * r2 )

mls r0, r1, r2, r3
# r0 = r3 - ( r1 * r2 )

smull r0, r1, r2, r3
# signed 64 bit multiplication, 
# r2 is multiplied by r3
# The lower 32 bits of the
# result are placed into
# r0, and the upper 32 bits
# of the result are placed 
# into r1.

umull r0, r1, r2, r3
# unsigned 64 bit multiplication,
# r2 is multiplied by r3. 
# The lower 32 bits, 
# are placed into r0, 
# and the upper are placed
# into r1. 



# Division examples
sdiv r0, r1, r2
# signed division
# r0 = r1 / r2

udiv r0, r1, r2
# unsigned division
# r0 = r1 / r2

What follows is an example of using adc, which is add with carry, to perform addition over more than 32 bits.

mov    r0, #0x1
mov    r1, #0x80000000
mov    r3, #0x2
mov    r4, #0x80000000
adds   r5, r1,r4
adc    r6,r0,r3

# What is to be performed is
# the addition of 
# 0x180000000
# 0x280000000
# which are two numbers having
# more than 32 bits. 
# The first number, is written
# using the registers, r1 which 
# stores the lower 32 bits, hence
# 0x80000000, and r0 which stores
# the higher bits, hence 0x1. 
# The second number is written
# using r4, to store the lower 
# 32 bits, hence 0x80000000, and
# r3 to store the upper bits,
# hence 0x2.
# Later on, the lower 32 bits of
# the two numbers, are added, 
# using the adds instruction, which
# sets the flags, in this case the 
# carry flag is set to 1, and the 
# result is stored into r5.
# Following that, adc, which is add 
# with carry is executed, adc will add 
# the upper bits, which are stored in 
# r0 and r3, with any carry flag 
# value, which has resulted from the 
# earlier add operation, and the result 
# is stored in r6.
# The final content of both the r5, 
# and r6 registers is :
# r5: 0x0
# r6: 0x4
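
The same idea can be sketched for subtraction over more than 32 bits, by pairing subs with sbc. The values and registers below are chosen only for illustration.

```asm
mov    r0, #0x3
mov    r1, #0x00000000
mov    r3, #0x1
mov    r4, #0x80000000
@ What is to be performed is
@ 0x300000000 - 0x180000000
subs   r5, r1, r4
@ The lower 32 bits are subtracted:
@ 0x00000000 - 0x80000000 = 0x80000000
@ A borrow is needed, so the carry
@ flag is cleared to 0.
sbc    r6, r0, r3
@ The upper bits are subtracted:
@ r6 = 0x3 - 0x1 - not( carry flag )
@    = 0x3 - 0x1 - 1 = 0x1
@ r5: 0x80000000
@ r6: 0x1
@ so the result is 0x180000000
```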

Data movement operations

Data movement operations, can be used to move data encoded in an instruction, to a register, or they can be used to move data between registers.

  • Move.
    Does: Rd = Operand2
    Operation: mov{S}{cond} Rd, <Operand2>
    Flags: N,Z,C
  • Move NOT.
    Does: Rd = ~Operand2
    Operation: mvn{S}{cond} Rd, <Operand2>
    Flags: N,Z,C

# Move instruction
mov r0, r1
# r0 = r1
mov r0, #0x1
# r0 = 0x1 
# Data is stored in the 
# instruction. 
mov r2, r1, LSL #1
# r2 = r1 << 1
mov r4 , r3 , LSL r2
# r4 = r3 << r2



# Move not instruction
mvn r0, r1
#r0 = ~r1

mvn r0, #0x1
# r0 = ~0x1
# which is 
# 11111111111111111111111111111110

mov r0, #0x1
mvn r1, r0, LSL #1
# r1 = ~(r0 LSL #1)
# r0 is
# 00000000000000000000000000000001
# r0 LSL # 1 is
# 00000000000000000000000000000010
# ~(r0 LSL #1) is 
# ~(00000000000000000000000000000010)
# which is
# 11111111111111111111111111111101

mov r0, #0x1
mov r1, #0x1
mvn r2 , r0 , LSL r1
# r2 = ~(r0 << r1 )
# r0 is 
# 00000000000000000000000000000001
# r0 << r1 is r0 <<1
# which is 
# 00000000000000000000000000000010
# ~(r0 LSL r1) is
# ~(00000000000000000000000000000010)
# which is
# 11111111111111111111111111111101

The following example shows, how mvn, and add, can be used to perform multiplication by -1.

mov    r0, #0x1
mvn    r1, r0
add    r1, r1, #0x1
# This code, shows how mvn, and
# add can be used to perform
# multiplication by -1. 
# 0x1 is moved into r0, and
# next mvn is used to perform
# a bitwise not, on the bits of
# r0, and the result is stored 
# in r1.
# 0x1 is next added to r1, and
# the gotten value is stored in
# r1.
# This is the same as negating a 
# number in two's complement.
# Negating a number in two's 
# complement, can be done by inverting 
# the bits, and adding 1. 
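
As an alternative sketch, the rsb instruction from the arithmetic cheat sheet, can perform the same negation in a single instruction, by subtracting the register from 0.

```asm
mov    r0, #0x1
rsb    r1, r0, #0
@ r1 = 0 - r0, which is
@ 0xFFFFFFFF, so -1 in
@ two's complement.
```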

Data transfer operations

In the previous section, entitled data movement operations, we discussed how to move data between registers, as in mov r0, r1, and how to move an immediate value, into a register, as in mov r0, #0x2.

This section will explain, how to transfer data between memory and a register. Transferring can be either, to load data from memory into a register, which is done using the ldr instruction, or it can be to store data from a register into memory, which is done using the str instruction.

The format of these two instructions is as follows:

str|ldr{size}{T}{cond} Rt, [Rn {, #offset}]
str|ldr{size}{T}{cond} Rt, [Rn], #offset
str|ldr{size}{T}{cond} Rt, [Rn], +/-Rm {, shift} 
str|ldr{size}{cond} Rt, [Rn, #offset]!
str|ldr{size}{cond} Rt, [Rn, +/-Rm {, shift}]
str|ldr{size}{cond} Rt, [Rn, +/-Rm {, shift}]!
  • ldr or str, is the operation you wish to perform, ldr for loading, and str for storing.
  • curly brackets, means what is between is optional.
  • size is the size of the data to be transferred, it can be:

    • B, as in to transfer an unsigned byte. In the case of ldr, it is zero extended to 32 bits.
    • SB, as in to transfer a signed byte.SB can only be used, with the ldr instruction, and the loaded byte, is sign extended to 32 bits.
    • H, as in to transfer an unsigned half word. In the case of ldr, it is zero extended to 32 bits.
    • SH, as in to transfer a signed half word.SH can only be used, with the ldr instruction, and the loaded half word, is sign extended to 32 bits.
    • If the size is not specified, then a word is going to be transferred.
  • T, applies only to the first three formats, and it means that memory is accessed, as if the processor is in user mode.
  • cond, is one of the conditions, described in the conditions section.
  • Rt is the register where data is to be transferred to, in the case of ldr, or transferred from, in the case of str.
  • Rn is the register containing the base address.
  • The base address can be offset by a numeric value, which can be 0, or by another register, which can optionally be shifted by an immediate value. Unlike <Operand2>, a shift by a register is not allowed here.

    • offsetting can be done beforehand, by using it inside brackets, as in [Rn {, #offset}], and this is called pre indexed addressing.
    • offsetting can be done beforehand, and after offsetting has occurred, and the memory is read into the register, the base address is updated to a new value, which is the old base address, plus the offset value. This can be done for example using [Rn, #offset]!, so by using the symbol !. This kind of offsetting is called, pre indexed with write back.
    • offsetting can be done afterwards, by using it outside brackets, as in [Rn], #offset, in this case the memory is read from the base address, and after that, the base address value, is updated with the old base address value, plus the offset. This is called post indexed, with write back.

# Load example

mov    r0 , #0x108
# move 0x108 into register
# r0.

ldr    r1 , [r0]
# load the memory pointed
# by the address contained
# in the register r0, which 
# is 0x108, into r1. 
# Since no size has been specified
# to ldr, a word which is 32 bits
# is loaded into r1.

ldr    r1, [r0, #-0x8]
# The base address stored in r0, 
# is added to -0x8, so in this
# case the memory pointed by the 
# address 0x100, is loaded into
# r1.

ldr    r1, [r0, #-0x8]!
# The base address stored in r0, 
# is added to -0x8, so in this
# case the memory pointed by the 
# address 0x100, is loaded into
# r1.
# And since an exclamation mark
# follows offsetting, r0 is
# set to r0 + -0x8, which amounts
# to r0 having a value of 0x100.

ldr    r1, [r0], #0x8
# The base address stored in r0, 
# which is 0x100, has its memory
# read, and stored into r1. 
# After that r0 will be set to
# r0 + 0x8, so r0 will have
# a value of 0x108.

mov    r2, #0x4
# mov 0x4 into r2.

ldrb   r1, [r0, r2, LSR #2 ]
# The base address is r0, which is 
# 0x108, r2 which contains 0x4, is 
# shifted by 2 to the right, so the
# memory address to read data from,
# is r0 + (r2 >> 2), which is 
# 0x108 + (0x4 >> 2) which is 
# 0x108 + 0x1 = 0x109.
# ldrb, reads an unsigned byte into
# a register, so the byte from the 
# found memory location is read,
# and the upper 24 bits of the register
# are set to 0.

mov    r3, #0
# store 0 into r3.

ldr   r1, [r0, r3, LSL #2 ]
# r3 which is 0, is shifted left 
# by 2, this results into 0, 
# which is added to the
# base address r0, which is
# 0x108, so data from memory 
# location 0x108 is read into r1.
# Note that in ldr and str, the
# offset register can only be
# shifted by an immediate value.



# Store example

mov    r0, #0x100
# move 0x100 into r0

mov    r1, #0xFFFFFFFF
# move 0xFFFFFFFF into r1

str    r1, [r0]
# store the word found in r1,
# so the whole 32 bits of r1,
# into the memory location, 
# pointed by r0, which is 0x100
# so the memory location starting 
# at the address 0x100, will contain
# on four bytes, the value
# 0xFFFFFFFF

mov    r1, #0xFFFFFF00
# move 0xFFFFFF00 into r1

strb   r1, [r0]
# store the least significant byte
# of r1, which is 0x00, into the 
# memory location pointed by
# r0, which is 0x100, so 
# the four bytes starting at this 
# memory location will be, from the
# lowest address to the highest:
# 0x00 0xFF 0xFF 0xFF

mov    r2, #0x4
# move 0x4, into r2

strb   r1, [r0, r2, LSR #1]
# The base address is r0, which
# is 0x100. r2 is 0x4, and is 
# shifted to the right by 1, 
# so it will become 0x2, and 
# the memory location, where
# the lower byte of r1, which 
# is 0x00, is to be stored is
# 0x102.
# This means that starting at
# the address 0x100, the memory 
# will have the following data, 
# for the next four bytes, from the
# lowest address to the highest:
# 0x00 0xFF 0x00 0xFF
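
The post indexed form is convenient, for walking over consecutive words in memory. The following is a minimal sketch, which sums four words. The labels values and sum_values are hypothetical names used for illustration, and the sum is returned from main in r0.

```asm
    .global main
main:
    ldr r0, =values
    @ r0 points to the first word.
    mov r1, #0
    @ r1 is the running sum.
    mov r3, #4
    @ r3 counts the remaining words.
sum_values:
    ldr r2, [r0], #4
    @ Load the word pointed by r0
    @ into r2, and after that
    @ r0 = r0 + 4.
    add r1, r1, r2
    subs r3, r3, #1
    bNE sum_values
    @ Loop until r3 reaches 0.
    @ r1 = 1 + 2 + 3 + 4 = 10
    mov r0, r1
    bx lr

    .data
values: .word 1, 2, 3, 4
```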

As explained in the instruction format section, when an immediate operand is used, only a range of values can be represented. An immediate is an 8 bit value, rotated right within a 32 bit word, by an even amount between 0 and 30 inclusive, so for example by 2, or by 4.

ldr can be used to load any 32 bit constant into a register, the notation to do so is:

LDR rd, =const

This expression, is either translated by the assembler, to a mov, or mvn instruction, when possible, otherwise, the constant is stored in a constant pool, from which it is then loaded, using ldr, into the specified register.

mov    r0, #0x83FFFF1F
# When using this mov instruction,
# the assembler will generate an
# error, of invalid operand,
# because an immediate value must be 
# creatable by rotating an 8-bit
# number right within a 32 bit word.

ldr    r0, =0x83FFFF1F
# Notice that we did not use
# a hash symbol #, before the
# constant value, and that using
# ldr we were able to load a
# value, which we could not load
# using mov.

ldr can additionally be used with a label. A label is just a name for the address of data, or of an instruction in memory, at the place where the label is defined.

ldr{cond} Rt, =Label
# The ldr instruction, will 
# Load the address, where the 
# label is defined, into Rt.

As an example:

.global main
main:
    ldr r0, =numAdd
    # Load the address, 
    # numAdd is defined at, 
    # into r0.
    ldr r1, [r0]
    # Load the data labeled by
    # numAdd, which is a memory
    # address, into r1.

.data
numAdd: .word 0x1

Multiple registers, can be loaded at the same time, from consecutive memory addresses, or stored at the same time, into consecutive memory locations, using the instructions ldm, and stm.

stm|ldm{IA|DB}{cond} Rn{!}, reglist{^}
  • stm and ldm, are the operations that you want to perform. stm means, store multiple registers, and ldm means, load multiple registers.
  • {} means that what is in between, is optional.
  • IA means increment address after each access, this is the default option, if none is specified. DB means decrement address, before each access.
  • cond, is one of the conditions, talked about earlier, in the conditions section, for example EQ.
  • Rn is the register, which contains the base address, starting at which, data is going to be fetched or stored.
  • !, if this symbol is used, then the base address will be updated, with the final address gotten, after doing IA, or DB operations.
  • reglist, is just the list of registers, that you want, to read data from, or write data to. The registers must be enclosed in between curly brackets, and separated by a comma , , and if a dash - is used between two registers, it means a range.
  • ^, An arm processor, can be in multiple modes, such as user, or fast interrupt mode.
    When the caret ^ is used, and the processor is in a mode beside user or system, and the register list does not contain the PC, which is the program counter register, the processor will perform the data transfer to user mode registers, instead of current mode registers.
    If the register list, does contain the program counter register, then transfer to registers happen normally, additionally the SPSR register, which is the saved program status register, is copied into the CPSR register.
    .global main
main:
    ldr r0, =array_1
    mov r1, #0x1
    mov r2, #0x2
    mov r3, #0x3
    mov r4, #0x4
    stm r0!, {r1-r4}
    ldmDB r0, {r5,r6-r8}

    .bss
array_1: .fill 4,4,0

# The address of array_1 is loaded 
# into r0, next 0x1 is moved into
# r1, 0x2 is moved into r2, 0x3 is
# moved into r3, and 0x4 is moved 
# into r4.
# The stm multiple instruction is 
# executed. stm will store the 
# registers, r1 to r4 into memory, 
# starting at the address pointed by 
# r0. 
# By default, and if IA or DB, are not 
# used with stm, or ldm, it is as if
# IA is used. IA is increment the address  
# after access.
# Additionally and since the ! symbol is
# used, r0 will be updated to the final 
# gotten address.
# Next the ldmDB instruction is used.
# Since DB is used, which means decrement
# before access, the four words found
# just below the address in r0, are 
# loaded into the registers, r5, r6, 
# r7, r8, so r5 = 0x1, r6 = 0x2, 
# r7 = 0x3, and r8 = 0x4.

# Note, that the contents of lower 
# memory addresses, are always transferred 
# to and from lower registers, as in order,
# lower to lower registers, higher to 
# higher registers.
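
As a further sketch, ldm and stm with write back, can be combined to copy a block of memory. The labels source and destination are hypothetical names used for illustration, and only r2 and r3 are used as temporary registers.

```asm
    .global main
main:
    ldr r0, =source
    ldr r1, =destination
    ldmIA r0!, { r2, r3 }
    @ r2 = 0x1, r3 = 0x2, and
    @ r0 = source + 8
    stmIA r1!, { r2, r3 }
    @ The first two words of
    @ destination are 0x1, 0x2, and
    @ r1 = destination + 8
    ldmIA r0!, { r2, r3 }
    @ r2 = 0x3, r3 = 0x4
    stmIA r1!, { r2, r3 }
    @ destination now holds
    @ 0x1, 0x2, 0x3, 0x4
    mov r0, #0
    bx lr

    .data
source: .word 0x1, 0x2, 0x3, 0x4
destination: .fill 4,4,0
```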

Logical operations

  • Logical AND.
    Does: Rd = Rn & Op2.
    1010 & 1100 = 1000
    Operation: and{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C
  • Logical OR.
    Does: Rd = Rn | Op2.
    1010 | 1100 = 1110
    Operation: orr{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C
  • Logical OR NOT.
    Does: Rd = Rn | ~Op2. This instruction is only available in the thumb2 instruction set.
    1010 | ~1100 = 1010 | 0011 = 1011
    Operation: orn{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C
  • Exclusive OR.
    Does: Rd = Rn ^ Op2.
    1010 ^ 1100 = 0110
    Operation: eor{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C
  • Bit Clear.
    Does: Rd = Rn & ~Op2.
    1010 & ~1100 = 1010 & 0011 = 0010
    Operation: bic{S}{cond} Rd, Rn, <Operand2>
    Flags: N,Z,C

and r0, r1, r2
# r0 = r1 & r2
and r0, r1 , #1
# r0 = r1 & 1
and r0, r1, r2, LSR #1 
# r0 = r1 & ( r2 >> 1 )
and r0, r1, r2, LSR r3 
# r0 = r1 & ( r2 >> r3 )

orr r0, r1, r2
# r0 = r1 | r2

orn r0, r1, r2 
# r0 = r1 | ~r2

eor r0, r1, r2
# r0 = r1 ^ r2 

bic r0, r1, r2
# r0 = r1 & ~r2
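
A common use of these instructions, is to set, clear, toggle, or extract individual bits. The following is a small sketch, where the value 0xA5 and the register choices are arbitrary. Only the lower 8 bits are shown in the comments, the upper 24 bits being 0.

```asm
mov    r0, #0xA5
@ r0 = 10100101
orr    r1, r0, #0x02
@ set bit 1:
@ r1 = 10100111
bic    r2, r0, #0x01
@ clear bit 0:
@ r2 = 10100100
eor    r3, r0, #0x80
@ toggle bit 7:
@ r3 = 00100101
and    r12, r0, #0x0F
@ keep only the lower 4 bits:
@ r12 = 00000101
```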

Comparison operations

The comparison instructions, cmp, cmn, teq, and tst , perform a subtraction, or an addition, or an exclusive or, or an and, between their operands.

The result of the operation, is not stored anywhere, and only the condition flags are set appropriately. cmp and cmn set N,Z,C,V, whereas teq and tst set N,Z,C, and leave V unchanged.

  • Compare.
    Does: Rn - Operand2
    Operation: cmp{cond} Rn, Operand2
    Flags: N,Z,C,V
  • Compare Negative.
    Does: Rn + Operand2
    Operation: cmn{cond} Rn, Operand2
    Flags: N,Z,C,V
  • Test Equivalence.
    Does: Rn ^ Operand2
    Operation: teq{cond} Rn, Operand2
    Flags: N,Z,C
  • Test Bits.
    Does: Rn & Operand2
    Operation: tst{cond} Rn, Operand2
    Flags: N,Z,C

# Compare operation 
mov r0, #0
# 0 is stored into r0.
cmp    r0, r0
# performs r0 - r0, and  set the flags,
# in this case, it sets both Z, and C 
# to 1.
cmp    r0, #0
# performs r0 - 0, and set the flags,
# in this case, it sets both Z, and C
# to 1. 
cmp    r0, r0, LSL #1
# performs r0 -  (r0 << 1), and set the
# flags, in this case, it sets both Z, and
# C to 1.
cmp    r0, r0, LSL r0
# performs r0 -  (r0 << r0), and set the
# flags, in this case, it sets both Z, and
# C to 1.


# compare negative operation
mov r0, #1
mov r1, #-1
# 1 is stored into r0,
# -1 is stored into r1.
cmn r0, r1
# performs r0 + r1, and set
# the flags, in this case,
# it is going to be the 
# Z, and C flags which are
# set to 1.


# Test for equivalence
mov r0, #1
mov r1, #1
# place 1 in both r0, and
# r1.
teq r0, r1
# teq performs r0 ^ r1, this will 
# result into all bits being set 
# to 0, since r0 and r1 are equal,
# as such the Z flag is set to 1.


# Test Bits
ldr    r0, =0x81234567
mov    r1, #0x80000000
tst r0, r1
# Performs r0 & r1, all
# bits will be set to zero,
# besides the most significant 
# bit, which will be set to 
# 1, as such the Negative flag,
# is set to 1.
# In other words, in this case,
# we are testing, if r0 is negative.
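
A comparison is usually followed by a conditional instruction. As a minimal sketch, the following lines keep the larger of two unsigned values in r0, the values 7 and 9 being arbitrary.

```asm
mov    r0, #7
mov    r1, #9
cmp    r0, r1
@ Performs r0 - r1, which is 7 - 9.
@ A borrow is needed, so the carry
@ flag is cleared to 0.
movLO  r0, r1
@ LO holds, since the carry flag
@ is 0, so r0 = r1, and r0 now
@ holds 9, the larger value.
```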

Branch operations

Branching is a way to construct if else statements, for and while loops, and other forms of control statements, found in a higher level language, so branching can be seen as a way for code organization.

  • Branch immediate.
    Does: PC = label. The program counter register, has a name of both PC, and R15.
    Operation: B{cond} label
    Flags: none
  • Branch with link immediate.
    Does: LR = address of the next instruction, PC = label. The link register, has a name of both LR, and r14. First the address of the instruction following the branch, is stored into the link register, and after that branching is performed. Once done executing code in the branched to block, executing from after where branching has occurred, can optionally be resumed, by branching using the link register.
    Operation: BL{cond} label
    Flags: none
  • Branch indirect register.
    Does: PC = Rm
    Operation: BX{cond} Rm
    Flags: none
  • Branch indirect with link register.
    Does: LR = address of the next instruction, PC = Rm
    Operation: BLX{cond} Rm
    Flags: none

The following are examples, of implementing for, do while, and if else statements, in assembly.

/* Example of implementing 
   in assembly of:
for( int i = 0; i <  5; i++ )
    // some statements 
*/
mov r0, #0
mov r1, #5
for_loop: 
    cmp r0, r1
    beq finish
    # some statements
    add r0, r0, #1
    b for_loop
finish:
    # some statements



/* Example of implementing 
   in assembly of:
do{ } while( x >= 0 )
*/
mov r0, #5
do_while:
    #some statements to execute
    subs r0, r0, #1
    bPL do_while



/* Example of implementing 
   in assembly of:
if else
*/
cmp r1, r2 
bNE else
    # some statements to execute, 
    # if r1 and r2 are equal.
    b after_if_else
else:
    # some statements to execute, 
    # if r1 and r2 are not equal.
after_if_else:
# Instructions in the if
# else block, have executed,
# continue the sequential flow
# of the program. 
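
As one more sketch, the following loop computes a value, instead of leaving its body empty. The labels sum_loop and sum_done are hypothetical names, and GT is the signed greater than condition.

```asm
/* Example of implementing
   in assembly of:
sum = 0;
for( int i = 1; i <= 5; i++ )
    sum = sum + i;
*/
mov r0, #0
@ r0 is sum
mov r1, #1
@ r1 is i
sum_loop:
    cmp r1, #5
    bGT sum_done
    @ leave the loop when i > 5
    add r0, r0, r1
    @ sum = sum + i
    add r1, r1, #1
    @ i = i + 1
    b sum_loop
sum_done:
    @ r0 = 1 + 2 + 3 + 4 + 5 = 15
```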

The following is an example, of how to use the bl, and bx instructions.

str lr, [sp , #-4]!
# Decrement the stack pointer
# by 4, and store the link register
# content, at the memory location that
# the stack pointer now points to.
bl do_something
# branching is done using the
# bl instruction. First the link
# register will be assigned the
# address of the next instruction, 
# so it is going to point to the 
# following add instruction.
# After that, branching to the
# do_something block is performed.
add r0, r0, r1
# r0 = r0 + r1
ldr    lr, [sp], #4
# load the memory content pointed 
# by the SP register, into the link 
# register, and after that increment 
# the stack pointer by 4.
b done
# branch to the done section.

do_something:
    mov r0, #0
    # move 0 into r0.
    mov r1, #1
    # move 1 into r1.
    bx lr
    # branch back to where the
    # program was executing, since
    # the link register, contains the 
    # address of the instruction, 
    # following the bl instruction. 

done: 

Shift and rotate operations

  • Arithmetic shift right.
    Does: Rd = Rm >> Rs|#n. Shift Rm to the right, by Rs or #n, and store the result into Rd. The bits inserted to the left, replacing the bits removed from the right, will have the same value, as the most significant bit, before shifting, so if the most significant bit before shifting was 1, the newly inserted bits are 1, otherwise they are 0. This kind of shifting is called sign extension.
    Operation: asr{S}{cond} Rd, Rm, <Rs|#n>
    Flags: N,Z,C
  • Logical shift right.
    Does: Rd = Rm >> Rs|#n. Shift Rm to the right, by Rs or #n, and store the result into Rd. The bits inserted to the left, replacing the bits removed from the right, are set to 0, in other words, shifting is done by zero fill.
    Operation: lsr{S}{cond} Rd, Rm, <Rs|#n>
    Flags: N,Z,C
  • Logical shift left.
    Does: Rd = Rm << Rs|#n. Shift Rm to the left, by Rs or #n, and store the result into Rd. Since shifting is done to the left, new bits are inserted to the right, and their value is set to 0.
    Operation: lsl{S}{cond} Rd, Rm, <Rs|#n>
    Flags: N,Z,C
  • Rotate right.
    Does: Rm is rotated right, by Rs, or #n, and the result is stored into Rd. Rotation right means, that the bits are moved rightward, so as if being dropped from the right, and inserted in the left.
    Operation: ror{S}{cond} Rd, Rm, <Rs|#n>
    Flags: N,Z,C
  • Rotate right one bit, with extend.
    Does: Rm is shifted to the right by 1 bit, the new most significant bit, is assigned the value of the carry flag, and the result is stored into Rd. If the rrx instruction is suffixed by s, as to set the condition flags, the new value of the carry flag, will be the value of the previous least significant bit, which is bit 0.
    Operation: rrx{S}{cond} Rd, Rm
    Flags: N,Z,C

# Arithmetic shift right.
# Shifting is performed using sign 
# extension, which means bits inserted 
# to the left, are the same as the old 
# most significant bit, before shifting. 
# In other words, if the old Most significant 
# bit was 0, then the newly inserted 
# left bits are set to 0, and if the old 
# most significant bit was 1, then the newly 
# left inserted bits are set to 1.

asr r0, r1, r2
# r0 = r1 >> r2
asr r0, r1, #1
# r0 = r1 >> 1



# Logical shift right.
lsr r0, r1, r2
# r0 = r1 >> r2
# Shifting is done to the 
# right by zero fill, so 
# the bits inserted to the
# left are set to 0.


# Logical shift left.
lsl r0, r1, r2
# r0 = r1 << r2
# bits inserted to the right
# to replace the left shifted
# bits, are set to 0.



ror r0, r1, r2
# r1 is rotated by the value
# stored in r2. So bits are 
# removed from the right, and 
# inserted into the left, and
# the result is stored in r0.


rrx r0, r1
# r1 is shifted to the
# right by 1 bit, the new 
# most significant bit, will 
# have the value of the carry
# flag, and the result is 
# stored into r0. 
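
One use of rrx, sketched below, is to shift a number having more than 32 bits, to the right by one bit. The register pair and the value are chosen only for illustration.

```asm
mov    r1, #0x1
@ r1 holds the upper 32 bits.
mov    r0, #0x0
@ r0 holds the lower 32 bits,
@ so the number is 0x100000000
lsrs   r1, r1, #1
@ r1 = 0x0, and the bit shifted
@ out of r1, which is 1, is
@ placed into the carry flag.
rrx    r0, r0
@ r0 is shifted right by 1 bit,
@ and its new most significant bit
@ is the carry flag, so
@ r0 = 0x80000000
@ The number is now 0x80000000,
@ which is 0x100000000 >> 1.
```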

System calls

An arm processor can operate in different modes , each of which, has its own purpose. A user program, operates in user mode, and if it needs to perform some privileged tasks, such as creating a directory, or a file, or allocating memory, it must perform a software interrupt, so that the operating system, performs such a task.

The instruction for performing software interrupt, was previously named swi, but now it is named svc, where svc means supervisor call, supervisor as in the software managing the computer.

svc{cond} #imm

#imm is a number, which was previously used, to identify the task, that the operating system must perform, but nowadays, when doing linux system calls, it is always set to 0, and the task that you want the kernel to perform, is passed using the register r7.

As an example of performing the write system call, the write function, has the following signature.

ssize_t write( int fd, const void *buf, size_t nbytes );

fd is a file descriptor, so this is where things are going to be written, for example, for stdout, you pass in 1, buf is a pointer to a memory location, that you want to write to the file, and nbytes, is the number of bytes to be written, from the memory location.

    .global main
main:
    mov r7, #0x4 
    # mov 4 into the register r7,
    # 4 is the write system call.
    mov r0, #0x1 
    # mov 1, into r0,
    # where 1 means stdout
    ldr r1, =hey_world 
    # The address pointed by
    # the label hey_world, is 
    # moved into the register
    # r1.
    mov r2, #0x6 
    # 6 is the length of the 
    # buffer to write to   
    # stdout, and is stored into 
    # register r2.
    svc 0 
    # perform the software 
    # interrupt. 
    .data
hey_world: .ascii "hey :)"

# When the program is executed,
# the message hey :) is displayed.

As the previous example shows, arguments to the function performing a system call, are passed in registers r0 till r6, and a value which is returned by a system call function, is placed into register r0.
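
As a further sketch, a program can end itself using the exit system call, after writing its message. This assumes a 32 bit arm Linux system, where exit has the number 1, and takes the exit status in r0. The label message is a hypothetical name.

```asm
    .global main
main:
    mov r7, #0x4
    @ the write system call
    mov r0, #0x1
    @ stdout
    ldr r1, =message
    @ address of the buffer
    mov r2, #0x4
    @ length of the buffer, so
    @ b, y, e, and a new line
    svc 0
    mov r7, #0x1
    @ the exit system call
    mov r0, #0x0
    @ exit status of 0
    svc 0
    .data
message: .ascii "bye\n"
```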

Arm registers

All registers in arm32 are 32 bits wide, and register names are case insensitive, so R0, and r0 refer to the same register.

Registers r0 till r12, are general purpose registers, as in they can be used to perform any sort of calculations, so for example addition, subtraction, or shifting …

Additionally, when making a function call, which is basically branching to another block of code, the convention is that registers r0 till r3, are used for passing arguments, and that a function returns its result, in the r0 register.

What this means, is that the function which is called, can use the registers r0 through r3 freely, without having to save their value before usage, as the responsibility to save them, is of the function that made the call, if it wishes to do so. The same applies to register r12. As for registers r4 through r11, it is the responsibility of the called function, to save these registers before using them, and to restore their value upon returning, to the function that made the call.

System calls, and the r7 register, are described in the previous section.

Register r13 has also the name of sp, which means stack pointer. Each program has its own virtual memory, which is formed of a text region, text as in the program instructions, a data region, data as in initialized data, and other regions, such as the stack, which can be used by a block of code, to save registers, before using them and changing their values.

The stack is lifo, which means last in, is first out, and a stack can be either descending, or ascending.

The stack pointer, points to the top of the stack. It can either point to where data was last inserted, or it can point to where data will be inserted, as the previous image shows. Usually a stack is descending, and a stack pointer points to where data was last inserted.

The instruction, push, can be used to push, which is to save a list of registers into the stack, and the instruction pop, can be used to pop, which is to remove values from the stack, and place them into a register list.

push{cond} reglist
pop{cond} reglist

Register r14, has also the name of lr, which means link register. The link register is used by the bl, and blx instructions. These instructions perform branching, but before doing that, they save the address of the next instruction, that would have been executed if no branching had occurred, in the link register. The address stored in the link register, is used by the code that we have branched to, to continue execution of the program from where it left off, if it wishes.

Register r15, has also the name of pc, which means program counter.
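
When a function itself calls another function using bl, the link register is overwritten, so it must be saved first. A minimal sketch of doing so with push and pop follows, where the labels outer and inner are hypothetical names.

```asm
outer:
    push { r4, lr }
    @ save r4, and the link register,
    @ since bl below overwrites lr.
    mov r4, r0
    @ keep the argument in r4.
    bl inner
    @ r0 = r0 + 1, done by inner.
    add r0, r0, r4
    @ r0 = result of inner + argument
    pop { r4, pc }
    @ restore r4, and load the saved
    @ link register value into pc,
    @ which returns to the caller
    @ of outer.

inner:
    add r0, r0, #1
    bx lr
```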

logical_and:
    push { r4-r12 }
    # registers r4 till r12 are 
    # saved on the stack.
    add r4, r0, r1
    # r4 = r0 + r1 
    asr r0, r4, #2
    # r0 = r4 >> 2
    pop { r4-r12 }
    # restore registers r4 till r12 
    # data from the stack 
    bx lr
    # lr contains the address of 
    # the instruction to be executed,
    # after branching is done, so 
    # bx is used to branch back to this
    # address, and to continue execution,
    # of the original code, before branching.

    .global main
main:
    mov r0, #1
    mov r1, #0x80000000
    mov r4, #0x80000000
    bl logical_and
    orr r4, r4, r0