zettelkasten

Assembly and Machine Programming

#lecture note based on 15-213 Introduction to Computer Systems

H2 Abstraction Level Overview

H3 A bit of history

… (whatever)

Intel x86 - a Complex Instruction Set Computer (CISC) architecture with lots of instructions.

AMD’s x86-64 - the 64-bit extension of x86.

H3 Assembly programmer abstraction level

  • Architecture aka ISA (instruction set architecture) - what one needs to understand to write assembly or machine code
    • e.g. x86, Itanium, x86-64, ARM, RISC-V
  • Microarchitecture - how the architecture is implemented

CleanShot 2023-05-27 at 23.45.30@2x.jpg

  • PC aka Program Counter - address of next instruction. In x86-64 it’s RIP
  • Registers - fast on-CPU storage for heavily used data
  • Condition code - status from most recent arithmetic / logical operation
  • Memory - array of bytes (see Memory Layout)
    • Code
    • Data
    • Stack

H2 Registers

H3 Register datatypes

  • Integers - 1 / 2 (aka word) / 4 (aka double word) / 8 (aka quad word) bytes.
  • Floating point - 4 / 8 / 10 bytes
  • … maybe some other specialised types

H3 Common Registers

  • rip always points to the next instruction

In x86-64… these are 8-byte registers

  • rax - holds the return value
  • rcx
  • rbx
  • rdx
  • rsi
  • rdi
  • rsp - stack pointer
  • rbp - base pointer

For the above, replace r with e to get the lowest 4 bytes of the register as a 4-byte int (e.g. rax becomes eax).

  • r8
  • r9
  • r10
  • r11
  • r12
  • r13
  • r14
  • r15

For the above, append d to get the lowest 4 bytes (e.g. r8 becomes r8d).

Some of these trace back to IA32 registers, and some even get their names from 16-bit registers.

full 4-byte 	lower 16 bits 	second-lowest byte 	lowest byte 	origin (mostly obsolete)
eax         	ax            	ah                 	al          	accumulate
ecx         	cx            	ch                 	cl          	counter
edx         	dx            	dh                 	dl          	data
ebx         	bx            	bh                 	bl          	base
esi         	si            	                   	            	source index
edi         	di            	                   	            	destination index
esp         	sp            	                   	            	stack pointer
ebp         	bp            	                   	            	base pointer

bold still relevant!

For %r8 to %r15, the naming of the lower parts follows this pattern:

full (64 bits) 	lower 32 bits 	lower 16 bits 	lowest 8 bits
r8             	r8d           	r8w           	r8b

Basically, each register contains smaller sub-registers that name its lower bytes:

Pasted image 20230604134202.png
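
The sub-register idea can be mimicked in C by truncating a 64-bit value. This is only a sketch: the function names below are made up for illustration, and the comments say which member of the rax family each one corresponds to.

```c
#include <stdint.h>

/* Each function returns the part of a 64-bit register value that the
   named sub-register would expose (hypothetical helper names). */
uint32_t low32(uint64_t r)   { return (uint32_t)r; }        /* like %eax: lowest 4 bytes  */
uint16_t low16(uint64_t r)   { return (uint16_t)r; }        /* like %ax:  lowest 2 bytes  */
uint8_t  low8(uint64_t r)    { return (uint8_t)r; }         /* like %al:  lowest byte     */
uint8_t  second8(uint64_t r) { return (uint8_t)(r >> 8); }  /* like %ah:  second-lowest   */
```

For example, with the value 0x1122334455667788 in the full register, low32 yields 0x55667788 and second8 yields 0x77.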

H3 Register reference, constants, and addresses in assembly

  • immediate, viz. a constant integer: $0x42, $-342
  • register: %rax, %r15
  • memory: (%rax) - dereference %rax as a pointer to somewhere in memory

H3 Memory addressing format (Address Mode Expression)

Most general form: D(Rb, Ri, S), which corresponds to Mem[Reg[Rb] + S * Reg[Ri] + D]

  • D is a constant displacement
  • Rb is the base register
  • Ri is the index register (any register except %rsp)
  • S is the scale (1, 2, 4, or 8)

Parts can be omitted! A missing part defaults to whatever makes sense (the identity): a missing D, Rb, or Ri contributes 0, and a missing S is 1.
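
The address computation can be written out as a small C function. This is a sketch for checking examples by hand: effective_address is a made-up name, and the caller passes the default value for any part that is omitted in the assembly.

```c
#include <stdint.h>

/* Address denoted by D(Rb, Ri, S): Reg[Rb] + S * Reg[Ri] + D.
   For an omitted part pass its default: d = 0, base = 0, index = 0, scale = 1.
   Arithmetic wraps modulo 2^64, so a negative d works as expected. */
uint64_t effective_address(int64_t d, uint64_t base, uint64_t index, uint64_t scale)
{
    return base + scale * index + (uint64_t)d;
}
```

Assuming %rdx holds 0xf000 and %rcx holds 0x100, the expression 0x8(%rdx) is effective_address(0x8, 0xf000, 0, 1) = 0xf008, and (%rdx, %rcx, 4) is effective_address(0, 0xf000, 0x100, 4) = 0xf400.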

H2 Basic Assembly

H3 Instructions to know

Aside

  • Each x86-64 instruction is 1 to 15 bytes long
  • An actual assembly programme may contain funny .something lines called directives
; Src := Source
; Dst := Destination

movq Src, Dst      ; copy 8 bytes (a quad word) from Src to Dst
leaq Src, Dst      ; compute the address given by Src (an address mode expression, no memory access) and put it in Dst
addq Src, Dst      ; Dst = Dst + Src
subq Src, Dst      ; Dst = Dst - Src
imulq Src, Dst     ; Dst = Dst * Src
salq Src, Dst      ; Dst = Dst << Src
sarq Src, Dst      ; Dst = Dst >> Src, arithmetic
shrq Src, Dst      ; Dst = Dst >> Src, logical
xorq Src, Dst      ; Dst = Dst ^ Src
andq Src, Dst      ; Dst = Dst & Src
orq Src, Dst       ; Dst = Dst | Src

incq Dst           ; Dst ++
decq Dst           ; Dst --
negq Dst           ; Dst = -Dst
notq Dst           ; Dst = ~Dst

ret                ; return
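
The difference between the two right shifts (sarq and shrq) is easy to miss, so here is a sketch of both in C. The function names are made up. The arithmetic version is spelled out by hand because right-shifting a negative signed value in C is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift (like shrq): vacated high bits are filled with 0. */
uint64_t shift_right_logical(uint64_t x, unsigned n)
{
    assert(n < 64);
    return x >> n;
}

/* Arithmetic right shift (like sarq): vacated high bits copy the sign bit. */
int64_t shift_right_arithmetic(int64_t x, unsigned n)
{
    assert(n < 64);
    uint64_t bits = (uint64_t)x >> n;
    if (x < 0 && n > 0) {
        bits |= ~(UINT64_MAX >> n);   /* set the top n bits to 1 */
    }
    return (int64_t)bits;
}
```

Shifting the bit pattern of -8 right by one gives -4 arithmetically, but the large positive value 0x7FFFFFFFFFFFFFFC logically.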

H3 Idea of word & Data length

The term comes from the 16-bit architecture, where a 16-bit value is called a word.

  • word - 16-bit
  • double word - 32-bit
  • quad word - 64-bit

Notice that instructions often carry a suffix that indicates the operand size:

Intel data type  	asm suffix 	size in bytes
byte             	b          	1
word             	w          	2
double word      	l          	4
quad word        	q          	8
single precision 	s          	4
double precision 	l          	8

H3 Instruction for different data size

movz S, R  ; move with zero extension viz. R = ZeroExtended(S)
movzbw     ; byte to word
movzbl     ; byte to double word
movzwl     ; word to double word
movzbq     ; byte to quad word
movzwq     ; word to quad word
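
In C, zero extension is what happens when an unsigned value is converted to a wider unsigned type. A tiny sketch with made-up function names:

```c
#include <stdint.h>

/* byte to double word, like movzbl: the upper 24 bits become 0 */
uint32_t zero_extend_byte(uint8_t b)  { return (uint32_t)b; }

/* word to quad word, like movzwq: the upper 48 bits become 0 */
uint64_t zero_extend_word(uint16_t w) { return (uint64_t)w; }
```

Even a byte with its top bit set, such as 0xFF, stays 0x000000FF after extension, because the new high bits are always 0.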

H3 Control Flow

Control flow is done with GOTO-style jumps that are conditioned on some flags.

Aside

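A conditional expression Test ? Then : Else is not necessarily compiled so that only one of Then and Else runs. The compiler may use a conditional move, in which case both Then and Else are computed and Test only selects which result to keep. With if/else or GOTO-style branching, only one of them is computed.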

H4 Flags

Remember the “condition codes” in the CPU diagram? Those get updated by operations and can be used to condition jumps.

Below are the condition codes and how they get updated by an instruction of the form op Src, Dst that computes t = f(a, b).

Note that the processor sets the flags without knowing whether the operands are signed or unsigned; the programmer needs to choose which flags to check depending on context.

  • CF - Carry flag (for unsigned)
    • Set if there is a carry out of the most significant bit (think unsigned add)
  • ZF - Zero flag
    • Set if t == 0
  • SF - Sign flag (for signed)
    • Set if t < 0 viz. left-most bit is 1
  • OF - Overflow flag (for signed)
    • Set on signed overflow, viz. (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0) for an addition
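
To make the four flags concrete, here is a sketch that computes them for a 64-bit addition t = a + b. The struct and function names are made up, and other instructions (subtraction, shifts, and so on) have their own rules that are not modelled here. The overflow test compares sign bits, which agrees with the formula above because an operand of 0 can never cause overflow.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Condition codes after t = a + b on 64-bit operands. */
struct flags flags_after_add(uint64_t a, uint64_t b)
{
    uint64_t t = a + b;                    /* wraps modulo 2^64 */
    bool sa = a >> 63;                     /* sign bits of a, b and t */
    bool sb = b >> 63;
    bool st = t >> 63;
    struct flags f;
    f.cf = t < a;                          /* carry out of the top bit (unsigned overflow) */
    f.zf = (t == 0);
    f.sf = st;                             /* result is negative when read as signed */
    f.of = (sa == sb) && (st != sa);       /* same-sign inputs, opposite-sign result */
    return f;
}
```

Adding 1 to UINT64_MAX sets CF and ZF but not OF, while adding 1 to INT64_MAX sets SF and OF but not CF: the same hardware addition, read as unsigned in one case and signed in the other.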

Exception

lea doesn’t set any condition codes!!

Some instructions to set flags:

; calculates b - a and sets flags (the result b - a is discarded)
; similar to `sub a, b`, but doesn't change b
cmp a, b

; computes b & a only to set flags (only SF and ZF) (result also discarded)
; similar to `and a, b`, but doesn't change b
test a, b

H4 Jumps

There are many jump instructions, each one depending on condition.

They tend to be in the form of jX, where X could be one of the following.

Name 	A.k.a.   	Jump if...                  	After CMP...
JMP  	         	Always                      	
JS   	         	Negative (SF=1)             	
JNS  	         	Not negative (SF=0)         	
JO   	         	Signed overflow (OF=1)      	
JNO  	         	No signed overflow (OF=0)   	
JE   	JZ       	Zero (ZF=1)                 	Equal
JNE  	JNZ      	Not zero (ZF=0)             	Not equal
JB   	JC, JNAE 	Unsigned overflow (CF=1)    	Unsigned below
JAE  	JNC, JNB 	No unsigned overflow (CF=0) 	Unsigned above or equal
JA   	JNBE     	CF=0 and ZF=0               	Unsigned above
JBE  	JNA      	CF=1 or ZF=1                	Unsigned below or equal
JL   	JNGE     	SF!=OF                      	Signed less
JGE  	JNL      	SF=OF                       	Signed greater or equal
JG   	JNLE     	ZF=0 and SF=OF              	Signed greater
JLE  	JNG      	ZF=1 or SF!=OF              	Signed less or equal

Example:

cmp    b, a
jle    0xsomewhere    ; if a ≤ b, jump to 0xsomewhere

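Aside: conditional expression in C

Something like

val = Test(x) ? A(x) : B(x);

can end up computing both A and B when it is compiled to a conditional move. That is a problem if either side involves:

  • unsafe behaviour (e.g. dereferencing a pointer that may be null)
  • bad performance (an expensive computation)
  • side effects

Compilers therefore only use this form when both sides are cheap and safe to evaluate, and branch otherwise.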

Aside: do while loop in C

We can rewrite

do {
	Body
} while (Condition);

into

loop:
	Body
	if (Condition) goto loop;
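
As a concrete instance of this rewrite, here is the same loop written both ways. The functions are made-up examples that sum the integers 1 to n; like any do-while, the body runs at least once.

```c
/* Sum 1..n with a do-while loop. The body always runs once, so n < 1 returns 1. */
long sum_do_while(long n)
{
    long sum = 0, i = 1;
    do {
        sum += i;
        i++;
    } while (i <= n);
    return sum;
}

/* The same loop in goto form: a label, the body, then a conditional jump back. */
long sum_goto(long n)
{
    long sum = 0, i = 1;
loop:
    sum += i;
    i++;
    if (i <= n) goto loop;
    return sum;
}
```

The goto version is close to what the assembly looks like: the conditional goto becomes a cmp followed by a conditional jump to the loop label.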

H4 Conditional Set

There’s also a setX instruction with the same options of X as jump. A set instruction sets the lowest byte of a destination to 0 or 1 based on condition.

See the table for jump for available suffixes.

Example:

cmp    b, a
setle  %al         ; set %al (lowest byte of %rax) to 1 if a ≤ b, else 0
movzbl %al, %eax   ; zero-extend %al so the rest of %rax is 0
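
The C-level counterpart of this pattern is a function that returns the result of a comparison as an int. The function name is made up, and the exact instructions a compiler picks depend on the compiler and its flags.

```c
/* Returns 1 if a <= b and 0 otherwise, i.e. the value that the
   cmp / setle / movzbl sequence leaves in %eax. */
int less_or_equal(long a, long b)
{
    return a <= b;
}
```

Because the comparison is signed, the le condition is the right one here. An unsigned comparison would use the be (below or equal) condition from the jump table instead.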

H4 Switch Statements

The compiler can do different things depending on what the cases are. Closely spaced case values (like case 1 ... case 2 ... case 3) may get a jump table.

A jump table is like an array of targets (8 byte pointers) that point to different code blocks for the different cases.

Assembly examples:

; direct jump
jmp .Target

; indirect jump by table starting at .Table
jmp *.Table(, %rdi, 8)
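
The jump-table idea can be imitated in C with an array of function pointers. This is only an analogy: a real jump table holds code addresses inside one function, whereas this sketch uses separate made-up functions for the case bodies.

```c
#include <stddef.h>

typedef long (*case_fn)(long);

static long case0(long x) { return x + 1; }
static long case1(long x) { return x * 2; }
static long case2(long x) { return x - 3; }
static long default_case(long x) { (void)x; return 0; }

/* One pointer per case value (8 bytes each on x86-64), indexed by the selector. */
static const case_fn table[] = { case0, case1, case2 };

long dispatch(unsigned long selector, long x)
{
    size_t n = sizeof table / sizeof table[0];
    if (selector >= n) {           /* out of range: take the default case */
        return default_case(x);
    }
    return table[selector](x);     /* indirect call through the table */
}
```

The range check before indexing mirrors what compiled switch code does: it first compares the selector against the largest case value and jumps to the default block if it is out of range.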

H2 Machine Procedure

This is when we have some sequence of instructions that can get called multiple times (essentially a function).

There are a few things we need to do to make this work

  • Passing control - Be able to go to the beginning of the procedure and go back where we were after return
  • Passing data - pass arguments and get back return value
  • Memory management - allocate when running procedure, deallocate when returning

These mechanisms are implemented using machine instructions. The exact conventions are up to the designers, and are specified by the Application Binary Interface (ABI).

H3 The x86-64 Stack

It looks like this:

Pasted image 20230604134847.png

To push onto the stack, we can run some push operation, which is equivalent to decrementing the stack pointer and writing something there.

pushq %rax

; does the same thing as

subq $8, %rsp
movq %rax, (%rsp)

Pasted image 20230604134911.png

For pop, we do the opposite: read the value at the top of the stack into some register, then increment the stack pointer.

Pasted image 20230604134927.png
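
Push and pop can be simulated with an array and an offset that plays the role of %rsp. This is a toy model with made-up names (struct machine, push, pop), not how a real stack is set up. The important detail is that the stack grows towards lower addresses.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 128 };

struct machine {
    uint64_t mem[STACK_BYTES / 8];  /* stack memory, viewed as 8-byte slots */
    uint64_t rsp;                   /* byte offset of the stack top; STACK_BYTES when empty */
};

void machine_init(struct machine *m) { m->rsp = STACK_BYTES; }

/* Like pushq: decrement the stack pointer by 8, then store at the new top. */
void push(struct machine *m, uint64_t value)
{
    assert(m->rsp >= 8);                  /* room for one more slot */
    m->rsp -= 8;
    m->mem[m->rsp / 8] = value;
}

/* Like popq: load from the top, then increment the stack pointer by 8. */
uint64_t pop(struct machine *m)
{
    assert(m->rsp <= STACK_BYTES - 8);    /* stack is not empty */
    uint64_t value = m->mem[m->rsp / 8];
    m->rsp += 8;
    return value;
}
```

Pushing 0x11 and then 0x22 moves rsp down by 16 bytes; popping returns 0x22 first and 0x11 second. A pop does not erase the old value, it only moves the stack pointer past it.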

H3 Calling Procedure

Pushing and popping from the stack enables procedure-call control flow. A call instruction pushes the address of the next instruction, aka the return address, onto the stack and jumps to the procedure. A ret instruction pops that address off the stack and jumps back to it.

H3 Passing Data

By convention, the first 6 (integer or pointer) arguments go in registers rdi, rsi, rdx, rcx, r8, r9, in that order, and the rest go on the stack. The return value always goes in rax.

Pasted image 20230604134952.png
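
A function with more than six arguments shows where the split happens. The function is a made-up example, and the register assignment in the comments assumes the convention described above with all arguments being integers.

```c
/* Where each argument lives on entry, under the convention above:
     a -> %rdi   b -> %rsi   c -> %rdx
     d -> %rcx   e -> %r8    f -> %r9
     g, h -> on the stack
   The sum is handed back to the caller in %rax. */
long sum8(long a, long b, long c, long d, long e, long f, long g, long h)
{
    return a + b + c + d + e + f + g + h;
}
```

Compiling a file like this to assembly (for example with the -S option of gcc) is a quick way to see which registers and stack slots the compiler actually reads.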

H3 Call Frames

Each call gets its frame. The frame is used for:

  • local variables
  • temporary space
  • arguments for the next frame

Pasted image 20230604135010.png

H4 Caller Saved vs Callee Saved Registers

  • caller-saved: the caller must save these before a call if it still needs their values afterwards
    • rax, for return value
    • rdi, rsi, rdx, rcx, r8, r9 for arguments (in that order)
    • r10, r11
  • callee-saved: the callee must restore these before returning if it changes them
    • rbx, r12, r13, r14, r15
    • rbp (may be used as frame pointer)
    • rsp - a special form of callee-saved; must be restored to point back at the caller's stack top