zettelkasten

Search IconIcon to open search
Dark ModeDark Mode

Code optimisation

#lecture note based on 15-213 Introduction to Computer Systems

H2 Overview

H6 Goals
  • minimise num of instructions
    • less instructions
    • less slow calculations (multiplication, divsion)
  • minimise memory wait time
    • what to go in register (note then debugger may not be able to find it)
    • memory access pattern
  • avoid branching
    • make it easier for CPU to guess where next instructions are
H6 Limits
  • Don’t change program behaviour
    • Can’t optimise away edge cases (crazzy ones as well!)
  • Don’t increase algorithm complexity
  • Usually only one function at a time
    • solution: merge functions into one may help
  • Compiler assumes run-time input
H6 Optimisation types
  • Local - inside basic block
    • code elimination…
  • Global - look at control flow graph
    • move code around
    • loop transformation

H2 Example optimisations

H6 Constant folding
  • 0xFF << 8 -> 0xFF00
  • strlen("Hello") -> 5
H6 Code elimination
  • Delete if (0) { printf("hi"); }
  • Delete first x in x = 23; x = 42;
  • Factoring
    • ex. v[i].x + v[i].y; -> elt = &v[i] then x = elt->x; y = elt->y and x + y
  • make loop invariant constant
    • ex. for (...i...) {...A[j*n]...} -> int nj = i * n; for (...i...) {...A[nj]...}
H6 Inlining

Expand a function into the caller to eliminate function call overhead, as well as allowing other potential local optimisation after inlining.

(Though this could make code larger, which may slow things down too)

H6 Cache optimisation

See Memory Hierarchy

H2 Optimisation obstacles

H6 Memory Aliasing

Sometimes compiler has to account for pointers pointing to same location.

Consider:

// sum rows of n x n matrix a and put results in n x 1 vector b
void sum_rows(int *a, int *b, int n) {
	int i, j;
	for (i = 0; i < n; i++) {
		b[i] = 0;
		for (j = 0; j < n; j++) {
			b[i] += a[i*n + j];
		}
	} 
}

Ideally, b[i] should not access memory every iteration, but things break if a and b overlap, so putting b[i] in register doesn’t update what’s in a.

Solutions:

  • Use local variable as accumulator
  • Use strict keyword to tell compiler something is the only pointer to same memory location

Ramifications:

  • Sometimes compiler can’t move function calls out of loop
    • ex. strlen called to determine number of iterations, but loop may modify the string, so one could end up calling strlen every iteration

WARN

By C standard, different type of pointer don’t point to same thing (except void, char) - so don’t different pointer to same type and alias! Compiler assumes this when making optimisation

H2 Modern CPU - hardware computation in parallel

  • Run many instructions at the same time
  • Do multiple branches at same time, also predict branch - “speculative execution” - hold results - invalidate in future if wrong branch

Compiler:

  • minimise number of branches
  • unroll loop
    • so there’s no branching
  • avoid indirect branches
    • function pointers, method overloading

Heuristics - backward taken, forward not taken (BTFNT)

  • Predict going into backward branches like loops
  • Predict not going into forward branches like if statements
  • About 95% accuracy!

H2 Scheduling

Rearrange instructions to make data available

E.g. Read stuff to registers first and work with them (again the stuff to read shouldn’t overlap in memory)

H2 Benchmark

Cycles per element (CPE)