Code optimisation
#lecture note based on 15-213 Introduction to Computer Systems
H2 Overview
H6 Goals
- minimise num of instructions
- less instructions
- less slow calculations (multiplication, divsion)
- minimise memory wait time
- what to go in register (note then debugger may not be able to find it)
- memory access pattern
- avoid branching
- make it easier for CPU to guess where next instructions are
H6 Limits
- Don’t change program behaviour
- Can’t optimise away edge cases (crazzy ones as well!)
- Don’t increase algorithm complexity
- Usually only one function at a time
- solution: merge functions into one may help
- Compiler assumes run-time input
H6 Optimisation types
- Local - inside basic block
- code elimination…
- Global - look at control flow graph
- move code around
- loop transformation
H2 Example optimisations
H6 Constant folding
0xFF << 8
->0xFF00
strlen("Hello")
->5
H6 Code elimination
- Delete
if (0) { printf("hi"); }
- Delete first
x
inx = 23; x = 42;
- Factoring
- ex.
v[i].x + v[i].y;
->elt = &v[i]
thenx = elt->x; y = elt->y
andx + y
- ex.
- make loop invariant constant
- ex.
for (...i...) {...A[j*n]...}
->int nj = i * n; for (...i...) {...A[nj]...}
- ex.
H6 Inlining
Expand a function into the caller to eliminate function call overhead, as well as allowing other potential local optimisation after inlining.
(Though this could make code larger, which may slow things down too)
H6 Cache optimisation
See Memory Hierarchy
H2 Optimisation obstacles
H6 Memory Aliasing
Sometimes compiler has to account for pointers pointing to same location.
Consider:
// sum rows of n x n matrix a and put results in n x 1 vector b
void sum_rows(int *a, int *b, int n) {
int i, j;
for (i = 0; i < n; i++) {
b[i] = 0;
for (j = 0; j < n; j++) {
b[i] += a[i*n + j];
}
}
}
Ideally, b[i]
should not access memory every iteration, but things break if a
and b
overlap, so putting b[i]
in register doesn’t update what’s in a
.
Solutions:
- Use local variable as accumulator
- Use
strict
keyword to tell compiler something is the only pointer to same memory location
Ramifications:
- Sometimes compiler can’t move function calls out of loop
- ex.
strlen
called to determine number of iterations, but loop may modify the string, so one could end up callingstrlen
every iteration
- ex.
WARN
By C standard, different type of pointer don’t point to same thing (except void, char) - so don’t different pointer to same type and alias! Compiler assumes this when making optimisation
H2 Modern CPU - hardware computation in parallel
- Run many instructions at the same time
- Do multiple branches at same time, also predict branch - “speculative execution” - hold results - invalidate in future if wrong branch
Compiler:
- minimise number of branches
- unroll loop
- so there’s no branching
- avoid indirect branches
- function pointers, method overloading
Heuristics - backward taken, forward not taken (BTFNT)
- Predict going into backward branches like loops
- Predict not going into forward branches like if statements
- About 95% accuracy!
H2 Scheduling
Rearrange instructions to make data available
E.g. Read stuff to registers first and work with them (again the stuff to read shouldn’t overlap in memory)
H2 Benchmark
Cycles per element (CPE)