Linking

#lecture note based on 15-213 Introduction to Computer Systems

“The most useless useful lecture… Almost nothing in this lecture matters for the course… however, this is the most useful lecture for you as a computer scientist.”

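Linking comes into play when a program is built from multiple source files.

gcc is a compiler driver: the translators (preprocessor, compiler, assembler) turn each source file into a relocatable object file, and the linker combines those object files into a single executable object file.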

The compilation process, in which linking is the last step (preprocess, compile, assemble, link):

[Figure: Pasted image 20230806131554.png]

H2 Why?

  • Modularity
    • Breaking programs into smaller files
    • Header files list the signatures of functions, without providing their implementations
    • #include <thing.h> simply copies the contents of thing.h into the source file. This is equivalent to writing the signatures directly in the .c file: it tells the compiler that the function exists somewhere, and the linker works out where
  • Efficiency
    • Separate compilation of different modules - after a change, recompile only the affected file and relink, which saves time
    • Different source files can be compiled concurrently

H2 Linking methods

  • Static linking - the executable includes a copy of the library code it uses
  • Dynamic linking - the executable doesn’t include the library code itself. One copy of a library is shared across multiple executables, and a new version of the library can be loaded without recompiling or relinking the executable

H2 Linking process

H6 Linker’s jobs
  1. Resolve symbols
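    • Associate each symbol reference with exactly one symbol definition, using the symbol tables
  2. Relocate code
    • Merge the separate code and data sections into single sections
    • Change the relative locations used in the object files into absolute memory locations in the executable
    • Update all symbol references to point to the new locations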

Assembler and linker

  • The assembler builds a symbol table (name, size and location of each symbol) - the linker associates each symbol reference with its definition.
  • The assembler leaves placeholders in the machine code for addresses it can’t know yet - the linker fills in the actual memory addresses after arranging the pieces.
  • The linker expects a main function somewhere. It links in a program-startup object file from the C runtime (crt1.o on Linux), not from your own source, whose _start routine eventually calls main.

Object file types

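  • .o - relocatable object file, produced from exactly one .c source file; its code and data can be combined with other relocatable object files
  • a.out - executable object file, containing code and data in a form that can be loaded into memory and executed directly
  • .so - shared object file, a special type of relocatable object file that can be loaded and linked dynamically, at load time or run time (called a DLL, dynamic link library, on Windows)
  • ELF - Executable and Linkable Format, the standard binary format on Linux for all three kinds of object file. The first 4 bytes of the file are the magic number 0x7f 'E' 'L' 'F'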

The linker looks at all the input files and figures out what to bring into the shared object (.so) file.

H3 ELF format

  • ELF header (at file offset 0) - word size, byte ordering, file type (.o, executable, .so), machine type
  • Segment header table - page size, virtual addresses of memory segments, segment sizes
  • .text - code
  • .rodata - read-only data: jump tables, string constants
  • .data - initialised global vars (with a nonzero initial value)
  • .bss - “block started by symbol” (mnemonic: “better save space”) - uninitialised global vars and global vars initialised to 0; has a section header but occupies no space in the file
  • .symtab - symbol table
    • names of functions and statically allocated (global and static) variables
    • section names + locations
  • .rel.text - relocation info for the .text section
    • Addresses of instructions that need to be modified in the executable, and instructions for making the modification
  • .rel.data - relocation info for the .data section
    • Same idea as above: addresses of pointer data that need to be modified in the merged executable
  • .debug - info for symbolic debugging (gcc -g)
  • section header table - offset + size of each section

H3 Symbol types

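  • Global symbols - defined by this module and can be referenced by other modules (non-static functions and non-static global vars)
  • External symbols - global symbols referenced by this module but defined by some other module
  • Local symbols - defined and referenced only within this module (functions and global vars declared static)

(The linker doesn’t see non-static local variables inside functions; they live on the stack.)
(The linker doesn’t care about types!)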

H3 Name collisions?

static within function

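static type blah = ...; within a function behaves like a global that is only accessible within that function. It is stored in .bss or .data, whereas non-static locals go on the stack. If two functions each declare a static local with the same name, the compiler gives each one a unique symbol name, so they don’t collide.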

duplicate symbols

The linker classifies symbols by strength:

  • strong symbol - functions (with a body) and initialised globals
  • weak symbol - uninitialised globals
  • even weaker - extern declarations, which only reference a symbol and define nothing

Rules:

  1. Multiple strong symbols with the same name are not allowed (linker error)
  2. One strong and one or more weak -> use the strong one
  3. Multiple weak, no strong -> pick an arbitrary one

Good practices:

  • Avoid non-static globals where possible
  • Initialise globals to make them strong
  • Put declarations in a header so that the compiler can check types
  • Use extern when referencing an external global
    • Treated as weak
    • Causes a linker error if the symbol is not defined in any file