
# Virtual Memory

#lecture note based on 15-213 Introduction to Computer Systems

Notice: lots of processes can run at the same time, yet they never trample each other’s memory…

Goal: many programmes run in RAM, each with its own isolated stack, heap, data, and code

Notice: if we fork a process and make both processes print their heap/stack/data/text addresses, they print the same addresses, yet one process can write without changing the other process’s data…

Operating system somehow makes sure they aren’t using the same memory!
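
A minimal sketch of that experiment (standard POSIX `fork`/`wait`; not taken from the course handout). The printed pointer values match between parent and child, while the child’s writes stay private:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int global = 1;                          /* data segment */

int main(void) {
    int local = 1;                       /* stack */
    int *heap = malloc(sizeof(int));     /* heap */
    *heap = 1;

    if (fork() == 0) {                   /* child: same virtual addresses... */
        global = local = *heap = 2;      /* ...but writes stay private       */
        printf("child:  &global=%p &local=%p heap=%p values=%d,%d,%d\n",
               (void *)&global, (void *)&local, (void *)heap,
               global, local, *heap);
        exit(0);
    }
    wait(NULL);
    printf("parent: &global=%p &local=%p heap=%p values=%d,%d,%d\n",
           (void *)&global, (void *)&local, (void *)heap,
           global, local, *heap);        /* still 1,1,1 */
    return 0;
}
```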

## Virtual vs Physical Memory

Addresses in a programme are usually not physical addresses.

  • Simple system: the CPU issues requests with physical addresses directly
  • Virtual addressing system: the CPU emits a virtual address, and the MMU translates it to a physical address before the request reaches main memory. So:
    • all processes can just use virtual memory addresses
    • all processes can think they have the full 64-bit address space to themselves
    • the operating system decides which physical address corresponds to which virtual address
    • the same virtual address in different processes can map to different physical addresses
    • … or share the same physical address (if read-only, library code, etc.)
    • different processes cannot overwrite unshared physical memory other processes are using
    • launching a new process doesn’t require finding different addresses for its heap/stack/etc.
    • simpler, more efficient memory management
    • simplifies linking - every programme can use the same standard layout, so addresses for the same kinds of things are similar across programmes
    • simplifies loading - the loader just maps virtual pages and can be lazy about copying the actual physical pages
    • can act as a cache, with some data mapped to DRAM and some to disk

## Virtual Memory Implementation

### Address spaces

  • Contiguous non-negative integers
    • Virtual address space $V$ with size $N = 2^n$, addresses $\{0,1,\dots,N-1\}$
    • Physical address space $P$ with size $M = 2^m$, addresses $\{0,1,\dots,M-1\}$
  • Mapping function
    • $\mathrm{MAP}: V \to P \cup \{\emptyset\}$
    • a virtual address can be mapped to memory, to disk, or to nothing at all ($\emptyset$)
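
As a tiny illustration of that codomain in code - every name here is illustrative, not from the course - each virtual page is either resident at some physical page, swapped out to disk, or unmapped (the $\emptyset$ case):

```c
#include <stdint.h>

/* Sketch of MAP's codomain (names are assumptions, not a real API). */
typedef enum { RESIDENT, ON_DISK, UNMAPPED } where_t;

typedef struct {
    where_t  where;
    uint64_t ppn;        /* physical page number, if RESIDENT */
    uint64_t disk_addr;  /* location on disk, if ON_DISK      */
} mapping_t;
```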

### Components

  • Memory management unit (MMU)
    • Sits between the CPU and the cache (cache access still requires a physical address)
  • Page table entry (PTE) - stores an individual mapping (see the struct sketch after this list)
    • valid bit
    • flag bits, like permissions
      • sup - supervisor
      • read
      • write
      • exec
    • physical address / disk address to map to
  • Page table - array of PTEs
    • Hardware specific, part of ISA, and updated by kernel.
    • CR3 points to the start of the page table of the currently running process
  • Page - unit of data mapped by a single entry
    • Page size $P = 2^p$, usually 4 KB; some systems use larger pages (e.g. 4 MB)
  • Translation Lookaside Buffer (TLB)
  • VA virtual address
    • TLBI TLB index
    • TLBT TLB tag
    • VPO virtual page offset
    • VPN virtual page number
  • PA physical address
    • PPO physical page offset
    • PPN physical page number
  • Page table base register (PTBR) - holds the address of the current page table (CR3 plays this role on x86)
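
A minimal sketch of a PTE as a C struct; the field names and bit widths are assumptions (loosely modeled on x86-64), since the real layout is ISA-specific:

```c
#include <stdint.h>

/* Illustrative 64-bit PTE layout; widths are assumptions. */
typedef struct {
    uint64_t valid : 1;   /* 1 = page is resident in physical memory     */
    uint64_t sup   : 1;   /* supervisor: only kernel code may access     */
    uint64_t read  : 1;
    uint64_t write : 1;
    uint64_t exec  : 1;
    uint64_t ppn   : 40;  /* physical page number (disk addr if !valid)  */
    uint64_t unused: 19;
} pte_t;
```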

### The page table

#### Single level page table

A simple implementation, but it requires lots of space, much of which is allocated for entries that never map anything useful.

It can just be an array of PTEs, indexed by VPN.

Pasted image 20230806175159.png

Problem: this can take something like 512 GB of memory per process… it won’t even fit.
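
Where that number comes from, assuming a 48-bit virtual address space, 4 KB pages, and 8-byte PTEs (figures typical of x86-64; the note itself doesn’t fix them):

$$\frac{2^{48}\ \text{B of address space}}{2^{12}\ \text{B/page}} = 2^{36}\ \text{PTEs}, \qquad 2^{36} \times 2^{3}\ \text{B/PTE} = 2^{39}\ \text{B} = 512\ \text{GB per process}$$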

#### Multilevel page table

Idea: split up the VPN and use a different portion of it as the index at each level

Pasted image 20230806180644.png

This way, a large unmapped region can be a single NULL entry at a high level, eliminating all the levels below it.

Trouble: a $k$-level table turns one translation into up to $k$ memory accesses, and we can cache miss at every level - so good caching of the mappings would be nice.
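
A sketch of the walk, assuming 4 levels with 9 index bits each and 4 KB pages (the x86-64 arrangement); the struct and names are illustrative, not a real kernel’s:

```c
#include <stddef.h>
#include <stdint.h>

#define LEVELS        4     /* assumption: 4-level table, as on x86-64 */
#define BITS_PER_LVL  9     /* 512 entries per table                   */
#define PAGE_SHIFT    12    /* 4 KB pages                              */

/* Each node has 512 slots holding a pointer to the next-level table,
 * or (at the last level) the PPN stored in the slot. A NULL slot means
 * the whole subtree is unmapped, so no memory is spent on it. */
typedef struct table {
    struct table *next[1 << BITS_PER_LVL];
} table_t;

/* Walk the tree using 9-bit slices of the VPN, most significant first.
 * Returns 0 to stand in for a page fault on an unmapped address. */
uintptr_t translate(const table_t *root, uintptr_t vaddr) {
    uintptr_t vpn = vaddr >> PAGE_SHIFT;
    uintptr_t vpo = vaddr & ((1UL << PAGE_SHIFT) - 1);
    const table_t *t = root;
    for (int lvl = LEVELS - 1; lvl > 0; lvl--) {
        size_t idx = (vpn >> (lvl * BITS_PER_LVL)) & ((1 << BITS_PER_LVL) - 1);
        t = t->next[idx];
        if (t == NULL)
            return 0;                      /* unmapped: page fault */
    }
    /* Last level: the slot holds the PPN itself (a sketch shortcut). */
    uintptr_t ppn = (uintptr_t)t->next[vpn & ((1 << BITS_PER_LVL) - 1)];
    return (ppn << PAGE_SHIFT) | vpo;
}
```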

### Virtual memory as a cache

  • Like a fully associative cache, but the mapping function is much more complicated.
  • Hardware doesn’t enforce a replacement policy; the operating system needs to do this.
  • Thrashing - when processes really need more physical memory than exists, so the computer keeps writing and reading disk
  • Demand paging - waiting for a page miss before copying data to DRAM

### Translation Lookaside Buffer (TLB)

To speed up lookups, make use of locality again! We usually hit a recently used mapping, so there is no need to go through the whole translation each time.

Technically, putting PTEs in the L1 cache works, but:

  • other data may evict PTEs
  • L1 still has delay

Instead, using a dedicated TLB:

  • Hardware in the MMU
  • Caches mapping to save time
  • set-associative (something like 500 cached mappings)
  • Misses are usually rarer than cache misses

#### TLB accesses

Similar to a cache access, the virtual address can be broken down into:

  • Virtual address: | TLBT (tag) | TLBI (index) | VPO |
    • the TLBT and TLBI together form the VPN; the TLBI is used to select the set (see the sketch below)
  • Each line: | v | tag | PTE |
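
A sketch of the bit-slicing, where the page size (4 KB) and number of TLB sets (16) are assumptions, not fixed by the note:

```c
#include <stdint.h>
#include <stdio.h>

#define VPO_BITS  12    /* assumption: 4 KB pages   */
#define TLBI_BITS  4    /* assumption: a 16-set TLB */

int main(void) {
    uint64_t va   = 0x7f3a12345678ULL;               /* arbitrary example */
    uint64_t vpo  = va & ((1ULL << VPO_BITS) - 1);   /* passes through    */
    uint64_t vpn  = va >> VPO_BITS;
    uint64_t tlbi = vpn & ((1ULL << TLBI_BITS) - 1); /* selects the set   */
    uint64_t tlbt = vpn >> TLBI_BITS;                /* matched vs tags   */
    printf("VPN=%#llx VPO=%#llx TLBI=%#llx TLBT=%#llx\n",
           (unsigned long long)vpn, (unsigned long long)vpo,
           (unsigned long long)tlbi, (unsigned long long)tlbt);
    return 0;
}
```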

### Access flow

Pasted image 20230809190229.png

Pasted image 20230806175621.png
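
A sketch of the flow the diagrams show; every function here is an undefined stub standing in for MMU/TLB hardware or OS behaviour (none of these are real APIs, and the file won’t link without stub definitions):

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Stub declarations - all names are assumptions for illustration. */
typedef struct { bool valid; uint64_t ppn; } pte_t;
bool     tlb_lookup(uint64_t vpn, pte_t *out);
void     tlb_insert(uint64_t vpn, pte_t pte);
pte_t    page_table_walk(uint64_t vpn);  /* may touch memory at each level */
void     page_fault(uint64_t vpn);       /* OS: bring the page into DRAM   */
uint64_t cache_access(uint64_t paddr);   /* caches use physical addresses  */

uint64_t load(uint64_t vaddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    uint64_t vpo = vaddr & ((1ULL << PAGE_SHIFT) - 1);
    pte_t pte;
    if (!tlb_lookup(vpn, &pte)) {        /* TLB miss: walk the page table */
        pte = page_table_walk(vpn);
        if (!pte.valid) {
            page_fault(vpn);             /* handler swaps the page in...  */
            pte = page_table_walk(vpn);  /* ...then the access is retried */
        }
        tlb_insert(vpn, pte);
    }
    return cache_access((pte.ppn << PAGE_SHIFT) | vpo);
}
```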

## Memory-mapped Files

  • It used to be that every page of physical RAM was backed by some page on disk
    • whatever is in memory is a cache of what’s on disk
  • Modern: maybe less necessary because memory is cheaper, but it still holds for pages mapped to swap and to files on disk
    • (Can explicitly request two processes to share memory - see the mmap sketch below)
    • Forking a process - copy-on-write sharing:
      • duplicate the page tables (so parent and child share the same physical memory)
      • mark everything read-only
      • copy the page upon a write fault
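
A minimal sketch of explicit sharing with `mmap` (POSIX; `MAP_ANONYMOUS` is a widespread extension): `MAP_SHARED` makes both processes’ page tables point at the same physical page, whereas `MAP_PRIVATE` would give the copy-on-write behaviour described above:

```c
#define _DEFAULT_SOURCE       /* for MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* MAP_SHARED: writes are visible across fork(). */
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    *shared = 1;
    if (fork() == 0) {                   /* child writes through the map */
        *shared = 42;
        return 0;
    }
    wait(NULL);
    printf("parent sees %d\n", *shared); /* prints 42 */
    munmap(shared, sizeof(int));
    return 0;
}
```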