Data in Machine Programming

lecture

#lecture note based on 15-213 Introduction to Computer Systems

We want to store structured data, such as a struct or an array, but memory is just a giant array of bytes with no types…

H2 Arrays

In C, we can declare int val[5]. The compiler then reserves 5 * sizeof(int) contiguous bytes for it and decides where they live.

Adding to the array's start pointer is scaled by the size of the element type! So val[3] == *(val + 3) regardless of the type of *val.

Consider an int array [1,5,2,1,3] at address x, with 4-byte ints (char, long etc. arrays work the same way, only the scale factor changes). Then:

expression	type	value
`val[4]`	`int`	`3`
`*(val + 3)`	`int`	`1`
`val`	`int*`	`x`
`val + 1`	`int*`	`x + 4`
`&val[2]`	`int*`	`x + 8`
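The rows of the table can be re-checked in C. This is a minimal sketch: `byte_offset` and `check_table` are hypothetical helper names, and the `x + 4` / `x + 8` values assume a 4-byte int, so the code compares against `sizeof(int)` instead of hard-coding 4.

```c
#include <assert.h>
#include <stdint.h>

/* Byte distance from base to p, computed on the raw addresses. */
long byte_offset(const int *base, const int *p) {
    return (long)((uintptr_t)p - (uintptr_t)base);
}

/* Re-checks each row of the table for val = {1, 5, 2, 1, 3}. */
int check_table(void) {
    int val[5] = {1, 5, 2, 1, 3};
    assert(val[4] == 3);
    assert(*(val + 3) == 1);
    assert(val[3] == *(val + 3));
    /* val + 1 is x + 4 and &val[2] is x + 8 when sizeof(int) == 4 */
    assert(byte_offset(val, val + 1) == 1 * (long)sizeof(int));
    assert(byte_offset(val, &val[2]) == 2 * (long)sizeof(int));
    return 1;
}
```

Calling `check_table()` from a `main` runs all the checks; an assertion failure would mean a row of the table is wrong on that machine.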

“02139 … MIT, whatever that is”
—Prof Railing

Array access is essentially adding a scaled index to the pointer and reading the value at that location. In assembly this is often done with an operand like (%rdi, %rsi, 4), where %rdi holds the starting address, %rsi holds the index, and 4 is the element size (the scale factor).
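To make the address computation concrete, here is a sketch that reads the same element twice: once with normal indexing and once with start + index * size written out by hand. The function names are hypothetical, and the assembly in the comment is what a compiler may emit on x86-64, not a guarantee.

```c
#include <assert.h>

/* Ordinary indexing; on x86-64 a compiler may emit something like
   movl (%rdi,%rsi,4), %eax for this (assumes a 4-byte int). */
int get_indexed(const int *start, long index) {
    return start[index];
}

/* The same read with the arithmetic spelled out:
   address = start + index * size. */
int get_manual(const int *start, long index) {
    const char *addr = (const char *)start + index * (long)sizeof(int);
    return *(const int *)addr;
}

/* Both ways of reading must agree for every element. */
int check_access(void) {
    int val[5] = {1, 5, 2, 1, 3};
    for (long i = 0; i < 5; i++)
        assert(get_indexed(val, i) == get_manual(val, i));
    return get_manual(val, 4);   /* last element, 3 */
}
```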

Note: sizeof is evaluated at compile time (except for C99 variable-length arrays), so the compiler hard-codes the result in the assembly.

Syntax (see diagram)
- int H[3] is an array of ints
- int (*A3)[3] is a pointer to an array of 3 ints
- int *p[3] is an array of int pointers

array syntax.png
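A small sketch of the three declarations side by side. The initial values and the function name `declaration_demo` are made up for illustration; the sizeof checks show how differently the compiler treats each declaration.

```c
#include <assert.h>
#include <stddef.h>

int declaration_demo(void) {
    int H[3] = {10, 20, 30};             /* array of 3 ints */
    int (*A3)[3] = &H;                   /* pointer to an array of 3 ints */
    int *p[3] = {&H[0], &H[1], &H[2]};   /* array of 3 pointers to int */

    assert(sizeof(H) == 3 * sizeof(int));
    assert(sizeof(*A3) == sizeof(H));        /* *A3 is the whole array */
    assert(sizeof(p) == 3 * sizeof(int *));  /* three pointers, not three ints */

    assert((*A3)[1] == 20);   /* dereference first, then index */
    assert(*p[2] == 30);      /* index first, then dereference */
    return (*A3)[0] + *p[1];  /* 10 + 20 */
}
```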

H3 Multidimensional Arrays

There are many ways to lay out a multidimensional array in memory. C uses row-major order (other options include column-major, diagonal(..?) etc.).

Suppose we declare T A[R][C]. Then in memory:
A[0][0] A[0][1] ... A[0][C-1] A[1][0] ... A[R-1][C-1], i.e. row 0, then row 1, ..., then row R-1.

So A[i][j] lives at address $A + (iC + j) \cdot \text{sizeof}(T)$.
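The row-major layout implies that element [i][j] sits (i*C + j) elements past the start of the array. The sketch below compares that prediction with the offset the compiler actually uses. ROWS, COLS and the function names are hypothetical choices for the example.

```c
#include <assert.h>

enum { ROWS = 3, COLS = 4 };   /* hypothetical dimensions R and C */

/* Byte offset predicted by the row-major rule (i*C + j) * sizeof(T). */
long predicted_offset(int i, int j) {
    return (long)(i * COLS + j) * (long)sizeof(int);
}

/* Byte offset the compiler actually uses for A[i][j]. */
long actual_offset(int A[ROWS][COLS], int i, int j) {
    return (long)((const char *)&A[i][j] - (const char *)A);
}

/* The two must agree for every element. */
int check_row_major(void) {
    int A[ROWS][COLS] = {{0}};
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            assert(predicted_offset(i, j) == actual_offset(A, i, j));
    return 1;
}
```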

H3 Multi-level arrays

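We can have, e.g., an array of pointers, each pointing to one row. This also allows the rows to have different lengths, at the cost of an extra memory read per access (first the row pointer, then the element).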
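A minimal sketch of such a multi-level array with rows of different lengths. All names and values here (`row0`, `rows`, `row_len`, `get_elem`, `sum_all`) are invented for the example; note that the row lengths must be tracked separately, since a pointer does not carry a length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rows of different lengths. */
int row0[] = {1, 2, 3};
int row1[] = {4};
int row2[] = {5, 6};

/* The top-level array holds one pointer per row. */
int *rows[] = {row0, row1, row2};
size_t row_len[] = {3, 1, 2};

/* Two memory reads: fetch the row pointer, then the element. */
int get_elem(int **table, size_t i, size_t j) {
    int *row = table[i];
    return row[j];
}

int sum_all(void) {
    int total = 0;
    for (size_t i = 0; i < 3; i++)
        for (size_t j = 0; j < row_len[i]; j++)
            total += get_elem(rows, i, j);
    return total;   /* 1 + 2 + 3 + 4 + 5 + 6 = 21 */
}
```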

H2 Structs

Similar to arrays, the compiler knows the size and offset of each field in the struct.

Fields are ordered in memory as they are declared in the C code.
Compiler decides overall size of struct and position/alignment of fields.

Example

struct S1 {
    char c;
    int i[2];
    long v;
};

Data alignment: the compiler aligns each primitive value to a multiple of its size, sizeof(type). The unused space this leaves is called padding. Alignment is advised on x86 for efficiency, but some machines require it. If $X$ has an alignment requirement of $K$, then its start address in memory must be $cK$ for some integer $c \ge 0$.
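The field offsets of S1 can be inspected with `offsetof`. This is a sketch: the helper names are hypothetical, and the concrete numbers (offsets 0, 4, 16 and size 24) assume a typical x86-64 ABI with 4-byte int and 8-byte long, so they are only checked when those sizes hold.

```c
#include <assert.h>
#include <stddef.h>

struct S1 {
    char c;
    int i[2];
    long v;
};

/* Portable check: every field starts at a multiple of its alignment. */
int s1_fields_aligned(void) {
    return offsetof(struct S1, c) == 0
        && offsetof(struct S1, i) % _Alignof(int) == 0
        && offsetof(struct S1, v) % _Alignof(long) == 0;
}

/* Concrete layout, assuming 4-byte int and 8-byte, 8-aligned long. */
int check_s1_layout(void) {
    if (sizeof(int) != 4 || sizeof(long) != 8 || _Alignof(long) != 8)
        return 1;                            /* different ABI: skip */
    assert(offsetof(struct S1, i) == 4);     /* 3 bytes of padding after c */
    assert(offsetof(struct S1, v) == 16);    /* 4 bytes of padding after i */
    assert(sizeof(struct S1) == 24);
    return 1;
}
```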

Motivations for alignment:

Aligned values stay within a single cache line
Aligned values stay within a single page of virtual memory

Alignment requirements of primitive data types (x86-64):

1 byte: char etc.
2 bytes: short
4 bytes: int, float…
8 bytes: double, long, pointers…
There are some 16-byte types on x86-64 too (e.g. long double); ignore them for now

Compiler also aligns complex types like struct and array.

The compiler uses the largest alignment requirement of any element in the struct. Given $K$ as the largest alignment requirement of anything in the struct, the struct must start at an address $c_{1}K$ and have size $c_{2}K$. This padded size is what the compiler uses as the size of the struct (it is what sizeof reports). This lets us make arrays of structs easily, since every element then starts on a multiple of $K$.
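The effect on arrays of structs can be checked directly. In this sketch (the function name is hypothetical) the struct's size is a multiple of its alignment, so every array element, and the long inside it, lands on a properly aligned address.

```c
#include <assert.h>
#include <stdint.h>

struct S1 {
    char c;
    int i[2];
    long v;
};

int check_struct_array(void) {
    struct S1 arr[4];

    /* The size is c2 * K, where K is the struct's alignment. */
    assert(sizeof(struct S1) % _Alignof(struct S1) == 0);

    for (int k = 0; k < 4; k++) {
        /* Each element starts at a multiple of K ... */
        assert((uintptr_t)&arr[k] % _Alignof(struct S1) == 0);
        /* ... so the long field inside it stays aligned too. */
        assert((uintptr_t)&arr[k].v % _Alignof(long) == 0);
    }
    return 1;
}
```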

Fact: ordering the fields from smallest to largest or from largest to smallest always gives the same aligned size. This might not hold if the fields are unsorted.

struct S2 {
    long v;
    int i[2];
    char c;
};

In the S2 arrangement, all the padding is external, at the end of the struct. People usually prefer largest to smallest because it pushes the padding toward the end.
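A sketch comparing field orders. S1 and S2 are the structs from these notes; `Sorted` and `Unsorted` are hypothetical structs added only to show an ordering where the size does change. On a typical x86-64 ABI the sizes would be 24, 24, 16 and 24 bytes respectively, but the checks below only rely on the relationships between them.

```c
#include <assert.h>
#include <stddef.h>

struct S1 { char c; int i[2]; long v; };   /* smallest to largest */
struct S2 { long v; int i[2]; char c; };   /* largest to smallest */

/* Hypothetical structs: the same three fields in two orders. */
struct Sorted   { long v; char a; char b; };
struct Unsorted { char a; long v; char b; };

int check_ordering(void) {
    /* Both sorted orders give the same padded size. */
    assert(sizeof(struct S1) == sizeof(struct S2));

    /* Putting the long between the chars costs extra padding. */
    assert(sizeof(struct Unsorted) > sizeof(struct Sorted));

    /* S2 has no gaps between fields: its padding all comes after c. */
    assert(offsetof(struct S2, i) == sizeof(long));
    assert(offsetof(struct S2, c) == sizeof(long) + 2 * sizeof(int));
    return 1;
}
```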

Pasted image 20230604135735.png

For a nested struct, treat the inner struct as a single field with its own size and alignment requirement.
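A sketch of the nested case, using two hypothetical structs `Inner` and `Outer`. On a typical x86-64 ABI, `Inner` would be 16 bytes with alignment 8, and `Outer` would place it at offset 8 for a total of 32 bytes; the checks below avoid hard-coding those numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structs for illustration. */
struct Inner { char tag; long value; };
struct Outer { char flag; struct Inner in; char tail; };

int check_nested(void) {
    /* The inner struct is placed at a multiple of its own alignment. */
    assert(offsetof(struct Outer, in) % _Alignof(struct Inner) == 0);

    /* Here Inner has the strictest alignment, so Outer inherits it. */
    assert(_Alignof(struct Outer) == _Alignof(struct Inner));

    /* Inner occupies its full padded size inside Outer. */
    assert(offsetof(struct Outer, tail)
           == offsetof(struct Outer, in) + sizeof(struct Inner));

    assert(sizeof(struct Outer) % _Alignof(struct Outer) == 0);
    return 1;
}
```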

H2 Unions

Like a struct, but it holds only one field at a time. A union is allocated enough space for its largest element (rounded up to the union's alignment).

union U2 {
    char c;
    int i[2];
    long v;
};

Pasted image 20230604140944.png

All fields start at the same address (offset 0), so their beginnings line up.
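The union layout can be checked the same way as the structs. In this sketch (the function name `check_union` is hypothetical), every member shares the union's starting address and the union is at least as large as its largest member; on a typical x86-64 ABI the size of U2 would be 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

union U2 {
    char c;
    int i[2];
    long v;
};

int check_union(void) {
    union U2 u;

    /* Every member begins at the start of the union. */
    assert((void *)&u.c == (void *)&u);
    assert((void *)&u.i == (void *)&u);
    assert((void *)&u.v == (void *)&u);

    /* The union is big enough for its largest member. */
    assert(sizeof(union U2) >= sizeof(u.i));
    assert(sizeof(union U2) >= sizeof(u.v));
    assert(sizeof(union U2) % _Alignof(union U2) == 0);
    return 1;
}
```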

zettelkasten