

System-level I/O

#lecture note based on 15-213 Introduction to Computer Systems

System-level I/O sits much lower than fopen, fputs, fclose, etc. Those are higher-level wrappers around the system calls covered here, and beneath the system calls there are many more layers in the kernel.

H2 Files

  • file … just a sequence of bytes
  • IO devices are represented as files
    • /dev/sda2 - disk partition
    • /dev/tty2 - terminal
    • /dev/null - discards anything written; reads return end-of-file immediately
  • Kernel data structures are (exposed as) files too
    • cat /proc/$$/status
    • ls -l /proc/$$/fd
    • ls -RC /sys/devices | less
  • Directories are files too
    • Array of entries that map a filename to a file
    • Every directory also contains at least two entries:
      • . (the directory itself)
      • .. (the parent directory)
  • File operations
    • open, close
    • read, write
    • file info: a small amount of metadata such as size, type, and last modification time (nothing about the format of the contents)
      • stat(), lstat(), fstat()
      • The types
        • regular file - store data, just bytes
        • directory
        • socket - for communication
        • symbolic link
        • named pipes
    • lseek() to change current file position
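  • End-of-line indicators, inherited from old typewriter terminals
    • Unix
      • 0x0A for line feed (LF)
    • DOS, Windows: 0x0D, 0x0A
      • 0x0D carriage return (CR)
      • 0x0A line feed (LF)
    • On systems that use CR LF, the C library translates it to and from \n for streams opened in text mode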

H3 File Types

  • Regular file - stores bytes
  • Directory - index of related files
  • Socket - file for communicating with a process on another machine
  • Named pipes
  • Symbolic links
  • Character and block devices

H3 File Metadata

The kernel maintains a lot of metadata per file, and the details differ between operating systems.

Getting metadata: call stat or fstat, which fill in a struct like this:

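struct stat {
    dev_t         st_dev;      /* device containing the file */
    ino_t         st_ino;      /* inode number */
    mode_t        st_mode;     /* file type and permission bits */
    nlink_t       st_nlink;    /* number of hard links */
    uid_t         st_uid;      /* user ID of owner */
    gid_t         st_gid;      /* group ID of owner */
    dev_t         st_rdev;     /* device ID (if the file is a device) */
    off_t         st_size;     /* total size in bytes */
    unsigned long st_blksize;  /* preferred block size for file system I/O */
    unsigned long st_blocks;   /* number of blocks allocated */
    time_t        st_atime;    /* time of last access */
    time_t        st_mtime;    /* time of last modification */
    time_t        st_ctime;    /* time of last status change */
};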

H2 Unix IO

  • Open
    • call open(path, flags) and tell kernel what to do with file (read/write/etc)
      • O_RDONLY - read only
      • O_WRONLY - write only
      • O_RDWR - read write
    • or open(path, flags, mode) to open or create: flags must include O_CREAT (often with others such as O_TRUNC or O_APPEND), and mode specifies the access permissions of the new file.
    • kernel returns an int, the file descriptor, that identifies the open file (or -1 on error)
    • (children inherit open files!)
  • Close
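    • Don’t close the same descriptor more than once (the second close fails, or closes an unrelated file that has reused the number); never closing is what leaks resources
    • close(fd) takes an int
    • This is fallible! For example, close can report an earlier write error on the file being closed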
  • Read: ssize_t nbytes = read(fd, buf, sizeof(buf));
    • Fills a buffer with bytes from the file, starting from the current file position, up to the requested size, then updates the file position and returns the number of bytes read
    • Possible scenarios
      • nbytes < 0 if error
      • nbytes == 0 at end of file (e.g. an empty file); the buffer is unchanged
      • We could read less than size (aka short count) in situations like:
        • reaching EOF, reading from a terminal, reading from a network connection
  • Write: ssize_t nbytes = write(fd, buf, sizeof(buf));
    • Copies bytes from memory to the file, updates the file position, extends the file if necessary, and returns the number of bytes written.
    • nbytes < 0 - error occurred
    • Possible that nbytes < size, without error (aka short count)
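
The open / write / read / close sequence above could be sketched as follows, assuming a POSIX system. The helper names `save_text` and `load_bytes` are hypothetical, and for simplicity this sketch treats a short write as a failure rather than retrying.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) `path` and write `text` to it.
 * Returns 0 on success, -1 if any system call fails. */
int save_text(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    /* With O_CREAT, the third argument gives the permission bits
     * (rw-r--r-- here), subject to the process umask. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    ssize_t nbytes = write(fd, text, len);
    if (nbytes < 0 || (size_t)nbytes != len) {
        close(fd);              /* already failing; ignore close's result */
        return -1;
    }

    /* close() can itself report an error, so check it too. */
    return close(fd) < 0 ? -1 : 0;
}

/* Hypothetical helper: read up to `size` bytes from the start of `path`
 * with a single read() call. Returns the byte count (0 means the file is
 * empty), or -1 on error. */
ssize_t load_bytes(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t nbytes = read(fd, buf, size);
    if (close(fd) < 0)
        return -1;
    return nbytes;
}
```

For example, `save_text("notes.txt", "hello")` followed by `load_bytes("notes.txt", buf, sizeof(buf))` should hand back the same 5 bytes; note that read does not add a terminating '\0'.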

Always check the status of I/O calls!
For disk files, short counts typically don’t happen (except when reaching EOF).
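
One common way to cope with short counts is to loop until the whole request is satisfied. The sketch below assumes a POSIX system; `read_fully` and `write_fully` are hypothetical names, and retrying on EINTR is a choice made for this sketch, not something the notes prescribe.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `n` bytes have arrived,
 * EOF is reached, or an error occurs. Returns the number of bytes read
 * (less than n only at EOF), or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;

    assert(buf != NULL);
    while (left > 0) {
        ssize_t nread = read(fd, p, left);
        if (nread < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        if (nread == 0)
            break;              /* EOF: return what we have so far */
        p += nread;             /* short count: advance and ask again */
        left -= (size_t)nread;
    }
    return (ssize_t)(n - left);
}

/* Hypothetical helper: keep calling write() until all `n` bytes are
 * written. Returns n on success, -1 on error. */
ssize_t write_fully(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    assert(buf != NULL);
    while (left > 0) {
        ssize_t nwritten = write(fd, p, left);
        if (nwritten < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += nwritten;
        left -= (size_t)nwritten;
    }
    return (ssize_t)n;
}
```

With these, a caller only has to distinguish three outcomes from `read_fully`: -1 (error), a value below n (EOF reached early), or exactly n.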

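Also, don’t use text-oriented I/O (such as fgets or scanf) or string functions (such as strlen or strcpy) on binary files: they give special meaning to bytes like 0x0A and 0x00.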

H2 Standard I/O

H3 File streams

System IO calls are expensive…

Solution - buffered I/O: collect data in a user-space buffer for a while before handing it to the actual I/O call. This is how standard I/O works in C.

  • A process begins with three open files, which the C library manages as streams. They can all point to the same file (e.g. the terminal)
    • 0 for stdin
    • 1 for stdout
    • 2 for stderr
  • Output streams get flushed when the buffer fills, on fflush(stdout), at normal exit, and (for a stream connected to a terminal) when printing "\n". Buffering can sometimes lead to surprising print ordering.
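
To make the buffering visible, the sketch below writes to a stream and asks the kernel (via stat) how big the file is before and after fflush. The helper names `file_size` and `buffered_write_demo` are hypothetical, and full buffering is requested explicitly with setvbuf so the behaviour does not depend on library defaults.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of `path` as the kernel sees it, -1 on error. */
long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) < 0 ? -1 : (long)st.st_size;
}

/* Hypothetical demo: writes `text` to a fully buffered stream on `path`
 * and records the on-disk size before and after fflush().
 * Returns 0 on success, -1 on error. */
int buffered_write_demo(const char *path, const char *text,
                        long *before, long *after)
{
    assert(path != NULL && text != NULL && before != NULL && after != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Ask for full buffering: data is passed to write() only when the
     * buffer fills or the stream is flushed, not on every newline. */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0 || fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    *before = file_size(path);  /* bytes still sit in the stdio buffer */

    if (fflush(fp) == EOF) {
        fclose(fp);
        return -1;
    }
    *after = file_size(path);   /* now the kernel has seen them */

    return fclose(fp) == EOF ? -1 : 0;
}
```

For a short string (smaller than BUFSIZ), `before` should come back as 0 and `after` as the length of the string, which is the same effect that causes output from printf and write to appear out of order.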

H3 Standard I/O vs Unix I/O

Usually, use Standard I/O (easier, less flexible) rather than Unix I/O (trickier, but more features).

But don’t use Standard I/O with network sockets.

  • Unix I/O
    • good
      • very general
      • can access file metadata
      • async signal safe
    • bad
      • error prone
      • short counts tricky to deal with
      • need buffer to be efficient
  • Standard I/O
    • good
      • buffered
      • automatic short count handling
    • bad
      • can’t access metadata
      • not async signal safe
      • does not work well with sockets

H2 File descriptor table

Each process has a descriptor table whose entries point to open files.
Each open file has an entry in the kernel’s open file table, which stores the file position, reference count, etc.

Children share the parent’s open files, including the file position (and the descriptors stay open across exec).
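
A small sketch of that sharing, assuming a POSIX system: the child reads one byte through the inherited descriptor, and the parent's next read then starts at the second byte. The function name `byte_after_child_read` is hypothetical, and the parent waits for the child only to make the ordering deterministic.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the byte the parent reads after a forked
 * child has read one byte from the same descriptor, or -1 on error. */
int byte_after_child_read(const char *path)
{
    char c;
    int status;

    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: the inherited fd refers to the same open file entry,
         * so this read advances the position the parent also uses. */
        _exit(read(fd, &c, 1) == 1 ? 0 : 1);
    }

    /* Parent: wait for the child, then read from the shared position. */
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0 ||
        read(fd, &c, 1) != 1) {
        close(fd);
        return -1;
    }
    close(fd);
    return (unsigned char)c;
}
```

For a file whose contents begin with "xy", the function should return 'y', since the child has already consumed the 'x'.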

But calling open twice on the same file creates two open file table entries, each with an independent position.

dup2(src, dst) is the kernel call for changing a file descriptor: it makes dst refer to the same open file as src, closing dst first if it was open.
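
The difference between separate opens and a duplicated descriptor can be sketched like this, assuming a POSIX system. The function name `sharing_demo` is hypothetical; it opens the same file three times, then uses dup2 to point the third descriptor at the first one's open file entry.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: reads one byte each through three descriptors on
 * `path` and stores them in out[0..2]. Returns 0 on success, -1 on error. */
int sharing_demo(const char *path, char out[3])
{
    assert(path != NULL && out != NULL);

    /* Three separate open() calls: three independent file positions. */
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    int fd3 = open(path, O_RDONLY);
    if (fd1 < 0 || fd2 < 0 || fd3 < 0) {
        if (fd1 >= 0) close(fd1);
        if (fd2 >= 0) close(fd2);
        if (fd3 >= 0) close(fd3);
        return -1;
    }

    /* dup2 closes fd3's old file and makes fd3 share fd1's entry. */
    int ok = dup2(fd1, fd3) == fd3
          && read(fd1, &out[0], 1) == 1   /* shared position moves 0 -> 1 */
          && read(fd2, &out[1], 1) == 1   /* independent: still reads byte 0 */
          && read(fd3, &out[2], 1) == 1;  /* shares with fd1: reads byte 1 */

    close(fd1);
    close(fd2);
    close(fd3);
    return ok ? 0 : -1;
}
```

For a file containing "abc", the three bytes stored should be 'a', 'a', 'b'. The same dup2 call with dst set to 1 is how a shell would redirect stdout to a file.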
