Introduction to UNIX

Last updated Apr. 12, 2004 by Matt Mahoney, mmahoney@cs.fit.edu

UNIX is a multi-user, multi-tasking operating system. This introduction describes features common to most UNIX variants, including Solaris, OpenBSD, FreeBSD, and Linux.

Services

An operating system is a program that, at a bare minimum, allows a user to load programs into memory, execute them, and interrupt their execution. Modern operating systems such as Windows and UNIX go far beyond this. UNIX provides the following services:

UNIX Commands

Because UNIX is multi-user, you must supply a login name and password before you can use it. After logging in, you can create and edit files and directories and run programs by giving text-based commands. A shell interprets these commands and executes them on your behalf. Some common commands:

  ls -las                List files in the current directory.
  man vi                 Display help on the vi program (a text editor)
  vi prog.cpp            Create/edit the file prog.cpp
  g++ prog.cpp -o prog   Compile C++ prog.cpp to executable file prog
  ./prog                 Run prog
  ^C                     (hold down Ctrl and type C) kill prog
  logout                 Log out

Each user has a home directory, such as /usr/joe, and a current directory (named .) which starts in the home directory. Each / separates one level in the directory tree with root /. A file name not starting with / is relative to the current directory. Thus, prog is equivalent to /usr/joe/prog, which means the file prog in the directory joe in the directory usr in the root directory. Some commands:

  cd dir                 Change the current directory to dir
  cd ..                  Move up one level (e.g. /usr/joe to /usr)
  cd                     Move to your home directory
  cp file1 file2         Copy file1 to file2
  mv file1 file2         Rename or move file1 to file2
  rm file1               Remove file1

The input and output of any program or command may be redirected to a file or another program.

  ./prog > file1         Write output of prog to file1
  ls >> file1            Append output of ls command to file1
  sort < file1 > file2   Sort, reading from file1 and writing to file2
  ls | sort              Run ls and sort simultaneously with output of ls
                           piped to the input of sort

You can run more than one program at a time by running them in the background. The shell prompts for the next command without waiting for the first one to finish running.

  ./prog &               Run in background
  ps                     Show all running process IDs (e.g. ./prog 1234)
  kill 1234              Interrupt prog running in background

See www.emba.uvm.edu/CF/basic.html for a list of common UNIX commands.

System Programming

A C or C++ program can use the operating system services below. Most functions return -1 and set errno in case of error. Arguments are type int or pointer. For more information on a function, use man -s 2 function
  perror("msg");  // Print "msg: error" depending on errno
                                
  int fd = open("file", O_RDONLY);  // Open for reading
  fd = open("file", O_WRONLY | O_TRUNC | O_CREAT, 0644);
    // Open for writing, truncate, create if needed with permissions rw-r--r--
    // Also: O_RDWR = open in read/write mode
    //       O_APPEND = open for appending
    //       O_EXCL = fail if O_CREAT and file exists
    //       O_NONBLOCK or O_NDELAY = read may return 0 bytes without blocking

  n = read(fd, buf, n);
  n = write(fd, buf, n);
    // Read or write up to n bytes to/from char buf[n], return number
    // of bytes actually read or written or 0 at EOF.

  n = lseek(fd, n, SEEK_SET)
    // Position file pointer to n from start (SEEK_END = end, SEEK_CUR = current)

  close(fd);  // Close file
  unlink("file");       //  Delete file
  truncate("file", n);  // Change size to n
  ftruncate(fd, n);
  sync();  // Flush all buffers to disk (for shutdown)

  // Get file information
  struct stat s;      // in sys/stat.h
  stat("file", &s);
  lstat("file", &s);  // link
  fstat(fd, &s);
    // s.st_dev = device
    // s.st_ino = inode
    // s.st_mode = permissions
    // s.st_nlink = hard link count
    // s.st_uid = owner ID
    // s.st_gid = group ID
    // s.st_size = file size
    // s.st_atime = last read time (seconds since Jan. 1, 1970)
    // s.st_mtime = last write time
    // s.st_ctime = change time (e.g. change permissions)

  // Read directory
  struct dirent d;  // in sys/dirent.h
  fd = open("dir", O_RDONLY);  // dir is a directory
  while(getdents(fd, &d, sizeof(d)))  // d.d_nam is next file in dir
    lseek(fd, d.d_off, SEEK_SET);  // go to next file
    
  // Change permissions (only root can change onwer)
  chown("file", owner, group);  // Change owner, group, -1 = unchanged
  lchown("file", owner, group); // Change link owner, group
  fchown(fd, owner, group);
  chmod("file", mode);          // Change permissions, e.g. 0644 = rw-r--r--
  fchmod(fd, mode);

  // Redirect
  fd2 = dup(fd);  // copy fd to fd2
  dup2(fd, fd2);  // close fd2 and redirect it to fd
  dup2(fd, 0);    // redirect stdin to fd
  dup2(fd, 1);    // redirect stdout to fd
  dup2(fd, 2);    // redirect stderr to fd

  // I/O device specific control
  fcntl(fd, cmd, arg)
  ioctl(fd, cmd, arg)

  // Special files
  mkdir("dir");               // Make directory
  mknod("pipe", S_IFIFO, 0);  // Create named pipe

  // Create child process
  int pid = fork();   // In parent, pid = child.  In child, pid = 0.
  pid = getpid();     // get process ID
  ppid = getppid();   // get parent process ID
  exit(n);            // Exit with status n (0-255, normally 0 = success)
  pid = wait(&n);     // Wait for child to exit with status n>>8 or signal n&127

  // Replace current process
  execl("cmd", "arg0", "arg1", ... ,0);   // Execute cmd with args
  execlp("cmd", "arg0", "arg1", ... ,0);  // Execute cmd with args in PATH
  execv(argv[0], argv);    // Execute, argv={"cmd","arg1",...,0};
  execvp(argv[0], argv);   // Execute in PATH

  // Change current directory
  chdir("dir");

  // Lower priority by n
  nice(n);

  // Change process owner (if root)
  setuid(n);     // set real owner
  seteuid(n);    // effective owner (has permissions of user ID n)
  setgid(n);     // set group
  setegid(n);    // set effective group
  n = getuid();  // get the above
  n = geteuid();
  n = getgid();
  n = getegid();

  // Signals - interrupt a running process
  kill(pid, sig);  // Send signal sig (0-31) to process pid, where sig is:
    // SIGINT  = ^C
    // SIGKILL = cannot be caught
    // SIGALRM = alarm clock
    // SIGTERM = default signal for shell "kill" command
    // SIGCHLD = child has exited
    // SIGSTP  = suspend (^Z)
    // SIGCONT = continue after suspend (bg or fg command)
    // SIGSEGV = segmentation fault
    // and others... (man signal)
  alarm(n);  // Schedule SIGALRM in n seconds
  pause();   // Wait for signal

  // Install signal handler
  int handler(int n) {}  // to be called when signal is caught
  int (*oldHandler)(int) = signal(sig, handler);  // install, save old handler
  signal(sig, SIG_IGN);  // ignore sig
  signal(sig, SIG_DFL);  // restore default behavior

  // Process group
  setpgid(pid, pgid);  // set to pgid (to disconnect from terminal)
  setpgid(0, pgid);    // set for self
  getpgid(pid);        // get process group ID (pid = 0 means self)

  // Create pipe with fd[0] for reading and fd[1] for writing
  int fd[2];
  pipe(fd);

  // Internet client
  #include sys/types.h, sys/socket.h, netinet/in.h, arpa/inet.h, netdb.h
  int fd = socket(AF_INET, SOCK_STREAM, 0);  // TCP, SOCK_DGRAM = UDP
    sockaddr_in s;
    memset(&s, 0, sizeof(s)); // clear s
    s.sin_family = AF_INET;   // internet, or AF_UNIX for local
      hostent *h = gethostbyname("server.com");  // DNS lookup, NULL = fail
    s.sin_addr.s_addr = ((in_addr*)(h->h_addr))->s_addr;  // IP address
    s.sin_port = htons(80);   // 80=HTTP, 23=telnet, 25=SMTP, 21=FTP, 22=SSH
  connect(fd, (sockaddr*)&s, sizeof(s));  // -1 = fail
  // then read, write, close fd

  // Internet server
  int sfd = socket(AF_INET, SOCK_STREAM, 0);  // server file descriptor
    s.sin_addr.s_addr = htonl(INADDR_ANY);    // rest of s as above, ports < 1024 require root
  bind(sfd, (sockaddr*)&s, sizeof(s));  // bind socket to server port
  listen(sfd, 5);  // max client queue size is 5
    sockaddr_in c;  // client IP address and port
    int len = sizeof(c);
  int cfd = accept(sfd, (sockaddr*)&c, &len);  // wait for client
  // read, write, close cfd to client (usually in a child process)

  // Misc. network
  gethostname(name, len);  // get host name in char name[len]

  Example program to read a web page

UNIX Internals

Hardware support

Most modern processors have hardware features to support multitasking, protection, and virtual memory efficiently.

Process Management

Each entry in the process table (in kernel space) contains the following:
  Process ID
  Parent process ID
  Real and effective user and group IDs
  State
  Pending signals
  Code segment (executable code memory bounds)
  Data segment (static data)
  Stack segment (temporary data)
  User area
    Signal handler table
    Open file descriptors
    Recent CPU usage
    Hardware register contents (unless running)
  Page table

A process can be in one of 6 states.


                                    Suspended
                                   /        ^
                          SIGCONT /          \ SIGSTOP
                                 /            \
                                /  Allocated   \
              Initialize       V      CPU       \        Exit
       Idle --------------> Ready ----------> Running ----------> Zombie
                               ^                /
                                \              /
                    Event occurs \            / Wait for event
                                  \          /
                                   \        V
                                    Sleeping
Scheduling is round-robin with preemption by higher priority processes.
  Every second:
    For each process
      priority = (recent CPU usage)/constant + base priority + nice setting

  Every 1/10 second or when a process waits or is suspended
    Move highest priority process to running state

  Every clock tick (1/100 sec)
    Increment count
    If count = 4 then recompute priority of running process
In Linux, priority is computed as follows:
   Credits = credits/2 + priority
At each context switch, the processor is given to the process with the most credits. Both methods favor interactive processes over long running CPU bound processes.

File System

A disk is divided into cylinders, tracks, and sectors, which are numbered in the order in which they can be accessed most quickly sequentially (possibly interleaved). This array of blocks has the following structure:

  Boot block (executed at startup)
  Super block (bitmap of free blocks)
  Inodes (pointers to file blocks)
  User blocks (directory and file blocks)
A directory is a table of file names and inode pointers as read by getdents(). An inode is:
  Pointers to blocks 0-9 (typically 4KB each)
  Indirect pointer for files over 40KB
  Double indirect pointer for files over 4MB
  Permissions, owner, group, type, timestamps... as returned by stat()
An indirect pointer points to a 4K block of 1K pointers. A double indirect block points to a 4K block of 1K pointers to 1K pointers to support files up to 4GB.

Input/Output

Memory addresses of I/O devices (disk, printer, keyboard, mouse, monitor, network ports, serial ports) are accessible only in kernel mode. Access is only through system calls using kernel code running in kernel mode. Steps are:
  User process makes I/O request (read() or write()) to kernel.
  Kernel puts user process in waiting state and gives CPU to another process.
  Kernal requests I/O to hardware.
  I/O transfers data by DMA.
  I/O signals hareware interrupt.
  Kernal puts user process in ready state.

Virtual Memory

The MMU maps virtual addresses into real addresses in RAM or disk.
                             32 bit
            Logical          virtual    High            Physical
  User      address   +---+  address   20 bits +-----+  address
  Process ----------->| + |------------+------>| MMU |----------->
                      +---+            |       +-----+             RAM
                        ^              |
            Segment     |              +------------------------->
            Table   ----+                Low 12 bits
MMU page table indexed by high part of address
In hardware:
  RAM address
  Valid bit
  Referenced bit
  Modified bit
  Copy-on-write bit
In software:
  Disk address
  Age
  Free bit
  Ready-to-swap-out bit
  Reference count
MMU algorithm (in hardware)
  Read:
    If valid then
      Output mapped address
      Set referenced bit
    Else
      Page fault interrupt

  Write:
    If valid and not copy-on-write
      Output mapped address
      Set referenced bit
      Set modified bit
    Else
      Page fault interrupt
Page fault algorithm (MMU interrupt)
  Move current process to sleep state
  If page is not still in RAM
    Find page with free bit set
    Load page from disk
    Update page table address
  Set valid bit
  Clear free and ready-to-swap-out bits
The page daemon (kernel process) swaps out the least recently used pages:
  For each page
    If referenced bit is set
      age = 0
    Else
      age = age + 1
    If age > limit
      If no disk address or modified bit is set
        Set ready-to-swap-out
      Else
        Set free bit
  If number of free pages < limit
    For each page
      If ready-to-swap-out
        Copy to swap file
        Set free bit

Comparison to Windows

Unlike UNIX/Linux, Windows is proprietary (no published source code) and runs primarily on Intel PC's. History:
  Single user       Single user      Multi user
  Single tasking    Multitasking     Multitasking


     BASIC
       |
      DOS
       |
   Windows 3.1 -----------------------
                      \                \
                    Windows 95       Windows NT 
                       |                |
                    Windows 98          |
                       |                |
                    Windows Me       Windows 2000
                       |                |
                       \                |
                         ------------\  |
                                     Windows XP

Hardware Support

Same as UNIX.

Process Management


 Transition ---> Ready --------> Standby -------> Running ---> Terminated
                   ^        (front of queue)       /
                    \                             /
                     \       (preemption)        /
                      \ <---------------------- /
                       \                       /
                        <---- Waiting <--------
                             (for event)
  • Windows (and most UNIX versions) support threads (processes that share code and data, but not stack).
  • No parent/child relation after creation.

    Scheduling

  • Round robin with preemption
  • 32 priority levels (16 kernel, 16 user)
  • Waiting processes get higher priority depending on device (keyboard > disk)
  • Foreground window gets 3x time slice

    Environmental Subsystems

  • Directly supports back to Win 95/NT
  • MSDOS emulation
  • 16-bit Windows (3.1)
  • 32-bit Windows (on 64-bit XP)
  • POSIX (UNIX)

    File System (NTFS)

  • Hierarchical directory spanning disks, networks (symbolic links)
  • Transaction based I/O (circular log)
  • Registry restore
  • Change journal
  • RAID (stripe sets, parity, mirroring)

    Security

  • Access list per object (users but no groups)
  • Object = file, process, thread, semaphore, shared memory segment,
  • No path permissions (except POSIX)

    Memory Management

  • Segment permissions: read only, read-write, execute, none (guard), copy on write.
  • MMU - Disk cache integration (like UNIX)
  • Predictive page prefetching (files and memory)
  • LFU (least frequently used) page replacement, not LRU
  • 1K x 1K x 4K pages per process (4 GB virtual memory)

    Networking

  • Protocols: TCP/IP, PPTP (VPN), SMB, NetBIOS, NetBEUI, DLC, Appletalk
  • Hierarchical network domains (Kerberos): one way trust (like NFS), transitive, or cross link
  • Active directory (like NIS)

    System Programming

  • Windows API (C/C++ interface)
  • Winsock based on UNIX BSD socket model
  • Event handler model (vs. blocking system calls)
  • Direct interface to GUI (vs. X client/server)
  • IPC: Named pipes, mailslots, RPC, COM, DCOM