Textbook: The Architecture of Computer Hardware and Systems Software by Irv Englander.
Numbers are usually written in base 10. The digits of a number written in base B have weights:
... B3 B2 B1 B0 . B-1 B-2 B-3 ...For example, in base 8:
123.458 = 1x82 + 2x81 + 3x80 + 4x8-1 + 5x8-2 = 1 x 64 + 2 x 8 + 3 x 1 + 4 x 1/8 + 5 x 1/64 = 64 + 16 + 3 + 4/8 + 5/64 = 83 37/64.Important bases in computers are base 2 (binary) and base 16 (hexadecimal or hex). In hex, A=10, B=11, C=12, D=13, E=14, F=15. Examples:
1AF16 (sometimes written 0x1AF or 1AFh) = 1 x 162 + 10 x 161 + 15 x 160 = 1 x 256 + 10 x 16 + 15 = 431. 1101011112 = 1 x 28 + 1 x 27 + 0 x 26 + 1 x 25 + 0 x 24 + 1 x 23 + 1 x 22 + 1 x 21 + 1 x 20 = 256 + 128 + 32 + 8 + 4 + 2 + 1 = 431Each hex digit converts to 4 binary digits (4 bits).
0001 1010 11112 = 1AF16 = 431.
8 bits = 1 byte. 1024 bytes = 1 kilobyte = 1 KB = 210 bytes. 1024 KB = 1 megabyte = 1 MB = 220 bytes = 1,048,576 bytes. 1024 MB = 1 gigabyte 1 GB = 230 bytes. 1024 GB = 1 terabyte 1 TB = 240 bytes.File, disk, and memory sizes are expressed in bytes. However, network and modem speeds are often expressed in bits per second with a lowercase "b", e.g. 56 Kb/s = 56 kilobits per second = 7 KB per second.
10 = linefeed (end of line) 13 = carriage return (preceeds linefeed in Windows, but not UNIX) 32 = space 48..57 = '0'..'9' 65..90 = 'A'..'Z' 97..122 = 'a'..'z' Complete ASCII table (0 to 127) Multi-byte Unicode extensions for other languages.
11012 = 1 x -23 + 1 x 22 + 0 x 21 + 1 x 20 = -8 + 4 + 0 + 1 = -3Common data types.
Size C/C++ Java Range 8 bits char byte -128 to 127 8 bits unsigned char - 0 to 255 16 bits short short -32768 to 32767 16 bits unsigned short - 0 to 65535 32 bits int int -2147483648 to 2147383647 32 bits unsigned int - 0 to 4294967295 64 bits long long long -9.223e18 to 9.223e18 64 bits unsigned long long - 0 to 1.844e19Arithmetic producing a result outside the legal range "overflows" by dropping the leading bits. For n bit types the computed result is congruent modulo 2n to the true result, e.g. for 32 bit integers, add or subtract 232 = 4294967296 until the result is within range:
1000000000 + 2000000000 == -1294967296 (3000000000 - 232) 0x20000 * 0x30000 == 0x0 (last 8 hex digits of 0x600000000)Integer division rounds toward zero.
8/3 == 2 -8/3 == -2
Size C/C++/Java type Sign Exponent Mantissa Range Precision 32 bits float 1 bit 8 bits 23 bits +-1038 6 significant digits 64 bits double 1 bit 11 bits 52 bits +-10308 14 significant digits Decimal float double (52 bits...) 0.0 0 00000000 00000000000000000000000 0 00000000000 00000000...0 1.0 0 01111111 00000000000000000000000 0 01111111111 00000000...0 2.0 0 10000000 00000000000000000000000 0 10000000000 00000000...0 3.0 0 10000000 10000000000000000000000 0 10000000000 10000000...0 4.0 0 10000001 00000000000000000000000 0 10000000001 00000000...0 5.0 0 10000001 01000000000000000000000 0 10000000001 01000000...0 -5.0 1 10000001 01000000000000000000000 1 10000000001 01000000...0 0.5 0 01111110 00000000000000000000000 0 01111111110 00000000...0 0.25 0 01111101 00000000000000000000000 0 01111111101 00000000...0 0.1 0 01111011 10011001100110011001101 0 01111111011 10011001...1Floating point arithmetic is not always exact.
10 * 0.1 - 1 != 0 (0.1 does not have an exact representation) sqrt(2) * sqrt(2) - 2 != 0 (sqrt(2) does not have an exact representation)
000 = halt (until user presses reset) 1xx = add number from location xx to calculator 2xx = subtract number from location xx 3xx = store calculator contents at location xx 5xx = load contents of location xx into calculator 6xx = enter xx into program counter (branch unconditional) 7xx = enter xx into program counter if calculator holds 0 (branch if 0) 8xx = enter xx into program counter if calculator holds 0 or a positive number 901 = read input, put into calculator 902 = output value in calculatorAfter all instructions except branches (6xx, 7xx, 8xx) and halt (000) the little man adds 1 to the program counter. Example program to input 2 numbers, add them, output the sum:
00 901 (input) 01 399 (store at 99) 02 901 (input) 03 199 (add from 99) 04 902 (output) 05 000 (halt)
Registers (in CPU) Typical number Purpose Integer 8-32 Arithmetic, comparison, logical Floating point 8-16 Arithmetic Flags 4-16 Comparison result Stack pointer 1 Call, return Instruction pointer 1 Fetch Segment 4-8 Process protection RAM Random Access Memory Typical size in bytes DRAM Dynamic RAM 108 Main memory SRAM Static RAM 106 Cache ROM Read-Only Memory 105 Boot code PROM Programmable (one time) EPROM Erasable PROM (in UV light) EEPROM (flash) electrically EPROM 103 Infrequently written data Disk 1011 FilesCPU talks to RAM, ROM, disk, and I/O on a bus:
Move = Arithmetic + - * / Comparison == != < > <= >= Logical & | ^ ~ (and, or, xor, not) Shift << >> (shift left, right), rotate Branch goto, if (...) goto Push, Pop put on stack, remove Call, Return push/pop instruction pointer Vector vector arithmetic (MMX, SSE on x86)
; a = b mov eax, [b] ; move from variable b to register eax mov [a], eax ; move from register eax to variable aThere is no if, while, for, just compare and conditional branch.
; if (a == b) a = a + 1; mov eax, [a] cmp eax, [b] ; compare b with a, save result in flags jne L1 ; jump if not equal add eax, 1 ; add 1 to copy of a mov [a], eax ; save result L1: ; label to jump toLogical and shift operators are faster than multiplication and division, but only work for powers of 2.
x << 3; // x * 8 (double x 3 times) x >> 3; // x / 8 (halve x 3 times, rounding down) x & 7 // x % 8 (AND x with 000001112)The stack is used both for passing parameters to functions and saving the return address between call and return. On the x86 the stack pointer is called esp. The stack grows downward by 4 bytes for each push.
; x = sum(1, 2); push 2 ; by convention, push right to left push 1 call sum ; push instruction pointer and jump to sum: mov [x], eax ; result is returned in eax add esp, 8 ; the caller discards the arguments | | +-------+ Picture of stack in sum() | b = 2 | +-------+ esp+8 | a = 1 | +-------+ esp+4 | eip | <-- address of the instruction following "call" +-------+ <--- esp ; int sum(int a, int b) {return a+b;} sum: mov eax, [esp+4] ; a is 4 bytes above where esp points add eax, [esp+8] ; b ret ; pop instruction pointer (eip)The Pentium MMX (1997) introduced 8 64-bit registers that could operate on packed 8, 16, or 32 bit integers 64 bits at a time. The Pentium 4 (2000) added 8 128-bit registers that could operate on 4 floats or 2 doubles at a time.
; // add two 4-element vectors of 16-bit integers using MMX ; short a[4], b[4]; ; for (i=0; i<4; ++i) ; a[i] = a[i] + b[i] movq mm0, [a] ; load a[0..3] into MMX register mm0 movq mm1, [b] ; load b[0..3] paddw mm0, mm1 ; do 4 additions at once, result in mm0 movq [a], mm0 ; save the result
Size Replacement Line/page Total Speed Algorithm --------- ----- ----- ----------- +-----------+ | CPU | 4-16B 102B 200 ps Renaming (hardware) | Registers | +-----------+ | +-------------+ | L1 Cache | 32B 104B 1 ns 4 way LRU (hardware) +-------------+ | +------------------+ | L2 Cache (lines) | 32-64B 106B 5 ns 4 way LRU (hardware) +------------------+ | +------------------------+ | RAM (pages) | 4KB 109B 50 ns Full LRU (OS) +------------------------+ | +------------------------------+ | Disk (sectors) | 1K-4KB 1011B 10 ms N/A +------------------------------+Sequential access is faster than random. Example: clearing a 2-D array:
// faster // slower (cache misses, disk thrashing) int a[N][N]; int a[N][N]; for (int i=0; i<N; ++i) for (int i=0; i<N; ++i) for (int j=0; j<N; ++j) for (int j=0; j<N; ++j) a[i][j]=0; a[j][i]=0;
// Compute y = a*b*c*d // Slow (multiplication has a long latency but is pipelined) y = a*b; y = y*c; // must wait for a*b to finish y = y*d; // must wait for y*c to finish // Faster, allows parallelism tmp1 = a*b; tmp2 = c*d; // computes in parallel with a*b y = tmp1 * tmp2; // must wait for a*b and c*d to finishAvoid unpredictable branches. In the simplest model, branches are predicted to go the same way as last time. Newer processors use smarter models.
// search for x in array for (i=0; i<N; ++i) // OK, loops branch the same way except the last iteration if (a[i] == x) // OK, false every time but once break; // OK, unconditional branch is always predicted // compute maximum int max(int a, int b) { if (a > b) // unpredictable branch is slow return a; else return b; } // faster, avoids a comparison int max(int a, int b) { int cmp = (a - b) >> 31; // all 1 bits or all 0 bits return (~cmp & a) | (cmp & b); }
// Example: copying a file using UNIX system calls (no error checks) int in = open("in.txt", O_RDONLY); // file descriptors int out = open("out.txt", O_WRONLY); char buf[BUF_SIZE]; // e.g. 4K bytes while (true) { int bytesRead = read(in, buf, BUF_SIZE); if (bytesRead > 0) write(out, buf, bytesRead); else break; // end of input file }
A multitasking operating system allows many processes (jobs, programs) to run apparently in parallel by giving a few milliseconds of CPU time to each one, round robin. A process waiting for input gives up the CPU immediately.
Input Requested +-------------------------+ | Timeout | | +------------+ V V | +---------+ +-------+ +---------+ | Blocked |-->| Ready |-->| Running |--> Exit +---------+ +-------+ +---------+ Input Ready
Context Switch +-------------------------------------------+ | | | ===+=======+=======+=======+=========+ | +--> | Ready | Ready | Ready | Running |-->+ ===+=======+=======+=======+=========+
Each process is assigned one or more memory segments. Reading, writing, or executing outside these segments cause a run time error (segmentation fault).
The operating system runs in privileged mode, allowing it to: