CSE3101 Syllabus
Machine and Assembly Language Programming

Aug. 22-Dec. 16, 2005. Tues and Thurs. 9:30-10:45 AM, Olin room 137 EC
cs.fit.edu/~mmahoney/cse3101

Course Description

The course covers assembly language programming using the Intel IA-32 instruction set (Windows/Linux PC) including FPU and MMX instructions and linking to C/C++ programs.

Instructor

Matt Mahoney, mmahoney@cs.fit.edu (or matmahoney@yahoo.com). Office hours are immediately after class.

Textbook and course materials

Book: Assembly Language for Intel-Based Computers, 4th ed., Kip R. Irvine, plus Internet sources on advanced instructions.

You will need access to a PC (Pentium II or higher) running any version of Windows. Recommended software (all free):

Course notes

Mailing list

Within 24 hours after the first class you should receive an email notifying you that you have been added to the fit-cse3101 mailing list. If you do not, then email me to request an invitation. You are responsible for all material posted to the list, such as homework assignments, test announcements, and additional course material. Once you are a member, you are encouraged to post questions (or answers) to the class at fit-cse3101@googlegroups.com. If you have a Google account you may also access the list at the members-only website groups.google.com/group/fit-cse3101.

Grading

50% homework and 50% exams. Final grade = (sum of homework)/2 + (sum of top 3 exams)/6
90-100 = A, 80-89 = B, 70-79 = C, 60-69 = D, 0-59 = F.

There will be 4 exams including the final. The lowest exam will be dropped. All other exams count equally. All exams are open book and open notes (laptops off). There will be no makeup exams for any reason.

There will be several programming assignments worth a total of 100 points. All programming assignments will be turned in by email by attaching ONE source code plain text file. (Do not send executables, Word files, etc). Programs must be received by midnight on the date due. Late assignments are penalized 20% per day including non-class days, weekends, and holidays. Grading is based 50% on whether the program assembles and runs correctly, and 50% based on clarity of code and documentation. (Is the code understandable? Is it clear how to use the program to someone not familiar with the assignment? Do you describe the big picture rather than the details?)

All programs must run on my computer. This is a 750 MHz Duron model 3 (Compaq 5440) with 256 MB memory running Windows Me (similar to Win 98). Programs may be written for MASM 6.15, NASM 0.9.2, AS 2.9.5 (included with DJGPP g++) or AS 2.15.90 (included with MINGW g++). Programs requiring linking must link with one of DJGPP gcc/gxx 2.95.2, MINGW gcc/g++ 3.4.1, Borland C/C++ BCC32 5.5.1, or Digital Mars C/C++ SC ver. 8.35n using the linkers with those compilers, or with any files included in your CDROM (kernel32.dll, irvine32.dll, smallwin.inc, etc., linked with LINK32). Specify the exact commands used to compile and link. Your code may use any IA-32, floating point, or MMX istructions. Do not use SSE (xmm registers on Pentium III), SSE2 (Willamette/Pentium 4), or Cyrix extensions. Protected mode (32-bit) programs are preferred, except for programs that require 16-bit real mode (BIOS, MSDOS, direct hardware access, boot code, .COM files, etc.)

In addition to collected assigments, I will suggest homework (via the mailing list) that won't be collected or graded. However material from these assignments may appear on your exams.

You are permitted to receive help from other students or outside sources, however you must cite your sources in your program comments and your work must still demonstrate that you are capable of writing the program yourself. Copying code or allowing your code to be copied by other students is not allowed. See the Florida Tech Computer Science Honor Code for details.

Assignment 1

Due Tues. Sept. 6, 2005 (10 pts). Write a program in assembler that prints "Hello, world" 20 times on separate lines, preferably using a loop. You may use any of the assemblers, linkers, or libraries described above or on your CDROM. Give exact compile/link instructions in your comments.

Assignment 2

Due Tues. Sept. 27, 2005 (30 pts). Write a program to compute pi. Your program should take one argument, n, in at least the range 1 to 50000 and print at least n digits of pi after the decimal point to standard output, beginning with "3.1" and ending with a newline ("\n"). For example:
  pi 10
  3.1415926535
Your program must be at least twice as fast as pi.c when compiled with gcc -O3 or bcc32 -O over a reasonable range of n.

To check your program, compare the output with that of Quick Pi 2.7 (which runs hundreds of times faster than my pi.c).

Your program should be written in assembler. If you write it in C and generate assembler using the -S switch, then I expect to see some significant optimizations from what your compiler produces, as well as comments fit for human consumption. In other words, don't just give me:
pi.asm generated by bcc32 -O -S pi.c
pi.s generated by gcc -O3 -S pi,c

There will be extra credit of 15, 10, and 5 points for the 3 fastest programs, measured by the number of correct digits produced in 10 seconds on my PC. Winners will be announced to the class.

There are a large number of algorithms for computing pi. Machin's arctangent is faster, and fairly easier to implement. Chudnovsky's or Ramanujan's algorithm are a little more complex, but may be faster for n up to a million and not too hard to implement. See Mathworld and here, and search Google. X. Gourdon and P. Sebah, Numbers, Constants and Computation has good advice for high speed algorithms such as FFT multiplication and binary splitting, used in the very fastest algorithms.

Assignment 3

Due Oct 18, 2005 (20 pts). Compute and print pi to 14 decimal places using 5 different algorithms on floating point numbers. Your program must be written in assembly language from scratch, NOT written in C/C++ and compiled with -S.
  1. Using FLDPI
  2. 4 * atan(1) using FPATAN
  3. Using the Spigot algorithm: pi=4; for (k=64; k>0; --k) pi=pi*k/(2*k+1)+2;
  4. Using Machin's algorithm: pi=16*atan(1/5)-4*atan(1/239). Do not use FPATAN. Use a series expansion, atan(x) = x - x3/3 + x5/5 - x7/7 +...
  5. Using an algorithm that converges quadratically or faster, such as Borwein's quartic, Salamin-Brent, Gauss-Legendre, AGM, etc.
The program will take no input or arguments. The output should look about like this:
  FLDPI           3.14159265358979
  4*atan(1)       3.14159265358979
  Spigot          3.14159265358979
  Machin          3.14159265358979
  (your choice)   3.14159265358979

Assignment 4

Due Dec. 8, 2005, midnight (40 pts). Write a set of image processing functions using MMX to compute pixel operations in parallel. An image is represented as an array of unsigned bytes, each representing one pixel, scanning left to right starting at the lower left corner. A value of 0 represents black and 255 represents white. An image is grayscale, or may be one of the three components of a color image (blue, green, or red).

These functions will be called from a program in C or C++ as shown below. Functions are not required to perform error checking on the parameters, but must follow C calling conventions (preserve ebx, esi, edi, ebp, esp, execute EMMS before returning). A stack frame (using ebp) is not required.

For all functions, if a pixel result is outside the range 0 to 255 then the result should be replaced with the closest legal value. The result should be written back to the image unless otherwise specified. You may assume that width is a positive multiple of 16 and height is a positive even number. You may assume that image starting addresses will be aligned on 16 byte boundaries.

  void brightness(unsigned char *image, int width, int height, int val);
This increases or decreases the brightness of an image. The image consists of width*height pixels. val is a number in the range -255 to 255 which is to be added to each pixel.
  void contrast(unsigned char *image, int width, int height, int val);
This changes the contrast of an image by multiplying each byte by val/64, i.e. pixel = (pixel*val)/64; val will be in the range 0 to 255.
  void average(unsigned char *image, unsigned char *image2, int width, int height);
Replace image with the average of image and image2, rounding up, e.g. image[i] = (image[i] + image2[i] + 1) / 2. Both images have the same dimensions.
  void grow(unsigned char *image, unsigned char *newimage, int width, int height);
Double the size of image and put the result in newimage. The new image should have dimensions of width*2 by height*2. You may assume newimage points to an array of width*height*4 bytes. Expand the image by copying each pixel 4 times (twice horizontaly and twice vertically).
  void shrink(unsigned char *image, unsigned char *newimage, int width, int height);
This is the opposite of grow(). Shrink the image by sampling every other pixel horizontally and vertically. The result should be placed in newimage, which has dimensions width/2 by height/2 and points to an array of width*height/4 bytes. The sampled pixels should have even numbered indexes both horizontally and vertically, i.e. should sample the lower left corner of each 2 by 2 block.
  void blur(unsigned char *image, int width, int height);
blur() is an antialiasing filter normally applied before shrink(). The image is divided into 2 by 2 blocks. The 4 pixels in each block are replaced with the average of those 4 pixels (rounding up or down as you choose).
  void invert(unsigned char *image, int width, int height);
Produces a "negative" image by subtracting each pixel from 255 (i.e swap black with white). Two invert() operations will restore the original image.
  void edge(unsigned char *image, int width, int height);
If a pixel is brighter than the one above it, replace it with a white pixel (255). If it is darker or equal, replace it with black (0). Pixels on the top row are set to black.
  int atleast(unsigned char *image, int width, int height, int val);
Returns the number of pixels with values above or equal to val (0 to 255). The image is not modified.
  void clear(unsigned char *image, int width, int height, int val);
Sets all pixels to val (0 to 255).

Turn in the 10 functions as a MASM, NASM, or GAS file. I will link it to my own test code in C/C++. Extended MMX instructions such as PMULHUW, MOVNTQ, PSHUFW, etc. are allowed, but not SSE, SSE2 or CYRIX instructions (they won't run on my PC) or 3DNOW! (won't run on Intel processors).

There will be extra credit of 15, 10, and 5 points for the 3 fastest programs. The programs will be tested on 512 by 512 or larger input images, measured by the sum of the times needed to execute all of the functions an equal number of times on a 750 MHz Duron (timed with RDTSC). Programs written for olin (x86-64) are allowed, and may use SSE2 rather than MMX (16 pixels at a time), but are not eligible for extra credit.

Your submission should not include test code, but of course you will still need to write some. Below is a solution to the Fall 2004 project, which was to read a .bmp image file, perform image processing operations in MMX (different than the ones above), and save the result as a .bmp file. There are two programs. ip.c is entirely in C. ip2.c and ipa.asm is the same program with the image processing code in MMX called from C. They are equivalent, except that the second program is about 3 times faster. Typical combined speed of 7 different operations totals 250-300 clocks/pixel in C and 80-100 clocks/pixel in MMX (measured on a 750 MHz Duron) excluding load and save.

  ip.cpp Original C implementation (see comments).
  ip2.cpp and ipa.asm Equivalent program using MMX.

Your code should work with ip4.cpp. The program should produce a file ip4.bmp that looks like this.

Exam Solutions

Fall 2004 Exam 1
Fall 2004 Exam 2
Fall 2004 Exam 3
Fall 2004 Final Exam

Fall 2005 Exam 1
Fall 2005 Exam 2
Fall 2005 Exam 3
Fall 2005 Final Exam