Aug. 22-Dec. 16, 2005. Tues and Thurs. 9:30-10:45 AM, Olin room 137 EC
cs.fit.edu/~mmahoney/cse3101
Book: Assembly Language for Intel-Based Computers, 4th ed., Kip R. Irvine, plus Internet sources on advanced instructions.
You will need access to a PC (Pentium II or higher) running any version of Windows. Recommended software (all free):
Within 24 hours after the first class you should receive an email notifying you that you have been added to the fit-cse3101 mailing list. If you do not, then email me to request an invitation. You are responsible for all material posted to the list, such as homework assignments, test announcements, and additional course material. Once you are a member, you are encouraged to post questions (or answers) to the class at fit-cse3101@googlegroups.com. If you have a Google account you may also access the list at the members-only website groups.google.com/group/fit-cse3101.
50% homework and 50% exams.
Final grade = (sum of homework)/2 + (sum of top 3 exams)/6
90-100 = A, 80-89 = B, 70-79 = C, 60-69 = D, 0-59 = F.
There will be 4 exams including the final. The lowest exam will be dropped. All other exams count equally. All exams are open book and open notes (laptops off). There will be no makeup exams for any reason.
There will be several programming assignments worth a total of 100 points. All programming assignments will be turned in by email by attaching ONE source code plain text file. (Do not send executables, Word files, etc). Programs must be received by midnight on the date due. Late assignments are penalized 20% per day including non-class days, weekends, and holidays. Grading is based 50% on whether the program assembles and runs correctly, and 50% based on clarity of code and documentation. (Is the code understandable? Is it clear how to use the program to someone not familiar with the assignment? Do you describe the big picture rather than the details?)
All programs must run on my computer. This is a 750 MHz Duron model 3 (Compaq 5440) with 256 MB memory running Windows Me (similar to Win 98). Programs may be written for MASM 6.15, NASM 0.9.2, AS 2.9.5 (included with DJGPP g++) or AS 2.15.90 (included with MINGW g++). Programs requiring linking must link with one of DJGPP gcc/gxx 2.95.2, MINGW gcc/g++ 3.4.1, Borland C/C++ BCC32 5.5.1, or Digital Mars C/C++ SC ver. 8.35n using the linkers with those compilers, or with any files included in your CDROM (kernel32.dll, irvine32.dll, smallwin.inc, etc., linked with LINK32). Specify the exact commands used to compile and link. Your code may use any IA-32, floating point, or MMX istructions. Do not use SSE (xmm registers on Pentium III), SSE2 (Willamette/Pentium 4), or Cyrix extensions. Protected mode (32-bit) programs are preferred, except for programs that require 16-bit real mode (BIOS, MSDOS, direct hardware access, boot code, .COM files, etc.)
In addition to collected assigments, I will suggest homework (via the mailing list) that won't be collected or graded. However material from these assignments may appear on your exams.
You are permitted to receive help from other students or outside sources, however you must cite your sources in your program comments and your work must still demonstrate that you are capable of writing the program yourself. Copying code or allowing your code to be copied by other students is not allowed. See the Florida Tech Computer Science Honor Code for details.
pi 10 3.1415926535Your program must be at least twice as fast as pi.c when compiled with gcc -O3 or bcc32 -O over a reasonable range of n.
To check your program, compare the output with that of Quick Pi 2.7 (which runs hundreds of times faster than my pi.c).
Your program should be written in assembler. If you write it in
C and generate assembler using the -S switch, then I expect to see
some significant optimizations from what your compiler produces,
as well as comments fit for human consumption.
In other words, don't just give me:
pi.asm generated by bcc32 -O -S pi.c
pi.s generated by gcc -O3 -S pi,c
There will be extra credit of 15, 10, and 5 points for the 3 fastest programs, measured by the number of correct digits produced in 10 seconds on my PC. Winners will be announced to the class.
There are a large number of algorithms for computing pi. Machin's arctangent is faster, and fairly easier to implement. Chudnovsky's or Ramanujan's algorithm are a little more complex, but may be faster for n up to a million and not too hard to implement. See Mathworld and here, and search Google. X. Gourdon and P. Sebah, Numbers, Constants and Computation has good advice for high speed algorithms such as FFT multiplication and binary splitting, used in the very fastest algorithms.
FLDPI 3.14159265358979 4*atan(1) 3.14159265358979 Spigot 3.14159265358979 Machin 3.14159265358979 (your choice) 3.14159265358979
These functions will be called from a program in C or C++ as shown below. Functions are not required to perform error checking on the parameters, but must follow C calling conventions (preserve ebx, esi, edi, ebp, esp, execute EMMS before returning). A stack frame (using ebp) is not required.
For all functions, if a pixel result is outside the range 0 to 255 then the result should be replaced with the closest legal value. The result should be written back to the image unless otherwise specified. You may assume that width is a positive multiple of 16 and height is a positive even number. You may assume that image starting addresses will be aligned on 16 byte boundaries.
void brightness(unsigned char *image, int width, int height, int val);This increases or decreases the brightness of an image. The image consists of width*height pixels. val is a number in the range -255 to 255 which is to be added to each pixel.
void contrast(unsigned char *image, int width, int height, int val);This changes the contrast of an image by multiplying each byte by val/64, i.e. pixel = (pixel*val)/64; val will be in the range 0 to 255.
void average(unsigned char *image, unsigned char *image2, int width, int height);Replace image with the average of image and image2, rounding up, e.g. image[i] = (image[i] + image2[i] + 1) / 2. Both images have the same dimensions.
void grow(unsigned char *image, unsigned char *newimage, int width, int height);Double the size of image and put the result in newimage. The new image should have dimensions of width*2 by height*2. You may assume newimage points to an array of width*height*4 bytes. Expand the image by copying each pixel 4 times (twice horizontaly and twice vertically).
void shrink(unsigned char *image, unsigned char *newimage, int width, int height);This is the opposite of grow(). Shrink the image by sampling every other pixel horizontally and vertically. The result should be placed in newimage, which has dimensions width/2 by height/2 and points to an array of width*height/4 bytes. The sampled pixels should have even numbered indexes both horizontally and vertically, i.e. should sample the lower left corner of each 2 by 2 block.
void blur(unsigned char *image, int width, int height);blur() is an antialiasing filter normally applied before shrink(). The image is divided into 2 by 2 blocks. The 4 pixels in each block are replaced with the average of those 4 pixels (rounding up or down as you choose).
void invert(unsigned char *image, int width, int height);Produces a "negative" image by subtracting each pixel from 255 (i.e swap black with white). Two invert() operations will restore the original image.
void edge(unsigned char *image, int width, int height);If a pixel is brighter than the one above it, replace it with a white pixel (255). If it is darker or equal, replace it with black (0). Pixels on the top row are set to black.
int atleast(unsigned char *image, int width, int height, int val);Returns the number of pixels with values above or equal to val (0 to 255). The image is not modified.
void clear(unsigned char *image, int width, int height, int val);Sets all pixels to val (0 to 255).
Turn in the 10 functions as a MASM, NASM, or GAS file. I will link it to my own test code in C/C++. Extended MMX instructions such as PMULHUW, MOVNTQ, PSHUFW, etc. are allowed, but not SSE, SSE2 or CYRIX instructions (they won't run on my PC) or 3DNOW! (won't run on Intel processors).
There will be extra credit of 15, 10, and 5 points for the 3 fastest programs. The programs will be tested on 512 by 512 or larger input images, measured by the sum of the times needed to execute all of the functions an equal number of times on a 750 MHz Duron (timed with RDTSC). Programs written for olin (x86-64) are allowed, and may use SSE2 rather than MMX (16 pixels at a time), but are not eligible for extra credit.
Your submission should not include test code, but of course you will still need to write some. Below is a solution to the Fall 2004 project, which was to read a .bmp image file, perform image processing operations in MMX (different than the ones above), and save the result as a .bmp file. There are two programs. ip.c is entirely in C. ip2.c and ipa.asm is the same program with the image processing code in MMX called from C. They are equivalent, except that the second program is about 3 times faster. Typical combined speed of 7 different operations totals 250-300 clocks/pixel in C and 80-100 clocks/pixel in MMX (measured on a 750 MHz Duron) excluding load and save.
ip.cpp Original C implementation (see comments). ip2.cpp and ipa.asm Equivalent program using MMX.
Your code should work with ip4.cpp. The program should produce a file ip4.bmp that looks like this.
Fall 2005 Exam 1
Fall 2005 Exam 2
Fall 2005 Exam 3
Fall 2005 Final Exam