CSE 4051: Project #4, Shared DNA

Due: Friday, 1 Feb 2008

The Task

Finding the commonality between DNA/protein sequences is all the rage these days. By analyzing this commonality, it is possible to trace the evolution of the human race, or that of the banana plant. Given two DNA/protein sequences, the task is to compute the percentage of commonality of the two sequences, as defined below.

We now define precisely what we mean by "commonality." First, a sequence x = x_1 x_2 ... x_n over a finite set S is a list of n elements from S. Another sequence y = y_1 y_2 ... y_m over S is a subsequence of x if it can be formed by deleting some (possibly none) of the elements of x without disturbing the relative order of the remaining elements. For example,

GACCC
is a subsequence of
GAACCACGCC
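The definition of a subsequence translates directly into a greedy left-to-right scan: match each character of y against the earliest remaining position in x where it occurs. The following Java sketch checks the example above; the class and method names (SubsequenceCheck, isSubsequence) are illustrative and not part of the assignment.

```java
public class SubsequenceCheck {
    // Returns true if y can be obtained from x by deleting zero or more
    // characters of x without reordering the rest (greedy two-pointer scan).
    static boolean isSubsequence(String y, String x) {
        int j = 0; // index of the next character of y still to be matched
        for (int i = 0; i < x.length() && j < y.length(); i++) {
            if (x.charAt(i) == y.charAt(j)) {
                j++; // matched y[j] at the earliest possible position in x
            }
        }
        return j == y.length(); // every character of y was matched in order
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("GACCC", "GAACCACGCC")); // prints true
        System.out.println(isSubsequence("CCGG", "GAACCACGCC"));  // prints false (only one G after the Cs)
    }
}
```

Taking the earliest match is always safe: it leaves the longest possible remainder of x for the rest of y.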
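The percentage of commonality of two sequences x = x_1 x_2 ... x_n and y = y_1 y_2 ... y_m is defined to be c = 100 * (k / ((n+m)/2)), where k is the length of a longest common subsequence of x and y, that is, a longest sequence that is a subsequence of both x and y. Equivalently, c = 200k / (n+m).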
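The length k of a longest common subsequence (LCS) can be computed with the classic dynamic-programming table over prefixes of x and y. Since each line may hold up to 20,000 characters, a full (n+1) x (m+1) table of ints could need roughly 1.6 GB, so this sketch keeps only two rows. It is one standard approach, not necessarily the intended one; the names Lcs, lcsLength, and commonality are hypothetical.

```java
public class Lcs {
    // Length of a longest common subsequence of x and y, using the standard
    // dynamic-programming recurrence over prefixes:
    //   L[i][j] = L[i-1][j-1] + 1              if x[i-1] == y[j-1]
    //   L[i][j] = max(L[i-1][j], L[i][j-1])    otherwise
    // Only two rows of the table are kept, so memory is O(m), not O(n*m).
    static int lcsLength(String x, String y) {
        int m = y.length();
        int[] prev = new int[m + 1]; // row i-1; column 0 is always 0
        int[] curr = new int[m + 1]; // row i being filled in
        for (int i = 1; i <= x.length(); i++) {
            char xi = x.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                if (xi == y.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            int[] tmp = prev; // swap rows: the finished row becomes prev
            prev = curr;
            curr = tmp;
        }
        return prev[m]; // last completed row
    }

    // c = 100 * (k / ((n + m) / 2)); dividing by 2.0 keeps the midpoint
    // exact when n + m is odd (Java int division would truncate it).
    static double commonality(int k, int n, int m) {
        return 100.0 * k / ((n + m) / 2.0);
    }

    public static void main(String[] args) {
        int k = lcsLength("ABCD", "ABCXYZ");
        System.out.println(k);                    // prints 3
        System.out.println(commonality(k, 4, 6)); // prints 60.0
    }
}
```

The running time is still proportional to n * m, which is the usual cost of this recurrence.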

Input/Output

The input consists of several test cases. Each test case is two lines, and each line is a sequence of 1 to 20,000 characters in the range A-Z. For each test case, the output is the percentage of commonality, correct to the nearest percent (round up if the answer lies halfway between two integers). Follow the format shown below.
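Because c = 200k / (n+m), the round-half-up rule can be applied with integer arithmetic alone: floor(200k/(n+m) + 1/2) = floor((400k + n + m) / (2(n+m))). The sketch below does that and pads the percentage to three characters, which matches the spacing in the sample output. The names Commonality, percent, and caseLine are hypothetical.

```java
import java.util.Locale;

public class Commonality {
    // Round-half-up percentage: floor(200k/(n+m) + 1/2), computed as
    // (400k + (n+m)) / (2(n+m)) in integer arithmetic, so exact halves
    // cannot be misrounded by floating-point error.
    static int percent(int k, int n, int m) {
        long total = (long) n + m;
        return (int) ((400L * k + total) / (2 * total));
    }

    // "%3d" right-aligns the percentage in a 3-character field, giving
    // "Case #1: 100%" and "Case #2:  80%"; "%%" prints a literal '%'.
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    static String caseLine(int caseNumber, int pct) {
        return String.format(Locale.ROOT, "Case #%d: %3d%%", caseNumber, pct);
    }

    public static void main(String[] args) {
        System.out.println(caseLine(1, percent(5, 5, 5))); // Case #1: 100%
        System.out.println(caseLine(2, percent(4, 5, 5))); // Case #2:  80%
        System.out.println(caseLine(3, percent(3, 4, 6))); // Case #3:  60%
        System.out.println(percent(1, 200, 200));          // 0.5 rounds up: prints 1
    }
}
```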

An Example

For example, if the input is
ABCDE
ABCDE
ABCDE
ABZDE
ABCD
ABCXYZ
then the output is
Case #1: 100%
Case #2:  80%
Case #3:  60%
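Combining a two-row LCS computation with integer round-half-up arithmetic, a complete Shared.java could read test cases as pairs of lines from standard input and print one line per case. This is a hedged sketch, not a reference solution: it assumes the input arrives on standard input in complete pairs of lines (an unpaired final line is ignored), and the helper names lcsLength, percent, and solve are hypothetical. On the example input above it produces exactly the example output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Locale;

public class Shared {
    // LCS length of x and y with a two-row dynamic-programming table (O(m) memory).
    static int lcsLength(String x, String y) {
        int m = y.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int i = 1; i <= x.length(); i++) {
            char xi = x.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                curr[j] = (xi == y.charAt(j - 1))
                        ? prev[j - 1] + 1
                        : Math.max(prev[j], curr[j - 1]);
            }
            int[] tmp = prev; // swap rows
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    // Percentage 200k/(n+m), rounded to the nearest integer, halves rounded up.
    static int percent(int k, int n, int m) {
        long total = (long) n + m;
        return (int) ((400L * k + total) / (2 * total));
    }

    // Reads test cases (two lines each) until end of input and returns the
    // output text, one "Case #i: ppp%" line per test case.
    static String solve(Reader input) {
        BufferedReader in = new BufferedReader(input);
        StringBuilder out = new StringBuilder();
        int caseNumber = 0;
        try {
            String x;
            while ((x = in.readLine()) != null) {
                String y = in.readLine();
                if (y == null) {
                    break; // unpaired final line: ignored
                }
                x = x.trim(); // tolerate stray surrounding whitespace
                y = y.trim();
                caseNumber++;
                int pct = percent(lcsLength(x, y), x.length(), y.length());
                out.append(String.format(Locale.ROOT, "Case #%d: %3d%%", caseNumber, pct))
                   .append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(solve(new InputStreamReader(System.in)));
    }
}
```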

Turning it in

Turn in the Java source code for the program using the submission server. The project tag for this project is proj04. The name of the file you submit must be Shared.java. Acknowledgement of project submissions can be found on the WWW.

File to be submitted: Shared.java

Control code:

Course=cse4051
Project=proj04


Ryan Stansifer <ryan@cs.fit.edu>
Last modified: Thu Jan 31 10:29:18 EST 2008