Forensics: solving crimes with DNA
Objectives
- understand how DNA can be used for identifying people in solving crimes
- understand how DNA profiles can be matched
- understand how to calculate the joint probability of independent events and apply it to estimate the probability of a matching DNA profile
- practice loops, conditionals, and arrays
Prerequisites
- loops, conditionals, and arrays
- basic probability of a coin or die
Reading
Materials
Activities
- Each person's DNA is unique (except for identical twins).
- DNA samples can be collected at crime scenes to identify suspects and/or victims.
- About 0.1% of human DNA varies from person to person.
- Forensic analysis focuses on loci (locations) in the DNA that can vary among people.
- The values at those loci (the DNA profile) are recorded so that DNA samples can be compared.
- Two DNA profiles from the same person have matching values at all loci.
- Ask whether using more or fewer loci makes identification more accurate. (More loci are more accurate [but more expensive]. Imagine that only one locus is used and that it has only two equally likely values: half of the population would then share the same value at that locus and therefore have the same DNA profile.)
- The FBI's CODIS database originally used 13 core loci (expanded to 20 in 2017).
-
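A minimal sketch of profile matching, written in Python for illustration. It assumes a simplified profile with a single value per locus, stored as a list; the function name `profiles_match` and all sample values are made up for this example and are not taken from the lesson materials.

```python
def profiles_match(profile_a, profile_b):
    """Return True only if the two profiles agree at every locus."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    for value_a, value_b in zip(profile_a, profile_b):
        if value_a != value_b:
            return False  # a single mismatch rules out a match
    return True


# Hypothetical profiles: one made-up value per locus.
crime_scene = [12, 7, 15, 9]
suspect_1 = [12, 7, 15, 9]
suspect_2 = [12, 8, 15, 9]

print(profiles_match(crime_scene, suspect_1))  # True
print(profiles_match(crime_scene, suspect_2))  # False
```

The loop returns as soon as it finds a mismatch, so the remaining loci are not checked once the profiles are known to differ.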
- We do not want to wrongly accuse someone; how can we find out how likely it is that another person has the same DNA profile?
- Ask them how many people are in the world. (about 7 billion)
- Ask them how low the probability needs to be so that a DNA profile is unique in the world. (at most one in 7 billion)
- Note that a very low probability does not mean impossible, just very unlikely.
-
- Let's review some basic probability.
- Joint probability of two independent events: P(A,B) = P(A) * P(B).
- Events are independent when knowing the outcome of one provides no information about the others.
- For example, the probability of getting two 1's from two dice is P(Die1=1, Die2=1) = P(Die1=1) * P(Die2=1) = 1/6 * 1/6 = 1/36. Knowing that Die1=1 provides no information about whether Die2=1, and vice versa.
- For visualization, draw a 6x6 table and label the rows and columns 1 through 6 for the values of the two dice. The table has 36 entries, one for each possible combination of the two dice, and all are equally likely with probability 1/36. Only 1 out of 36 combinations has two 1's.
- Ask how to calculate P(Die1=1, Die2=5, Die3=4). ((1/6)^3 = 1/216)
- Ask how to calculate P(Die1=even, Die2=6). (1/2*1/6=1/12; again we can use the 6x6 table to see 3 out of 36 combinations.)
- One way to estimate the probability of a person having a particular DNA profile is to calculate P(Locus1=value1, Locus2=value2, ...)
- Ask how to estimate the probability, assuming values at all the loci are independent. (we can use the joint probability of independent events: P(Locus1=value1, Locus2=value2, ...) = P(Locus1=value1) * P(Locus2=value2) * ...)
- To estimate P(Locus1=value1), ... , we can draw a random sample of size N from the population and count how many of the N people have value1 at Locus1; that count divided by N is the estimate of P(Locus1=value1), and likewise for the other loci.
-
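The probability steps above can be sketched in Python as follows. The first part enumerates the 6x6 table of two dice to confirm the product rule; the second part applies the same rule to loci, estimating each per-locus probability as a fraction of a sample. The helper names (`probability`, `estimate_profile_probability`) and the four-person sample are assumptions made up for illustration, with one value per locus, and the estimate relies on the loci being independent.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two dice (the 6x6 table).
outcomes = list(product(range(1, 7), repeat=2))


def probability(event):
    """Fraction of the 36 outcomes for which event(die1, die2) is true."""
    favorable = sum(1 for d1, d2 in outcomes if event(d1, d2))
    return Fraction(favorable, len(outcomes))


p_two_ones = probability(lambda d1, d2: d1 == 1 and d2 == 1)
p_even_six = probability(lambda d1, d2: d1 % 2 == 0 and d2 == 6)
print(p_two_ones)  # 1/36, same as 1/6 * 1/6
print(p_even_six)  # 1/12, same as 1/2 * 1/6


def estimate_profile_probability(sample, profile):
    """Multiply, locus by locus, the fraction of the sample that shares
    the profile's value (assumes the loci are independent)."""
    joint = Fraction(1)
    for locus, value in enumerate(profile):
        count = sum(1 for person in sample if person[locus] == value)
        joint *= Fraction(count, len(sample))
    return joint


# Hypothetical random sample of N = 4 people, 3 loci each (made-up values).
sample = [
    [12, 7, 15],
    [12, 8, 15],
    [11, 7, 15],
    [12, 7, 14],
]
profile_probability = estimate_profile_probability(sample, [12, 7, 15])
print(profile_probability)  # 27/64, i.e. 3/4 * 3/4 * 3/4
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculations.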
- Ask them to implement the two undefined methods (see Materials above).
- Verify the output is correct (compare with the output of sample solution).
-
Assessment
- If the values at the first locus of two DNA profiles do not match, do we need to check the other loci? (No, whenever there is a mismatch we can stop and say there is no match; two DNA profiles of the same person have matching values at all loci)
- What is P(Die1=even, Die2=odd)? (1/2*1/2 = 1/4)
- What is P(Die1=even, Die2 <= 2)? (1/2*1/3 = 1/6)
- What is P(Die1 > 2, Die2 <= 2)? (2/3*1/3 = 2/9)
- This question requires logarithms. If they do not know logarithms, ask them just to set up the inequality without solving it. Suppose each DNA locus has two equally likely values; what is the minimum number of loci so that each DNA profile is likely to be unique in the world population of 7 billion? (let m be the minimum number of loci; (1/2)^m < 1/(7 x 10^9), so m > log2(7 x 10^9), which is about 32.7, or m = 33 since m is a whole number)
-
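The assessment answers can be checked with a short Python sketch. It is only one possible way to verify them: the helper `p` and the event functions are names invented for this example, and the loop finds the smallest m by trying whole numbers rather than by using logarithms.

```python
import math
from fractions import Fraction


def p(event):
    """Probability that one fair six-sided die satisfies event(face)."""
    favorable = sum(1 for face in range(1, 7) if event(face))
    return Fraction(favorable, 6)


def is_even(face):
    return face % 2 == 0


def is_odd(face):
    return face % 2 == 1


def at_most_two(face):
    return face <= 2


def more_than_two(face):
    return face > 2


# The two dice are independent, so the joint probability is a product.
p_even_odd = p(is_even) * p(is_odd)
p_even_le2 = p(is_even) * p(at_most_two)
p_gt2_le2 = p(more_than_two) * p(at_most_two)
print(p_even_odd)  # 1/4
print(p_even_le2)  # 1/6
print(p_gt2_le2)   # 2/9

# Smallest m with (1/2)^m < 1/population, i.e. 2^m > population.
population = 7 * 10**9
m = 0
while 2**m <= population:
    m += 1
print(m)                      # 33
print(math.log2(population))  # about 32.7, so m must be at least 33
```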