Average Case Time Complexity

by William Shoaff with lots of help


You can download a postscript version of this file (which is prettier) at

http://www.cs.fit.edu/wds/classes/algorithms/Average/average.ps

Average Case Time Complexity

One method to compute an algorithm's average case time complexity is to partition the sample space of problem instances into disjoint sets where solving any instance in the set takes the same number of steps.

Let's suppose the number of steps ranges from j = 0 to j = m; that is, some instances may require no work and the hardest instances require m steps. Let's also denote by $P_j$ the probability that an instance takes j steps. Then we can define the average case time complexity $T_a(n)$ of a problem with input size n as the weighted sum:

$\displaystyle T_a(n) = \sum_{j=0}^{m} j \cdot P_j.$

To see how this formula applies, consider the pattern matching problem (PMP) where the pattern p, of length m, is from a k-symbol alphabet. There are $k^m$ possible patterns and p is one of them. It helps to keep an example in mind, so let {a, b, c} be the alphabet (k = 3) and let p = abac be one of the $3^4 = 81$ possible patterns.

Now suppose we are trying to match p[0..m - 1] against text t[i..i + m - 1]. There may be 1 unsuccessful compare (p[0] $\neq$ t[i]), or 1 successful compare followed by an unsuccessful compare (p[0] = t[i] and p[1] $\neq$ t[i + 1]), or 2 successful compares followed by an unsuccessful compare (p[0] = t[i] and p[1] = t[i + 1] and p[2] $\neq$ t[i + 2]), and so on, up to either m - 1 successful compares followed by 1 unsuccessful compare or m successful compares. We want to compute the probabilities for each of these cases.

The first compare is unsuccessful k - 1 out of k times. Thus the probability of exactly one (unsuccessful) compare is $P_1 = (k - 1)/k$. For our example, $P_1 = (3 - 1)/3 = 2/3$, which corresponds to the 54 out of 81 four-letter patterns that start with b or c.

The first compare is successful 1 out of k times and the second compare is unsuccessful k - 1 out of k times. Thus the probability of exactly two compares (one successful followed by an unsuccessful one) is $P_2 = (k - 1)/k^2$. For our example, $P_2 = 2/9$, which corresponds to the 18 out of 81 four-letter patterns that start with aa or ac.

The first and second compares are successful 1 out of $k^2$ times and the third compare is unsuccessful k - 1 out of k times. Thus the probability of exactly three compares (two successful followed by one unsuccessful) is $P_3 = (k - 1)/k^3$. For our example, $P_3 = 2/27$, which corresponds to the 6 out of 81 four-letter patterns that start with abb or abc.

Three successful compares occur 1 out of $k^3$ times, followed by a fourth, unsuccessful compare k - 1 out of k times. Four successful compares occur 1 out of $k^4$ times. Thus the probability of exactly four compares (3 successful and 1 unsuccessful, or 4 successful) is $P_4 = (k - 1)/k^4 + 1/k^4 = 1/k^3$. For our example, $P_4 = 3/81$, which corresponds to the 2 out of 81 four-letter patterns abaa and abab, plus the one pattern abac where 4 successful compares occur.
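The counts 54, 18, 6 and 3 can be checked by brute force. The following is a minimal sketch, assuming we enumerate all 81 four-letter strings over {a, b, c} and compare each one against the fixed pattern abac, stopping at the first mismatch. The function names compares and compare_histogram are made up for this illustration.

```python
from collections import Counter
from itertools import product


def compares(pattern, window):
    """Count the character compares made when matching left to right
    and stopping at the first mismatch."""
    count = 0
    for p_ch, t_ch in zip(pattern, window):
        count += 1
        if p_ch != t_ch:
            break
    return count


def compare_histogram(pattern, alphabet):
    """Map each compare count j to the number of length-m strings
    that need exactly j compares against the pattern."""
    m = len(pattern)
    return Counter(
        compares(pattern, "".join(w)) for w in product(alphabet, repeat=m)
    )


hist = compare_histogram("abac", "abc")
total = sum(hist.values())
for j in sorted(hist):
    print(j, "compares:", hist[j], "out of", total)
```

Dividing each count by 81 gives the probabilities 2/3, 2/9, 2/27 and 3/81 from the text.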

Thus, for a fixed text position i, the average number of compares is

$\displaystyle \sum_{j=0}^{m} j \cdot P_j = \sum_{j=0}^{m-1} \frac{j(k-1)}{k^{j}} + \frac{m}{k^{m-1}}.$

For our example (k = 3, m = 4) the sum is

$\displaystyle \sum_{j=0}^{m-1} \frac{2j}{3^{j}} + \frac{4}{3^{3}} = \frac{40}{27}.$
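The value 40/27 can be reproduced with exact rational arithmetic. This is a small sketch using Python's fractions module. The helper names step_probability and average_compares are assumptions for this illustration; the probabilities themselves are the $P_j$ derived above.

```python
from fractions import Fraction


def step_probability(j, k, m):
    """P_j: probability of exactly j compares at one text position,
    for a pattern of length m over a k-symbol alphabet."""
    if not 1 <= j <= m:
        return Fraction(0)
    if j < m:
        # j - 1 successful compares, then one unsuccessful compare
        return Fraction(k - 1, k**j)
    # j == m: (k - 1)/k^m + 1/k^m = 1/k^(m - 1)
    return Fraction(1, k ** (m - 1))


def average_compares(k, m):
    """Weighted sum of j * P_j for j = 0..m."""
    return sum(j * step_probability(j, k, m) for j in range(m + 1))


print(average_compares(3, 4))  # prints 40/27
```

As a sanity check, the probabilities $P_1, \ldots, P_m$ add up to 1.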

With some careful algebraic manipulation, one can derive the formula

 
$\displaystyle \sum_{j=0}^{m-1} \frac{j(k-1)}{k^{j}} + \frac{m}{k^{m-1}} = \frac{k}{k-1}\left(1 - \frac{1}{k^{m}}\right),$ (1)

which never exceeds 2 for k $\geq$ 2, since k/(k - 1) $\leq$ 2 and the factor in parentheses is less than 1. That is, at any position in the text, the average number of compares is less than two. It follows that the average number of compares over all n text positions is at most 2n, and this bound is independent of the alphabet size and pattern length.
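A quick experiment can make the bound concrete. The sketch below is only an illustration under stated assumptions: the text is drawn uniformly at random from {a, b, c}, the pattern is our example abac, and the seed and text length are arbitrary choices. The function naive_match_compares is a hypothetical name for a plain left-to-right matcher that counts compares.

```python
import random


def naive_match_compares(pattern, text):
    """Try the pattern at every alignment in the text and return
    (total character compares, number of alignments tried)."""
    m, n = len(pattern), len(text)
    total = 0
    for i in range(n - m + 1):
        for j in range(m):
            total += 1
            if pattern[j] != text[i + j]:
                break
    return total, n - m + 1


rng = random.Random(1999)  # fixed seed (arbitrary) for repeatable runs
alphabet = "abc"
text = "".join(rng.choice(alphabet) for _ in range(20_000))

total, positions = naive_match_compares("abac", text)
print("average compares per position:", total / positions)
print("closed form 40/27:            ", 40 / 27)
```

On random text the measured average should land near 40/27, and the total number of compares should stay well under twice the text length.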

Problem 1:

We'd like to show that the summation

$\displaystyle \sum_{j=0}^{m-1} \frac{j(k-1)}{k^{j}} + \frac{m}{k^{m-1}}$

does simplify to the closed form

$\displaystyle \frac{k}{k-1}\left(1 - \frac{1}{k^{m}}\right).$

There are at least three ways to establish this identity. One is to derive it from first principles: the form $j k^{-j}$ suggests differentiating the known sum of the geometric terms $k^{-j}$. Induction over k and m provides a second proof. A third idea is to establish the identity empirically with a program that evaluates both the sum and the formula for it. This last method, by itself, can never provide a complete proof. Use one or more of the above methods to establish equation (1).
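For the empirical approach, one possible starting point is sketched below. It evaluates both sides of the identity with exact fractions over a small grid of k and m values. The grid limits and the names summation and closed_form are arbitrary choices for this illustration, and agreement on a finite grid is evidence, not a proof.

```python
from fractions import Fraction


def summation(k, m):
    """Left-hand side: sum_{j=0}^{m-1} j(k-1)/k^j  +  m/k^(m-1)."""
    head = sum(Fraction(j * (k - 1), k**j) for j in range(m))
    return head + Fraction(m, k ** (m - 1))


def closed_form(k, m):
    """Right-hand side: k/(k-1) * (1 - 1/k^m)."""
    return Fraction(k, k - 1) * (1 - Fraction(1, k**m))


# Collect every (k, m) pair in the grid where the two sides disagree.
mismatches = [
    (k, m)
    for k in range(2, 11)
    for m in range(1, 21)
    if summation(k, m) != closed_form(k, m)
]
print("mismatches:", mismatches)
```

Using Fraction instead of floating point avoids rounding error, so any reported mismatch would be a genuine counterexample.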



William Shoaff
1999-06-14