# Sorting -- Getting it Straight

by William Shoaff with lots of help

# Sorting

If you've studied the course notes on recursion, you've seen the mergesort and quicksort algorithms. Mergesort sorts by merging, while quicksort sorts by exchange. These, along with insertion, selection, and distribution, are general sorting techniques, each with many implementations.

Sorting by Insertion:
items are considered one at a time and inserted into the appropriate position relative to the previously considered items.
Sorting by Exchange:
when two items are found to be out of order, they are exchanged, and the process repeated until no more exchanges are needed.
Sorting by Selection:
the smallest (largest) item is found and placed first (last), then the next smallest (largest) is selected, and so on.
Sorting by Distribution:
Each item is compared with each of the others. Counting the number of smaller items determines the position of the item.

We want to describe some algorithms belonging to each type and determine their time and space complexities. Before we begin, let's list a few ideas and terms that should be known.

1.
We want to sort a file of records. A Record class could be defined as:

[[Record class]]=
public class Record {
Key key;
Data satellite;
}


2.
The "file" of records will most often be implemented as an array, but a "linked list" may be better for some sorts; and sometimes a file may be too large to fit in primary memory.
• An internal sort algorithm assumes the file will fit into main memory; when the file must reside on tape or disk, an external sort is used.
• Most of the sorts we consider are only appropriate for internal sorts.
• Let n denote the number of records in the file to be sorted.
3.
Each record contains a key k_i, i = 0, ..., n-1.
4.
The record may also contain satellite data, not related to the sorting problem itself, but which may affect complexity if it must be moved frequently.
5.
There is an ordering relation "<" on the keys such that the following properties hold for all keys a, b, and c:
(a)
Exactly one of a < b, a = b, or a > b holds (the trichotomy property).
(b)
If a < b and b < c, then a < c (the transitivity property).
Such an order relation is called a total or linear order.
6.
The goal is to find a permutation σ(0), σ(1), ..., σ(n - 1) such that

k_σ(0) ≤ k_σ(1) ≤ ... ≤ k_σ(n-1).

7.
A sorting algorithm is stable if it preserves the relative order of equal keys, that is,

σ(i) < σ(j)    whenever    k_σ(i) = k_σ(j)    and    i < j.

Consider a file of employees sorted by name. A stable re-sort based on salary will leave all making the same salary sorted alphabetically.
8.
The running time complexity is dominated by the number of comparisons and the number of record swaps.
9.
Comparison-based sorts on sequential computers have a lower-bound time complexity of Ω(n lg n). Sorts that do not rely solely on comparisons, but exploit special properties of the keys, may sort more quickly.
10.
Many algorithms use sentinels at the ends of the arrays (values such as k_0 = -∞ or k_{n-1} = +∞) to simplify logic and avoid infinite loops.
11.
We'll demonstrate the sorts by using the 16 numbers chosen at random by Don Knuth on March 19, 1963.

503    087    512    061    908    170    897    275    653    426    154    509    612    677    765    703

# Sorting by Insertion

We will look at straight insertion sort, binary insertion, and Shell's sort as examples of insertion sorting. Don Knuth's text on searching and sorting [3] provides in-depth coverage of these algorithms.

## Straight insertion sort

Given an array k_1, ..., k_{n-1} of n - 1 keys, we want to rearrange the items so that they are in ascending order. Assume a sentinel key k_0 = -∞ that is smaller than all other elements in the array.

Assume that for some j ≥ 1 the keys

k_0 ≤ k_1 ≤ ... ≤ k_j

have been rearranged so they are in ascending order, and now we want to insert k_{j+1} into its correct position. We compare k_{j+1} with k_j, k_{j-1}, ... in turn until we find the correct position for k_{j+1}. The insertionSort() algorithm below performs these steps, but first let's consider an example.

The colon below separates the already sorted keys from the key to be inserted. Note the sample array is indexed 0 through 16 to hold the 16 values and the sentinel.

i = 2     -∞     503  :  087        2 compares

i = 3     -∞     087    503  :  512        1 compare

i = 4     -∞     087    503    512  :  061        4 compares

i = 5     -∞     061    087    503    512  :  908        1 compare

i = 6     -∞     061    087    503    512    908  :  170        4 compares

...

i = 16    -∞     061    087    154    170    275    426    503    509    512    612    653    677    765    897    908  :  703        4 compares

-∞    061    087    154    170    275    426    503    509    512    612    653    677    703    765    897    908

[[Straight Insertion Sort]]=
public void insertionSort (Record[] record) {
// record[0] holds the sentinel key, smaller than all other keys
for (int i = 2; i < record.length; i++) {
Record r = record[i];
int key = record[i].key;
int j = i;
// shift larger keys one slot right; the sentinel stops the loop
while (record[j-1].key > key) {
record[j] = record[j-1];
--j;
}
record[j] = r;
}
}
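
For readers who want something directly compilable, here is a minimal standalone sketch of the same algorithm over a plain int array (the class and method names are ours, not from the text); the sentinel occupies slot 0:

```java
public class InsertionDemo {
    // Sorts a[1..a.length-1] ascending; a[0] must hold a sentinel
    // smaller than every key (here Integer.MIN_VALUE).
    public static void insertionSort(int[] a) {
        for (int i = 2; i < a.length; i++) {
            int key = a[i];
            int j = i;
            // shift larger keys right; the sentinel a[0] stops the loop
            while (a[j - 1] > key) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }
}
```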


It should be clear that insertionSort() only uses a few extra registers and so has constant space complexity S(n) = O(1). The running time T(n) is also easy to determine, but this is our first while loop to analyze.

### Best and worst running times of straight insertion sort

The comparison in the while loop's Boolean condition may execute only once (when k_{i-1} ≤ k_i), or as many as i times (when k_{i-1}, k_{i-2}, ..., k_1 are all greater than k_i, and only the sentinel k_0 stops the loop).

In the best case the data is already sorted in ascending order, and insertion sort will execute n - 2 comparisons (one for each i = 2 to key.length - 1 = n - 1). It will also execute 2(n - 2) record assignments: r = record[i] and record[j] = r. Thus,

T_insertionSort(n) = Θ(n).

In the worst case the data is sorted in descending order, and the while loop executes i times for each i. Thus, the number of comparisons is

∑_{i=2}^{n-1} i = n(n-1)/2 - 1 = O(n²),

and the number of record assignments is

∑_{i=2}^{n-1} i + 2(n - 2) = n(n-1)/2 + 2n - 5 = O(n²).

Thus,

T_insertionSort(n) = O(n²).

### Average running times of straight insertion sort

Now let's try something new and reason about the average case complexity of straight insertion sort. You may want to take a detour here and read some general remarks on how one goes about determining average case complexities.

For the average case time complexity, we need the probability P(j) that j compares are made in the while test, where j can take any of the values from j = 1 to j = i. Recall that k_0, k_1, ..., k_{i-1} are in sorted ascending order when we are inserting k_i into the list.

• For j = 1 compare, it must be that k_i ≥ k_{i-1}; therefore k_i is greater than (or equal to) all previous keys. Out of the i! permutations of the values k_1, ..., k_i there are (i - 1)! where the largest is last. Thus,

P(1) = (i-1)!/i! = 1/i

• For j = 2 compares, it must be that k_i < k_{i-1} but k_i ≥ k_{i-2}; therefore k_i is greater than (or equal to) all but one of the previous keys. Out of the i! permutations of the values k_1, ..., k_i there are (i - 1)! where the second largest is last. Thus,

P(2) = (i-1)!/i! = 1/i

• In general, for j compares, it must be that

k_{i-1} > k_i,  k_{i-2} > k_i,  ...,  k_{i-j+1} > k_i,

or equivalently,

min(k_{i-1}, k_{i-2}, ..., k_{i-j+1}) > k_i

and

k_{i-j} ≤ k_i.

That is, k_i is the j-th largest element in the array

k_1, k_2, ..., k_{i-1}, k_i.

Another way to say this is: the rank of k_i is i - j + 1, where by rank we mean the number of keys less than or equal to it in the set

{k_1, k_2, ..., k_{i-1}, k_i}.

• The probability of j compares is

P(j) = 1/i.

There is one chance out of i to place the j-th largest element in the last position.

Thus, the average number of comparisons to insert k_i is

∑_{j=1}^{i} j P(j) = (1/i) ∑_{j=1}^{i} j = (i+1)/2.

Summing over i = 2, ..., n-1, the average case complexity of straight insertion is

T_average(n) = ∑_{i=2}^{n-1} (i+1)/2 = O(n²).

## Binary insertion

When inserting the i-th record in straight insertion sort, key k_i is compared with about i of the previously sorted records. From the study of binary search techniques we know that only about lg i compares need to be made to determine where to insert an item into a sorted list. Binary insertion was mentioned by John Mauchly in 1946 in the first published discussion of computer sorting.
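
Since the text gives no code for binary insertion, here is an illustrative sketch (all names are ours): a binary search finds the insertion point in about lg i compares, but the shift still costs O(i) moves, so the worst-case running time remains O(n²).

```java
public class BinaryInsertion {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            // binary search for the leftmost position whose key exceeds `key`
            int lo = 0, hi = i;
            while (lo < hi) {
                int mid = (lo + hi) / 2;
                if (a[mid] <= key) lo = mid + 1;  // <= keeps the sort stable
                else hi = mid;
            }
            // shift a[lo..i-1] one slot right and insert
            for (int j = i; j > lo; j--) a[j] = a[j - 1];
            a[lo] = key;
        }
    }
}
```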

## Shell's sort

Another variant of insertion sorting, proposed by Donald Shell in 1959, allows insertion of items that are far apart with few comparisons. The idea is to pick an increment, call it h, and rearrange the file of records so that every h-th element is correctly sorted. Then decrease the increment h and repeat. This continues until h = 1. The diminishing sequence of increments

..., 1093, 364, 121, 40, 13, 4, 1

where h_k = 3h_{k-1} + 1, h_0 = 1, works well in practice. From empirical studies it has been found that

T(n) = 1.66 n^1.25    or    T(n) = 0.33 n (ln n)² - 1.26 n

model the running time of Shell's sort for this diminishing sequence of increments.

When the increments are

h = 2^k - 1,    k = ⌊lg n⌋, ..., 4, 3, 2, 1    (that is, h = ..., 31, 15, 7, 3, 1)

Shell's sort has running time O(n^{3/2}). A general analysis of Shell's sort is difficult.

The code below comes from Kernighan and Ritchie [2].

[[Shell's sort]]=
void shellsort(int v[], int n) {
int gap, i, j, temp;
for (gap = n/2; gap > 0; gap /= 2)
for (i = gap; i < n; i++)
for (j = i-gap; j >= 0 && v[j] > v[j+gap]; j -= gap) {
temp = v[j];
v[j] = v[j+gap];
v[j+gap] = temp;
}
}
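
As a companion to the K&R version, here is a hedged Java sketch using the 3h + 1 increment sequence discussed above instead of halving gaps (class and method names are ours):

```java
public class ShellSortDemo {
    public static void shellSort(int[] a) {
        int n = a.length;
        int h = 1;
        while (h < n / 3) h = 3 * h + 1;   // 1, 4, 13, 40, 121, ...
        while (h >= 1) {
            // h-sort the array: straight insertion among elements h apart
            for (int i = h; i < n; i++) {
                int key = a[i];
                int j = i;
                while (j >= h && a[j - h] > key) {
                    a[j] = a[j - h];
                    j -= h;
                }
                a[j] = key;
            }
            h /= 3;                        // next smaller increment
        }
    }
}
```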

# Sorting by Exchange

We will now study exchange sorts that transpose records when they are found to be out of order. We will study bubblesort and mention cocktail shaker sort. Neither of these is particularly efficient, but bubble sort does provide an interesting analysis. So that you do not go away thinking that exchange sorts are inefficient, recall that quicksort is an exchange sort, radix-exchange sorting is very efficient, and there is an interesting parallel exchange sort known as Batcher's method.

## Bubblesort

Bubblesort repeatedly passes through the file, exchanging adjacent records when they are out of order; when no exchanges are needed, the file is sorted. That is, key k_0 is compared with k_1 and they are exchanged if out of order. Then the same is done with k_1 and k_2, then k_2 and k_3, etc. Eventually, the largest item will "bubble" to the end of the file. The process is repeated to "bubble" the next largest item up to the n - 2 position, then the next largest, and so on. The bubbleSort() algorithm below performs these steps, but first let's consider an example. Note the sample array is indexed 0 through 15 to hold the 16 values.

Initial:                      503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
After pass 1 (15 compares):   087 503 061 512 170 897 275 653 426 154 509 612 677 765 703 908
After pass 2 (14 compares):   087 061 503 170 512 275 653 426 154 509 612 677 765 703 897 908
After pass 3 (13 compares):   061 087 170 503 275 512 426 154 509 612 653 677 703 765 897 908

The brute force algorithm executes 15 passes in total, but we could stop after 9 passes, since no exchanges occur after that.

[[Bubble sort]]=
public void  bubbleSort (Record[] record) {
for (int i = record.length; i > 1; i--) {
for (int j = 1; j < i; j++) {
if (record[j-1].key > record[j].key) {
record.swap(j-1, j);
}
}
}
}
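
The text notes that passes could stop once no exchanges occur, but the code above lacks that test. A sketch of the refinement over a plain int array (names ours, not from the text):

```java
public class BubbleDemo {
    public static void bubbleSort(int[] a) {
        for (int i = a.length; i > 1; i--) {
            boolean swapped = false;
            for (int j = 1; j < i; j++) {
                if (a[j - 1] > a[j]) {
                    int t = a[j - 1]; a[j - 1] = a[j]; a[j] = t;
                    swapped = true;
                }
            }
            if (!swapped) break;   // no exchange on this pass: already sorted
        }
    }
}
```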


### Analysis of bubble sort

Counting the number of record comparisons is easy using sums. Letting n = record.length, we know that

∑_{j=1}^{i-1} 1 = i - 1

record compares occur in the inner for loop. Summing this over the outer for loop yields

∑_{i=2}^{n} (i - 1) = n(n-1)/2.

Thus the running time of bubble sort is

T_bubbleSort(n) = n(n-1)/2 = Θ(n²).

We can also count the number of record exchanges in bubble sorting. In the worst case, a swap is made for every comparison; this occurs when the file is in reverse order:

T_swaps(n) = n(n-1)/2 = O(n²).

In the best case, no swaps are made; this occurs when the file is already sorted:

T_swaps(n) = Θ(1).

The average case analysis of record exchanges is interesting. We need the probability that the if test evaluates to true. We assume that the records are initially in random order and that any of the n! possible permutations of these records can occur with equal probability.

On the first pass of the outer for loop when i=record.length, the if test will be true for

• j = 1 if and only if k_0 > k_1, which occurs with probability 1/2. As an example, either of the two arrangements (1, 2) or (2, 1) can occur with the same probability; only in the second case, when the smallest key is last, does a swap occur.
• j = 2 if and only if max(k_0, k_1) > k_2, which occurs with probability 2/3. As an example, any of the six initial arrangements:

(1, 2, 3),    (1, 3, 2),    (2, 1, 3),    (2, 3, 1),    (3, 1, 2),    (3, 2, 1)

can occur with the same probability (1/6). After the first stage (j = 1) these records will be arranged as:

(1, 2, 3),    (1, 3, 2),    (1, 2, 3),    (2, 3, 1),    (1, 3, 2),    (2, 3, 1).

A swap will occur only if the smallest or second smallest key is last (when 3 is not last), that is, 4 out of 6 times.
• j = 3 if and only if max(k_0, k_1, k_2) > k_3, which occurs with probability 3/4. A swap will occur only if the first, second, or third smallest key is the fourth record.
and so on.

At each stage we will be comparing the maximum of

k_0, k_1, ..., k_{j-1}

with k_j, and for the swap to occur this maximum must be greater than k_j. We can place the largest key from

k_0, k_1, ..., k_j

in any of the first j positions. For any of these j choices out of j + 1 positions for the placement of the largest key the if test will be true. Thus, the probability that the if test evaluates to true, on the first pass, is

P_swap(i = n, j) = j/(j+1),    j = 1, ..., n-1.

Alternatively, a swap will occur for j if the first, second, ..., or j-th smallest key is in the (j+1)-st location. Each of these cases occurs with probability 1/(j+1) and they are mutually exclusive, so we find again

P_swap(i = n, j) = j/(j+1),    j = 1, ..., n-1.

The first pass of the outer for loop alters the initial random distribution of the keys and we must account for it. Now, on the second pass, when i=record.length-1, the if test will be true for

• j = 1 if and only if k_2 < max(k_0, k_1), which occurs with probability 1/3. As an example, any of the six initial arrangements:

(1, 2, 3),    (1, 3, 2),    (2, 1, 3),    (2, 3, 1),    (3, 1, 2),    (3, 2, 1)

can occur with the same probability (1/6). However, after the first pass they will be arranged as

(1, 2, 3),    (1, 2, 3),    (1, 2, 3),    (2, 1, 3),    (1, 2, 3),    (2, 1, 3).

In only 2 out of 6 of these (when 2 is first and 1 is second) will a swap occur. Notice this occurs when, in the initial distribution, the smallest element is third.
• j = 2 if and only if k_3 is smaller than the maximum of the first three keys as rearranged by the first pass, which occurs with probability 2/4. That is, the smallest or second smallest key is in the fourth position. As an example, any of the twenty-four initial arrangements:

(1, 2, 3, 4),    (1, 2, 4, 3),    (1, 4, 2, 3),    (4, 1, 2, 3)

(1, 3, 2, 4),    (1, 3, 4, 2),    (1, 4, 3, 2),    (4, 1, 3, 2)

(2, 1, 3, 4),    (2, 1, 4, 3),    (2, 4, 1, 3),    (4, 2, 1, 3)

(2, 3, 1, 4),    (2, 3, 4, 1),    (2, 4, 3, 1),    (4, 2, 3, 1)

(3, 1, 2, 4),    (3, 1, 4, 2),    (3, 4, 1, 2),    (4, 3, 1, 2)

(3, 2, 1, 4),    (3, 2, 4, 1),    (3, 4, 2, 1),    (4, 3, 2, 1)

are possible. After the first pass on i = n they will be arranged as

(1, 2, 3, 4),    (1, 2, 3, 4),    (1, 2, 3, 4),    (1, 2, 3, 4)

(1, 2, 3, 4),    (1, 3, 2, 4),    (1, 3, 2, 4),    (1, 3, 2, 4)

(1, 2, 3, 4),    (1, 2, 3, 4),    (2, 1, 3, 4),    (2, 1, 3, 4)

(2, 1, 3, 4),    (2, 3, 1, 4),    (2, 3, 1, 4),    (2, 3, 1, 4)

(1, 2, 3, 4),    (1, 3, 2, 4),    (3, 1, 2, 4),    (3, 1, 2, 4)

(2, 1, 3, 4),    (2, 3, 1, 4),    (3, 2, 1, 4),    (3, 2, 1, 4)

Then after the first stage (j = 1) of this pass:

(1, 2, 3, 4),    (1, 2, 3, 4),    (1, 2, 3, 4),    (1, 2, 3, 4)

(1, 2, 3, 4),    (1, 3, 2, 4),    (1, 3, 2, 4),    (1, 3, 2, 4)

(1, 2, 3, 4),    (1, 2, 3, 4),    (1, 2, 3, 4),    (1, 2, 3, 4)

(1, 2, 3, 4),    (2, 3, 1, 4),    (2, 3, 1, 4),    (2, 3, 1, 4)

(1, 2, 3, 4),    (1, 3, 2, 4),    (1, 3, 2, 4),    (1, 3, 2, 4)

(1, 2, 3, 4),    (2, 3, 1, 4),    (2, 3, 1, 4),    (2, 3, 1, 4)

Thus the compare record[1] > record[2] will be true 12 out of 24 times.
• j = 3 if and only if the first, second, or third smallest key is in the fifth position, which occurs with probability 3/5;
and so on.

For i = n - 1, a swap will occur for j if the first, second, ..., or j-th smallest key is in the (j+2)-nd location. Each of these cases occurs with probability 1/(j+2) and they are mutually exclusive, so we find

P_swap(i = n-1, j) = j/(j+2),    j = 1, ..., n-2.

In general, the probability that the if test evaluates to true on the (n - k + 1)-th pass is

P_swap(i = n - k, j) = j/(j + k + 1),    j = 1, ..., n - k - 1

or, in terms of i,

P_swap(i, j) = j/(j + n - i + 1),    j = 1, ..., i - 1.

And the average number of swaps is

∑_{k=0}^{n-2} ∑_{j=1}^{n-k-1} j/(j + k + 1).

This sum can be expanded explicitly as

(1/2 + 2/3 + ... + (n-1)/n) + (1/3 + 2/4 + ... + (n-2)/n) + ... + (1/(n-1) + 2/n) + 1/n

or, rearranging terms by common denominator:

1/2 + (1 + 2)/3 + (1 + 2 + 3)/4 + ... + (1 + 2 + ... + (n-1))/n = (1/2)(1 + 2 + ... + (n-1)) = n(n-1)/4.
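
The n(n-1)/4 figure can be checked exhaustively for small n by running bubble sort on every one of the n! permutations and averaging the swap counts. A brute-force sketch (illustrative only; all names are ours), using Heap's algorithm to enumerate permutations:

```java
public class AverageSwaps {
    // counts the swaps bubble sort makes on one array
    static long countSwaps(int[] a) {
        long swaps = 0;
        for (int i = a.length; i > 1; i--)
            for (int j = 1; j < i; j++)
                if (a[j - 1] > a[j]) {
                    int t = a[j - 1]; a[j - 1] = a[j]; a[j] = t;
                    swaps++;
                }
        return swaps;
    }

    // total swaps over all permutations of a[0..k-1] (Heap's algorithm)
    static long total(int[] a, int k) {
        if (k == 1) return countSwaps(a.clone());
        long sum = 0;
        for (int i = 0; i < k; i++) {
            sum += total(a, k - 1);
            int j = (k % 2 == 0) ? i : 0;
            int t = a[j]; a[j] = a[k - 1]; a[k - 1] = t;
        }
        return sum;
    }

    // returns 4 * (average swaps), which should equal n * (n - 1)
    public static long averageTimes4(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        long fact = 1;
        for (int i = 2; i <= n; i++) fact *= i;
        return 4 * total(a, n) / fact;
    }
}
```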

## Cocktail shaker sort

A refinement of bubble sort is to reverse direction on each pass. This leads to a slight improvement over bubble sort, but not enough to make it better than straight insertion sort.
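
The text gives no code for cocktail shaker sort; a plain-int-array sketch of the idea (names ours): alternate left-to-right and right-to-left passes, shrinking the unsorted region at both ends.

```java
public class ShakerDemo {
    public static void shakerSort(int[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            // left-to-right pass bubbles the largest key up to position hi
            for (int j = lo; j < hi; j++)
                if (a[j] > a[j + 1]) { int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t; }
            hi--;
            // right-to-left pass bubbles the smallest key down to position lo
            for (int j = hi; j > lo; j--)
                if (a[j - 1] > a[j]) { int t = a[j - 1]; a[j - 1] = a[j]; a[j] = t; }
            lo++;
        }
    }
}
```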

# Sorting by selection

As a last general sorting methodology we will study selection sorts that select the smallest (or largest) key, output them, and then repeat. In particular, we will study straight selection sort, tree sort, and heapsort.

## Straight selection sort

Find the smallest key and transfer it to the output (or swap it into the first location of the file). Repeat this step with the next smallest key, and continue. Notice that after i steps, the records in locations 0 to i - 1 are sorted.

In the example, the current smallest value is highlighted in bold font as the values are scanned from left to right.

503    087    512    061    908    170    897    275    653    426    154    509    612    677    765    703

061    087    512    503    908    170    897    275    653    426    154    509    612    677    765    703

061    087    512    503    908    170    897    275    653    426    154    509    612    677    765    703

061    087    154    503    908    170    897    275    653    426    512    509    612    677    765    703

and so on.

[[Straight selection sort]]=
public void selectionSort(Record[] record) {
for (int i = 0; i < record.length; i++) {
int min = i;
for (int j = i+1; j < record.length; j++) {
if (record[j].key < record[min].key)  min = j;
}
record.swap(min, i);
}
}


### Analysis of straight selection sort

In all cases, the number of comparisons made by straight selection sort is

∑_{i=0}^{n-1} ∑_{j=i+1}^{n-1} 1 = ∑_{i=0}^{n-1} (n - i - 1) = n(n-1)/2 = Θ(n²).

Straight selection sort always makes n - 1 = Θ(n) swaps. Since there are always a linear number of swaps, selection sort may be the best method when the records are large and expensive to move.

## Tree selection

Tree selection compares keys two at a time raising the smaller up a level in a binary tree. The record with the smallest key eventually moves to the root where it is output. This is similar to a tournament where players rise to the top of the bracket as they continue to win. Keys are compared in pairs and the larger (smaller) of each pair promoted. These promoted keys are compared in pairs and again the larger (smaller) is promoted. This continues until the largest (smallest) key is found.

Here's the tournament tree for our sample data.

Once the largest item is removed, at most lg n comparisons are needed to pick the next largest number and fix up the tree. That is, we need only follow the one path from the leaf where the root value came from back to the root of the tree.

It follows that we need space of order S(n) = O(n) for storing the output and pointers from the keys back to their original leaves. And it follows that tree selection has running time

T(n) = Θ(n lg n).

To see this note:

1.
In setting up the original tree, n/2 compares are made at the leaves; n/4 compares at the next level up; then n/8 compares; and so on, until the final 1 compare needed to determine the root. Summing these up we find

n/2 + n/4 + n/8 + ... + 1 = n(1/2 + 1/4 + ... + 1/n) = n - 1

compares

are needed to create the tree (it should be clear that this is the number of games needed to select the best of n players).
2.
And then each time we select the next best (largest) key it will require O(lg n) comparisons to fix up the tree. Since we do this n - 1 times, the running time is "log-linear."

The importance of tree sort is that it generalizes to an important algorithm: heapsort.

## Heapsort

Heapsort was invented by J. W. J. Williams in 1964, and Robert Floyd suggested several efficient implementations in the same year. Heaps can be used for priority queues, a data structure where the largest, i.e., highest priority item is always first. Priority queues need not be completely sorted, but it should be easy (efficient) to support the following operations on a priority queue:

• Construct a priority queue from n items
• Find the highest priority item
• Remove the highest priority item
• Insert a new item
• Delete an item

The heap data structure for implementing priority queues is a left-complete binary tree with the heap property. That is, a heap is a binary tree which is completely filled at all levels except possibly the last, which is filled from left to right, and in which the key in each node is larger than (or equal to) the keys of its children.

A complete binary tree of height h has 2^h - 1 internal nodes and 2^h external (leaf) nodes, or a total of n = 2^{h+1} - 1 nodes. The number of leaf nodes in a left-complete binary tree of height h lies between 2^{h-1} and 2^h, and the number of internal nodes lies between 2^{h-1} and 2^h - 1. The total number of nodes in a left-complete binary tree lies between

2^h ≤ n ≤ 2^{h+1} - 1.

When the left and right subtrees of a left-complete binary tree are both complete they each contain 2^h - 1 nodes. When the left subtree is complete to height h - 1 but the right subtree is complete only to height h - 2, the left subtree contains 2^h - 1 nodes and the right contains 2^{h-1} - 1 nodes. That is, the total number of nodes is

n = 2^h + 2^{h-1} - 2 = 3·2^{h-1} - 2

and the left subtree contains roughly

(2/3)n = (2/3)(3·2^{h-1} - 2) = 2^h - 4/3

of the nodes.

A heap can be stored in an array, indexed from 1, where node j has its left child in position 2j and its right child in position 2j + 1. The parent of node j is in position ⌊j/2⌋. For example:

position   1   2   3   4   5   6   7   8   9
value     15  12  14  11   6   7   8   9   3

The value 15 is stored in root node 1 and it has left child at node 2 (value 12) and right child at node 3 (value 14). The complete heap is shown below.
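
The index arithmetic can be captured in three one-line helpers (a sketch; the names are ours):

```java
public class HeapIndex {
    // children and parent of node j in a 1-indexed heap array
    public static int left(int j)   { return 2 * j; }
    public static int right(int j)  { return 2 * j + 1; }
    public static int parent(int j) { return j / 2; }   // integer division floors
}
```

For the table above, left(3) = 6 holds value 7, right(3) = 7 holds value 8, and parent(3) = 1 holds value 15.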

### Heapifying an array

Let's pretend we are given an array to heapify, say

position   1   2   3   4   5   6   7   8   9
value      3   9   7   8  11  12   6  15  14

We can start in the middle of the array (at the bottom level of the tree), with the node at position ⌊n/2⌋ = ⌊9/2⌋ = 4, and, if necessary, exchange its value with the larger of its two children's values.

position   1   2   3   4   5   6   7   8   9
value      3   9   7  15  11  12   6   8  14

Moving left one position in the array, we heapify the tree at this position. Note moving left in the array corresponds to moving right-to-left, then up one level, in the tree.

position   1   2   3   4   5   6   7   8   9
value      3   9  12  15  11   7   6   8  14

And continue the process, filtering down smaller values. Here's what happens at node 2.

position   1   2   3   4   5   6   7   8   9
value      3   9  12  15  11   7   6   8  14

position   1   2   3   4   5   6   7   8   9
value      3  15  12   9  11   7   6   8  14

position   1   2   3   4   5   6   7   8   9
value      3  15  12  14  11   7   6   8   9
Finally, we heapify at the root node 1.

position   1   2   3   4   5   6   7   8   9
value     15   3  12  14  11   7   6   8   9

position   1   2   3   4   5   6   7   8   9
value     15  14  12   3  11   7   6   8   9

position   1   2   3   4   5   6   7   8   9
value     15  14  12   9  11   7   6   8   3

Given an array A and a position k, where the binary trees rooted at 2k and 2k + 1 are heaps, we make the tree rooted at node k a heap with the heapify() algorithm. The idea is to exchange (if necessary) the element A[k] with the larger of A[2k] and A[2k + 1], and then, if an exchange occurred, heapify() the changed left or right subtree. Clearly, the time to fix the relationship among A[k], A[2k], and A[2k + 1] is Θ(1).

Let's pretend the tree at node k has n nodes. A child's subtree can have size at most 2n/3, which occurs when the last row of the tree is half full. Thus, the running time of heapify() is given by the recurrence

T(n) = T(2n/3) + Θ(1)

which by the Master theorem gives

T(n) = Θ(lg n).

(Note a = 1, b = 3/2, and k = 0, so applying the condition a = b^k yields T(n) = Θ(n^k log_{3/2} n) = Θ(lg n).)

Two codes to heapify a file of records starting from index k follow. The first one is from Sedgewick [4]; the second from Cormen et al. [1]. (Note the arrays are indexed from 0 to n, but only indices 1 to n are used to store data.)

[[Heapify an array]]=
public void  heapify (Record[] record, int k) {
Record r = record[k];
int key = record[k].key;
int n = record.length - 1;   // keys occupy positions 1..n
int j;
while (k <= n/2) {
j = 2*k;
if (j < n && record[j].key < record[j+1].key) ++j;   // pick the larger child
if (key >= record[j].key) break;
record[k] = record[j];
k = j;
}
record[k] = r;
}


[[Heapify an array]]=
public void  heapify (Record[] record, int k) {
int largest = k;
int left = 2*k;
int right = 2*k+1;
if (left <= record.length-1 && record[left].key > record[k].key) {
largest = left;
}
if (right <= record.length-1 && record[right].key > record[largest].key) {
largest = right;
}
if (largest != k) {
record.swap(k, largest);
heapify(record, largest);
}
}


Since the elements in positions ⌊n/2⌋ + 1, ..., n have no children, they are each, trivially, one-element heaps. We can build a heap by running heapify() on the remaining nodes. Each call costs at most O(lg n) operations and there are Θ(n) calls; therefore, constructing a heap takes at most O(n lg n).

A more careful analysis shows we can build a heap in linear time (Θ(n)). In particular, suppose the tree is complete, of height h, with n = 2^{h+1} - 1 nodes, so that lg(n + 1) = h + 1. Level j contains 2^j nodes, each rooting a subtree of about n/2^j nodes, so heapifying it costs O(lg(n/2^j)) = O(h + 1 - j). Then we have:

T(n) = ∑_{j=0}^{h} 2^j lg(n/2^j)
     ≤ ∑_{j=0}^{h} 2^j (h + 1 - j)
     = (h + 1) ∑_{j=0}^{h} 2^j - ∑_{j=0}^{h} j 2^j
     = (h + 1)(2^{h+1} - 1) - ((h - 1) 2^{h+1} + 2)
     = 2^{h+2} - h - 3
     = 2(n + 1) - lg(n + 1) - 2
     = Θ(n)

using the identity ∑_{j=0}^{h} j 2^j = (h - 1) 2^{h+1} + 2.

[[Build a heap]]=
public void buildHeap(Record[] record) {
int n = record.length - 1;   // keys occupy positions 1..n
for (int k = n/2; k > 0; k--) {
heapify(record, k);
}
}


### Finally, heapsort and its analysis

The steps of heapSort() are:

1.
Build a heap;
2.
Exchange the root of the tree with the last element of the tree;
3.
Decrement the heap size by one;
4.
Heapify from the root of the tree;

Here's how the algorithm works on our example set of keys.

503    087    512    061    908    170    897    275    653    426    154    509    612    677    765    703

First we build a heap from the original array.

Next, exchange the root and last element and heapify from the root down, excluding the last element.

And repeat:

One more time:

Building the heap is O(n). Swapping the root and the last element and decrementing the heap size are Θ(1). Each time we heapify from the root of a heap with k elements it takes time

O(lg k),    k = n - 1, ..., 2.

Thus, the time complexity of heapSort() is

T(n) = O(n) + ∑_{k=2}^{n-1} [c + O(lg k)] = Θ(n lg n).

[[Heapsort]]=
public void  heapSort(Record[] record) {
int n = record.length - 1;   // keys occupy positions 1..n
buildHeap(record);
for (int k = n; k > 1; k--) {
record.swap(1, k);           // move the current maximum to position k
// a Java array cannot shrink, so the heap size k-1 must be passed to
// a heapify(record, 1, k-1) variant that ignores positions >= k
heapify(record, 1, k-1);
}
}
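
Because the Record-based version leans on shrinking the heap, a fully runnable variant over a plain int array may be clearer (slot 0 unused; all names are ours, not from the text):

```java
public class HeapSortDemo {
    // restore the heap property at node k within the heap a[1..n]
    static void heapify(int[] a, int k, int n) {
        int v = a[k];
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && a[j] < a[j + 1]) j++;   // pick the larger child
            if (v >= a[j]) break;
            a[k] = a[j];
            k = j;
        }
        a[k] = v;
    }

    public static void heapSort(int[] a) {
        int n = a.length - 1;                           // keys in a[1..n]
        for (int k = n / 2; k >= 1; k--) heapify(a, k, n);  // build the heap
        for (int k = n; k > 1; k--) {
            int t = a[1]; a[1] = a[k]; a[k] = t;        // max to position k
            heapify(a, 1, k - 1);                       // re-heapify a[1..k-1]
        }
    }
}
```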


# Lower bound on time complexity of comparison sorts

Comparison sorts determine the order of elements based only on comparisons between the input keys. Examples of comparison sorts are insertion sort, merge sort, selection sort and quicksort. Sequential comparison sorts can be viewed in terms of a decision tree.

Theorem 1   Any decision tree that sorts n elements has height h = Ω(n lg n). Thus, the number of comparisons C(n) in a sequential comparison sort is asymptotically bounded below by n lg n.

Consider a decision tree that sorts n items. Here's a decision tree for 3 items: k_0, k_1, k_2. Boxes with colon-separated integers i : j represent comparison of k_i with k_j. If k_i < k_j we traverse down a left branch, otherwise a right branch.

Since there are n! permutations of the items, a decision tree must have n! leaves (this assumes no redundant comparisons, but since we're interested in a lower bound on comparisons, that's okay). If the height of a decision tree is h, then as many as C(n) = h comparisons are needed to sort some permutation of the keys. A binary tree of height h has no more than 2^h leaves, so we have

n! ≤ 2^h = 2^{C(n)}

or, since h = C(n) is an integer

⌈lg n!⌉ ≤ h = C(n).

By Stirling's asymptotic formula for n!,

n! ≈ √(2πn) (n/e)^n,

or more exactly,

n! = √(2πn) (n/e)^n (1 + 1/(12n) + 1/(288n²) - 139/(51840n³) + O(1/n⁴)),

we find

C(n) ≥ lg n! = n lg n - n/(ln 2) + (1/2) lg n + O(1).

Some natural questions to ask are:

1.
Is there a sorting method that always produces the fewest number of compares?
2.
Is there a sorting method that minimizes the average number of compares?
I believe no one yet knows the answers to these questions.

When operations other than comparisons can be used to sort keys, we may be able to sort using fewer than Θ(n lg n) operations. Some sorting techniques use special properties of the input data to sort faster than Θ(n lg n).

# Sorting by Distribution

We will now explore several linear time sorts. In particular, we will consider counting sort, radix sort, and bucket (or bin) sort.

## Counting sorts

Let's first look at a sort based on counting that is not linear. It is called comparison counting; since it is a comparison sort, the previous section implies that its running time is at least n lg n. It will lead to a more efficient sort algorithm called distribution counting.

The basic idea in comparison counting is to use an auxiliary array C[] that holds, for each record, the count of the number of keys less than its key. For example, C[0] tells how many keys are less than record[0].key, which implies that in the sorted file record[0] belongs in position C[0] + 1 (counting from 1).

For the example sequence, we start with all counts initialized to 0. On the first pass, all keys bigger than the last key have their counts incremented, and the last key has its count incremented once for each key smaller than it. On the second pass, all keys (except the last) bigger than the next-to-last have their counts incremented, and the next-to-last has its count incremented once for each key smaller than it. This continues until we compare the key in position 1 with the key in position 0, incrementing one of their counts.

Keys        503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
C (init.)     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
C, i = 15     0   0   0   0   1   0   1   0   0   0   0   0   0   0   1  12
C, i = 14     0   0   0   0   2   0   2   0   0   0   0   0   0   0  13  12
C, i = 13     0   0   0   0   3   0   3   0   0   0   0   0   0  11  13  12
C, i = 12     0   0   0   0   4   0   4   0   1   0   0   0   9  11  13  12
C, i = 11     0   0   1   0   5   0   5   0   2   0   0   7   9  11  13  12
C, i = 10     1   0   2   0   6   1   6   1   3   1   2   7   9  11  13  12
...
C, i = 2      5   1   8   0  15   3  14   4  10   5   2   7   9  11  13  12
C, i = 1      6   1   8   0  15   3  14   4  10   5   2   7   9  11  13  12

Note that comparison counting is a kind of address table sort; that is, the C array infers the position of each element in the sorted list, but no records are actually moved.

[[Comparison counting]]=
public void comparisonCount(Record[] record) {
int[] count = new int[record.length];
for (int i = 0; i < record.length; i++) count[i] = 0;
for (int i = record.length - 1; i > 0; i--) {
for (int j = i-1; j >= 0; j--) {
if (record[i].key < record[j].key) {
++count[j];
}
else {
++count[i];
}
}
}
}


It is clear that the time complexity of the comparison counting algorithm is

T(n) = ∑_{i=0}^{n-1} 1 + ∑_{i=1}^{n-1} ∑_{j=0}^{i-1} 1 = n + n(n-1)/2 = Θ(n²).

The space complexity of the sort is

S(n) = Θ(n).

Distribution counting sort assumes that each of the n input elements is an integer in some range, say from u to v for some u < v. For simplicity, we'll assume u = 0 and v = m - 1. The steps of the algorithm are:

1.
Set the count of each number to zero.
2.
In one pass over the file, count the number of occurrences of each key 0, 1, 2,..., m - 1.
3.
In one pass over the range, determine the number of elements less than or equal to k for each k = 0, 1,..., m - 1.
4.
In a second pass over the file, move the records into position in auxiliary storage.

[Distribution counting]=
public Record[] countingSort(Record[] record, int m) {
    int[] count = new int[m];
    Record[] newRecord = new Record[record.length];
    for (int i = 0; i < m; i++) { // clear the counts to zero
        count[i] = 0;
    }
    for (int j = 0; j < record.length; j++) { // increment count of each key
        ++count[record[j].key];
    }
    // count[i] now holds the number of i's in the file
    for (int i = 1; i < m; i++) {
        count[i] += count[i-1];
    }
    // count[i] now holds the number of keys less than or equal to i
    for (int j = record.length - 1; j >= 0; j--) { // right-to-left keeps the sort stable
        --count[record[j].key]; // decrement first: the counts are 1-based positions
        newRecord[count[record[j].key]] = record[j];
    }
    return newRecord;
}
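The same algorithm on bare int keys makes a self-contained sketch that can be run directly (the class name is illustrative, and satellite data is dropped):

```java
public class CountingSortDemo {
    // Stable distribution counting sort of integer keys in [0, m).
    static int[] countingSort(int[] key, int m) {
        int[] count = new int[m];
        for (int k : key) ++count[k];                          // occurrences of each key
        for (int i = 1; i < m; i++) count[i] += count[i - 1];  // now count[i] = # keys <= i
        int[] out = new int[key.length];
        for (int j = key.length - 1; j >= 0; j--)              // right-to-left keeps it stable
            out[--count[key[j]]] = key[j];
        return out;
    }

    public static void main(String[] args) {
        int[] key = {5, 6, 7, 3, 2, 6, 7, 5, 1, 4, 6}; // the data from Problem 9
        System.out.println(java.util.Arrays.toString(countingSort(key, 8)));
        // prints [1, 2, 3, 4, 5, 5, 6, 6, 6, 7, 7]
    }
}
```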


In the example, the range is from u = 000 to v = 999. Only count array elements corresponding to keys are shown.

Initialize the count array C to zero.

    Keys  503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
    C       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

In one pass over the file count the number of times each key occurs.

    Keys  503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
    C       1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1

In one pass over the range, count the number of keys less than or equal to each key:

    Keys        503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
    C, j = 087    1   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1
    C, j = 154    1   2   1   1   1   1   1   1   1   1   3   1   1   1   1   1
    C, j = 170    1   2   1   1   1   4   1   1   1   1   3   1   1   1   1   1
    ...
    C, j = 908    7   2   9   1  16   4  15   5  11   6   3   8  10  12  14  13

Now move the last record (key 703) to the 13th position in a new output file and decrement the count of key 703 by 1. Then move the next-to-last record (key 765) to output position 14 and decrement its count. And continue:

    Keys    503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
    Output  703
    C         7   2   9   1  16   4  15   5  11   6   3   8  10  12  14  12
    Output  703 765
    C         7   2   9   1  16   4  15   5  11   6   3   8  10  12  13  12
    Output  677 703 765
    C         7   2   9   1  16   4  15   5  11   6   3   8  10  11  13  12
    Output  612 677 703 765
    C         7   2   9   1  16   4  15   5  11   6   3   8   9  11  13  12
    Output  509 612 677 703 765
    C         7   2   9   1  16   4  15   5  11   6   3   7   9  11  13  12
And so on.

Distribution counting sort is stable: numbers with the same value appear in the output array in the same order as they were in the input array. When v - u = m - 1 = O(n), the distribution counting sort runs in

T(n) = Θ(n + m)

(linear) time. The space complexity of distribution counting sort is also

S(n) = Θ(n + m).

Radix sort was used by card-sorting machines, which you may never have seen unless you are an old timer. Herman Hollerith was 20 when he built his original tabulating and sorting machine for the 1890 U.S. census. His machine used the basic idea of radix sorting. Radix sorting is, in a sense, the opposite of merging.

We assume keys are represented by d-tuples

k = (a_{d-1}, a_{d-2},..., a_0)

that can be lexicographically ordered from left to right, that is,

(a_{d-1}, a_{d-2},..., a_0) < (b_{d-1}, b_{d-2},..., b_0)

if

a_{d-1} = b_{d-1}, a_{d-2} = b_{d-2},..., a_{j+1} = b_{j+1},    but    a_j < b_j,    for some j = d - 1,..., 0.

This is how one orders (English) words; it also produces a valid sort on integers and cards.
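This left-to-right rule can be sketched as an ordinary array comparison (a hypothetical helper, with digits stored most significant first):

```java
public class LexOrder {
    // Lexicographic comparison of two d-tuples of digits, most
    // significant digit first: negative if a < b, zero if a == b,
    // positive if a > b.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return a[i] - b[i]; // first differing position decides
        }
        return 0; // all positions equal
    }

    public static void main(String[] args) {
        System.out.println(compare(new int[]{5, 0, 3}, new int[]{5, 1, 2})); // negative: 503 < 512
    }
}
```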

Suppose we want to sort a 52-card deck of playing cards. We define an order on face values:

A < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < J < Q < K

and an order on suits:

♣ < ♦ < ♥ < ♠.

The lexicographic order of the cards is:

(♣, A) < (♣, 2) < ... < (♣, K) < (♦, A) < ... < (♠, Q) < (♠, K).

The radix sort idea is to deal the cards face up into 13 piles, one for each face value. Then collect the cards, placing the aces on the bottom, then the 2's on top of them, and so on, placing the four kings on top. Now deal the cards into four piles, one for each suit, and collect the cards again, clubs first, then diamonds, then hearts, and finally spades. The cards will now be in order.

This card sorting technique is a least significant digit radix sort. It also works for sorting integers and words. Here's how our sample data is sorted.

First we count the number of 0's, 1's, 2's,..., 9's in the units (least significant) digit of the data and accumulate the storage needed to redistribute the records, as in distribution counting. Then we count on the tens digit and redistribute the data. Finally, we count on the hundreds digit and redistribute the data.

    Keys            503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703

    Digit             0   1   2   3   4   5   6   7   8   9
    Units count       1   1   2   3   1   2   1   3   1   1
    Storage needed    1   2   4   7   8  10  11  14  15  16
    Restored keys   170 061 512 612 503 653 703 154 275 765 426 087 897 677 908 509
    Tens count        4   2   1   0   0   2   2   3   1   1
    Storage needed    4   6   7   7   7   9  11  14  15  16
    Restored keys   503 703 908 509 512 612 426 653 154 061 765 170 275 677 087 897
    Hundreds count    2   2   1   0   1   3   3   2   1   1
    Storage needed    2   4   5   5   6   9  12  14  15  16
    Restored keys   061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908

[[Radix sort]]=
public Record[] radixSort(Record[] record, int d) {
    for (int i = 0; i < d; i++) {
        // one stable distribution-counting pass on digit i, least
        // significant digit first; digitSort is a (hypothetical)
        // variant of countingSort() that keys on digit i only
        record = digitSort(record, i);
    }
    return record;
}


If we assume distribution counting sort is used as the stable sorting algorithm on each digit, then the running time of radix sort is

T(n) = Θ(d (n + m))

where d is the number of digits, m is the range of digits, and n is the number of records sorted. The values d and m are normally small and constant (as a function of n), so radix sort is a linear time sort.
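Assuming distribution counting as the per-digit stable sort, the whole procedure can be sketched self-contained on bare int keys (class and variable names are illustrative):

```java
import java.util.Arrays;

public class RadixSortDemo {
    // LSD radix sort of non-negative keys with d decimal digits: one
    // stable distribution-counting pass per digit, least significant first.
    static void radixSort(int[] key, int d) {
        int[] out = new int[key.length];
        int div = 1; // selects the current digit: (k / div) % 10
        for (int pass = 0; pass < d; pass++, div *= 10) {
            int[] count = new int[10];
            for (int k : key) ++count[(k / div) % 10];             // count each digit
            for (int i = 1; i < 10; i++) count[i] += count[i - 1]; // accumulate storage
            for (int j = key.length - 1; j >= 0; j--)              // stable redistribution
                out[--count[(key[j] / div) % 10]] = key[j];
            System.arraycopy(out, 0, key, 0, key.length);
        }
    }

    public static void main(String[] args) {
        int[] key = {503, 87, 512, 61, 908, 170, 897, 275,
                     653, 426, 154, 509, 612, 677, 765, 703};
        radixSort(key, 3);
        System.out.println(Arrays.toString(key));
    }
}
```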

## Bucket (Bin) Sort

Bucket sort runs in linear time on average. To achieve this average-case behavior, we assume the keys are uniformly distributed over some range. The idea is to divide the range into (about) n equal-sized subranges (or buckets or bins). Then, in one pass through the file, place each record in the bucket to which its key belongs.

For our sample data, let's create 10 buckets corresponding to

[0, 100), [100, 200),..., [800, 900), [900, 1000).

We can call them B[0], B[1],..., B[9], and calculate an index i from a key value by division and truncation:

i = key/100.

[[Bucket sort]]=
public void bucketSort(Record[] record) {
    int n = record.length;
    Record[][] bucket = new Record[10][n]; // bucket[i] holds keys in [100*i, 100*(i+1))
    int[] size = new int[10];
    for (int i = 0; i < n; i++) { // distribute the records into buckets
        int b = record[i].key / 100;
        bucket[b][size[b]++] = record[i];
    }
    int k = 0;
    for (int i = 0; i < 10; i++) { // sort each bucket and concatenate
        insertionSort(bucket[i], size[i]); // assumes a variant sorting the first size[i] entries
        for (int j = 0; j < size[i]; j++) {
            record[k++] = bucket[i][j];
        }
    }
}
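Under the same assumptions (keys in [0, 1000), ten buckets of width 100), here is a runnable sketch on bare int keys; Java's Collections.sort stands in for the insertion sort a small bucket would normally get:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketSortDemo {
    // Bucket sort for keys in [0, 1000): distribute into ten buckets
    // by i = key/100, sort each bucket, then concatenate in order.
    static void bucketSort(int[] key) {
        List<List<Integer>> bucket = new ArrayList<>();
        for (int i = 0; i < 10; i++) bucket.add(new ArrayList<>());
        for (int k : key) bucket.get(k / 100).add(k); // distribute
        int i = 0;
        for (List<Integer> b : bucket) {
            Collections.sort(b);          // sort within the bucket
            for (int k : b) key[i++] = k; // concatenate buckets in order
        }
    }

    public static void main(String[] args) {
        int[] key = {503, 87, 512, 61, 908, 170, 897, 275,
                     653, 426, 154, 509, 612, 677, 765, 703};
        bucketSort(key);
        System.out.println(java.util.Arrays.toString(key));
    }
}
```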


### Analysis of bucket sort

Except for the calls to insertionSort(), the complexity of bucketSort() is O(n) in the worst case. Under the assumption that the data is uniformly distributed, the probability that a given record falls in bucket i is p = 1/n. Let ni be a random variable denoting the number of elements in bucket i. The probability that ni = j is given by a binomial distribution: for ni to equal j, j of the n records must have fallen in bucket i and the other n - j must have fallen in other buckets, which occurs with probability P(ni = j) = C(n, j) p^j (1 - p)^(n-j). The expected value of a random variable ni fitting a binomial distribution is given by

 E[ni] = Σ_{j=0}^{n} j C(n, j) p^j (1 - p)^(n-j) = np.

It can also be shown that the variance of a random variable fitting a binomial distribution is

V[ni] = E[ni²] - E²[ni] = np(1 - p).

The time complexity of insertion sort on ni keys is O(ni²), and so, using summation notation to determine the running time of the loop that executes the insertion sorts, we find

 Σ_{i=0}^{n-1} O(E[ni²]) = O(Σ_{i=0}^{n-1} E[ni²]) = O(Σ_{i=0}^{n-1} (V[ni] + E²[ni])) = O(Σ_{i=0}^{n-1} (np(1 - p) + (np)²)) = O(Σ_{i=0}^{n-1} (1 - 1/n + 1)) = O(2n - 1),

since p = 1/n gives np(1 - p) = 1 - 1/n and (np)² = 1.

Thus, the expected (average) time for bucket sort is

T(n) = O(n).

# Summing up sorting

There's still a tremendous amount of knowledge about sorting to be covered. We've barely scratched the surface. There is no single best sorting method. You should be able to use the information gleaned here to form sound ideas about which algorithm to choose for a given application.

Below is a table summing up basic facts about the internal sorting algorithms we have studied. The space and running times are given as orders of growth.

| Sort method           | Stable | Space | Average running time | Worst-case running time |
|-----------------------|--------|-------|----------------------|-------------------------|
| Bubblesort            | Yes    | 1     | n²                   | n²                      |
| Bucket sort           | Yes    | n     | n                    | n                       |
| Comparison counting   | Yes    | n     | n²                   | n²                      |
| Distribution counting | Yes    | n + m | n + m                | n + m                   |
| Heapsort              | No     | 1     | n lg n               | n lg n                  |
| Mergesort             | Yes    | n     | n lg n               | n lg n                  |
| Quicksort             | No     | lg n  | n lg n               | n²                      |
| Shell's sort          | No     | 1     | n^1.25               | n^1.5                   |
| Straight insertion    | Yes    | 1     | n²                   | n²                      |
| Straight selection    | Yes    | 1     | n²                   | n²                      |
| Radix sort            | Yes    | 1     | d(n + m)             | d(n + m)                |

In distribution counting and radix sort the range of key values has m items; radix sort keys have d digits in the lexicographic order.

Bubblesort
is notoriously slow and most likely should never be used.
Bucket sort
is fast but can only be used in special cases when the key can be used to calculate the address of buckets.
Comparison counting
is useful because it generalizes to distribution counting.
Distribution counting
is useful when keys have a small range. It is stable. It requires extra memory for counts of each element in the range and auxiliary storage of the record file.
Heapsort
requires constant space and is guaranteed to have good running time. Its running time is roughly twice quicksort's average running time.
Mergesort
always has good running time, but requires O(n) extra space.
Quicksort
is the most useful general purpose internal sorting algorithm. It requires lg n space for recursion. It is not stable. On average it beats all other internal sorting algorithms in running time, provided an intelligent choice of partitioning key is made.
Shell's sort
is easy to program, not stable, uses constant space, and is reasonably efficient even for large n.
Straight insertion
is a simple method to program. It is stable. It requires constant extra space. It is quite efficient for small n and when the records are nearly sorted, but very slow when n is large and the data not nearly sorted.
Straight selection
is simple to program, stable, requires constant space, and works well for small n, but not when n is large.
Radix sort
is appropriate for keys that are short and lexicographically ordered. It should not be used for small n.

The table above does not provide timing differences between algorithms which have the same order of growth. Based on estimates given in Knuth's Sorting and Searching [3], we can give the following advice on the average running time of the algorithms. But first, let's be clear about terms. To say algorithm A is m% faster than algorithm B we mean

TB(n)/TA(n) = 1 + m/100.

(For example, if A is 50% faster than B, then B takes 1.5 times as long as A.)

Linear time algorithms:
Distribution counting is about 50% faster than radix sort and bucket sort.
Log-Linear algorithms:
Quicksort is about 50% faster than heapsort and 25% faster than mergesort.
Quadratic algorithms:
Insertion sort is about 25% faster than selection sort. Selection sort is about 60% faster than comparison counting. Comparison counting is about 60% faster than bubblesort.

# Problems

#### Problem 1:

Consider the data set 2, 5, 7. For all possible orders of this set determine the number of compares straight insertion sort would make. Verify that the minimum number of compares is n - 1 = 2, the maximum number of compares is n(n - 1)/2 + n - 1 = 5, and the average number of compares is n(n - 1)/4 + n - 1 = 7/2.

#### Problem 2:

Design an algorithm for binary insertion and analyze its complexity.

#### Problem 3:

For one or more diminishing sequences of increments, test Shell's sort empirically and find curves that fit its running time well.

#### Problem 4:

An improved bubble sort keeps track of whether or not a swap is made on each pass of the file. When no swaps are made the file is sorted and the algorithm can be terminated. Design an algorithm that implements this improvement. What is the running time of this improved bubble sort algorithm?

#### Problem 5:

Consider the data set 2, 5, 7. For all possible orders of this set determine the number of swaps bubble sort would make. Verify that the minimum number of swaps is 0, the maximum number of swaps is n(n - 1)/2 = 3, and the average number of swaps is n(n - 1)/4 = 3/2.

#### Problem 6:

Design an algorithm that implements the cocktail shaker idea.

#### Problem 7:

Solve the recurrence T(n) = T(2n/3) + 1, T(1) = 1 exactly.

#### Problem 8:

Here's a problem from [1]. Professors Howard, Fine, and Howard have proposed the following "elegant" sorting algorithm:
StoogeSort(char[] A, int i, int j) {
    if (A[i] > A[j])
        swap(A[i], A[j]);
    if (i + 1 >= j)
        return;
    k = floor((j - i + 1)/3);
    StoogeSort(A, i, j - k);   /* First two-thirds */
    StoogeSort(A, i + k, j);   /* Last two-thirds */
    StoogeSort(A, i, j - k);   /* First two-thirds again */
}

• Give a recurrence relation for the worst-case running time of StoogeSort.
• Solve the recurrence relation to find a bound on the worst-case running time.
• Do the professors deserve tenure?

#### Problem 9:

Illustrate the operation of distribution counting sort on the data

5, 6, 7, 3, 2, 6, 7, 5, 1, 4, 6

#### Problem 10:

What is the reason for decrementing the count in distribution counting sort whenever a record is moved to output?

#### Problem 11:

Illustrate the operation of radix sort on the data
VIA, ZIP, YAK, ZOO, YEN, VAT, WOO, ZAG, WIG, VEX, WAG, YAW, WED, VOW, ZIG

#### Problem 12:

Provide arguments that the claims about stability of the sorts mentioned in this section are correct.

#### Problem 13:

Verify the "percentage faster" estimates given above by experiments. I'd like to know how accurate they are.

## Bibliography

1
T. H. CORMEN, C. E. LEISERSON, AND R. L. RIVEST, Introduction to Algorithms, McGraw-Hill, 1990.

2
B. W. KERNIGHAN AND D. M. RITCHIE, The C Programming Language, Prentice Hall, second ed., 1988.

3
D. E. KNUTH, The Art of Computer Programming: Sorting and Searching, vol. 3, Addison-Wesley, third ed., 1998.

4
R. SEDGEWICK, Algorithms in C++, Addison-Wesley, 1992.

William Shoaff
2000-10-16