A collection is a data type that stores groups of items
| data type | key operations | data structure |
|---|---|---|
| stack | push, pop | LL, resizing array |
| queue | enqueue, dequeue | LL, resizing array |
| priority queue | insert, delete-max | binary heap |
| symbol table | put, get, delete | BST, hash table |
| set | add, contains, delete | BST, hash table |
“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won't usually need your code; it'll be obvious.”
—Fred Brooks
Collections: Insert and delete items. Which item to delete?
| data type | item to remove |
|---|---|
| stack | most recently added |
| queue | least recently added |
| randomized queue | random item |
| priority queue | largest (or smallest) item |
Requirement: Generic items are Comparable

Note: Duplicate keys are allowed; delMax() and max() pick any maximum key
Challenge: Find the largest \(M\) items in a stream of \(N\) items, where \(N\) is huge and \(M\) is large
Constraint: not enough memory to store \(N\) items
$ more transactions.txt
Turing 6/17/1990 644.08
vonNeumann 3/26/2002 4121.85
Dijkstra 8/22/2007 2678.40
vonNeumann 1/11/1999 4409.74
Dijkstra 11/18/1995 837.42
Hoare 5/10/1993 3229.27
vonNeumann 2/12/1994 4732.35
Hoare 8/18/1992 4381.21
Turing 1/11/2002 66.10
Thompson 2/27/2000 4747.08
Turing 2/11/1991 2156.86
Hoare 8/12/2003 1025.70
vonNeumann 10/13/1993 2520.97
Dijkstra 9/10/2000 708.95
Turing 10/12/1993 3532.36
Hoare 2/10/2005 4050.20

$ java TopM 5 < transactions.txt    # sort key = last col
Thompson 2/27/2000 4747.08
vonNeumann 2/12/1994 4732.35
vonNeumann 1/11/1999 4409.74
Hoare 8/18/1992 4381.21
vonNeumann 3/26/2002 4121.85
// Transaction data type is Comparable (ordered by $)
// use a min-oriented priority queue
MinPQ<Transaction> pq = new MinPQ<Transaction>();
while(StdIn.hasNextLine()) {
    String line = StdIn.readLine();
    Transaction transaction = new Transaction(line);
    pq.insert(transaction);
    if(pq.size() > M)
        pq.delMin(); // pq now contains largest M items
}
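The same idea can be tried without the course libraries. The following is a minimal sketch, assuming `java.util.PriorityQueue` (min-oriented by default) stands in for `MinPQ` and plain `Double` amounts stand in for `Transaction`; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: keeps the m largest values of a stream
// using a min-oriented heap that never holds more than m + 1 items.
final class TopMSketch {

    // Returns the m largest values seen, largest first.
    static List<Double> largest(Iterable<Double> stream, int m) {
        PriorityQueue<Double> pq = new PriorityQueue<>();  // min-oriented
        for (double x : stream) {
            pq.add(x);
            if (pq.size() > m) {
                pq.poll();  // evict the smallest; the m largest remain
            }
        }
        List<Double> result = new ArrayList<>(pq);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Double> amounts = Arrays.asList(644.08, 4121.85, 2678.40, 4409.74,
                                             837.42, 3229.27, 4732.35, 4381.21);
        System.out.println(largest(amounts, 3));
    }
}
```

Memory use is proportional to M rather than N, which is the point of the constraint above.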
| operation | return value | size | contents (unordered) | contents (ordered) | contents (reverse ordered) |
|---|---|---|---|---|---|
| insert(P) | | 1 | P | P | P |
| insert(Q) | | 2 | P Q | P Q | Q P |
| insert(E) | | 3 | P Q E | E P Q | Q P E |
| delMin() | E | 2 | P Q | P Q | Q P |
| insert(X) | | 3 | P Q X | P Q X | X Q P |
| insert(A) | | 4 | P Q X A | A P Q X | X Q P A |
| insert(M) | | 5 | P Q X A M | A M P Q X | X Q P M A |
| delMin() | A | 4 | P Q X M | M P Q X | X Q P M |
| insert(P) | | 5 | P Q X M P | M P P Q X | X Q P P M |
| insert(L) | | 6 | P Q X M P L | L M P P Q X | X Q P P M L |
| insert(E) | | 7 | P Q X M P L E | E L M P P Q X | X Q P P M L E |
| delMin() | E | 6 | P Q X M P L | L M P P Q X | X Q P P M L |
A sequence of operations on a priority queue implemented using an unordered array (left), an ordered array (middle), and a reverse-ordered array (right)
What are the running times for inserting an item, finding the max, and deleting the max?
| implementation | insert | delMax | max |
|---|---|---|---|
| unordered array | \(1\) | \(N\) | \(N\) |
| ordered array | \(N\) | \(1\) | \(1\) |
| goal for today | \(\log N\) | \(\log N\) | \(\log N\) |
Order of growth of running time for priority queue with \(N\) items
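To make the "unordered array" row concrete, here is a minimal sketch of such a priority queue. The class name and the fixed-capacity constructor are assumptions for illustration: `insert` is a single array write, while `delMax` scans all N items.

```java
import java.util.NoSuchElementException;

// Hypothetical unordered-array priority queue: insert is constant time,
// delMax is linear time because it scans every item.
final class UnorderedArrayMaxPQ<Key extends Comparable<Key>> {
    private final Key[] pq;  // items in pq[0..n-1], in no particular order
    private int n;

    @SuppressWarnings("unchecked")
    UnorderedArrayMaxPQ(int capacity) {
        pq = (Key[]) new Comparable[capacity];
    }

    boolean isEmpty() { return n == 0; }

    // Append at the end: no compares at all.
    void insert(Key key) { pq[n++] = key; }

    // Scan for the largest key, swap it with the last item, and remove it.
    Key delMax() {
        if (isEmpty()) throw new NoSuchElementException("priority queue underflow");
        int max = 0;
        for (int i = 1; i < n; i++)
            if (pq[max].compareTo(pq[i]) < 0) max = i;
        Key result = pq[max];
        pq[max] = pq[n - 1];
        pq[--n] = null;  // avoid loitering
        return result;
    }

    public static void main(String[] args) {
        UnorderedArrayMaxPQ<String> pq = new UnorderedArrayMaxPQ<>(8);
        for (String s : "P Q E X A M".split(" ")) pq.insert(s);
        while (!pq.isEmpty()) System.out.print(pq.delMax() + " ");
        System.out.println();
    }
}
```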

Complete Binary Tree with \(N=16 \text{ nodes}\) (\(\text{height} = 4\))
Property: Height of complete binary tree with \(N\) nodes is \(\lfloor \lg N \rfloor\).
Pf: Height increases only when \(N\) is a power of \(2\).

Array representation
0 1 2 3 4 5 6 7 8 9 10 11
a[i] = [ . T S R P N O A E I H G]
T
S R
P N O A
E I H G

What is the index of the parent of the item at index \(k\) in a binary heap?

Array representation
Max-Heap ordering
"Just enough" ordering to support efficient priority queue operations.
The largest key is a[1], which is the root of the binary tree.
The children of the node at k are at locations 2*k and 2*k+1.
The parent of the node at k is at k/2.
insert() and delMax() violate heap order, but it is easy to fix up.
Insert: Add node at end, then swim it up
Remove the maximum: Exchange root with node at end, then sink it down
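The index arithmetic for parents and children can be checked directly on the example array. This is a small sketch with hypothetical helper names; index 0 is left unused, as in the array shown earlier.

```java
// Hypothetical helpers for the 1-based array representation of a heap.
final class HeapIndexSketch {
    static int parent(int k)     { return k / 2; }      // integer division
    static int leftChild(int k)  { return 2 * k; }
    static int rightChild(int k) { return 2 * k + 1; }

    // True if no key in a[2..n] is larger than its parent's key.
    static boolean isMaxHeap(char[] a, int n) {
        for (int k = 2; k <= n; k++)
            if (a[parent(k)] < a[k]) return false;
        return true;
    }

    public static void main(String[] args) {
        char[] a = " TSRPNOAEIHG".toCharArray();  // a[0] is unused
        System.out.println(isMaxHeap(a, 11));     // heap order holds
        System.out.println(a[parent(9)]);         // parent of I (index 9) is P (index 4)
    }
}
```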
Scenario: A key becomes larger than its parent's key
To eliminate the violation:
private void swim(int k) {
    while(k > 1 && less(k/2, k)) {  // parent at k/2 is smaller than child at k
        exch(k, k/2);
        k = k/2;
    }
}
Insert: Add node at end, then swim it up
Cost: At most \(1+\lg N\) compares
public void insert(Key key) {
    pq[++N] = key;
    swim(N);
}
Scenario: A key becomes smaller than one (or both) of its children's
To eliminate the violation:
private void sink(int k) {
    while(2*k <= N) {
        int j = 2*k;                    // first child
        if(j < N && less(j, j+1)) j++;  // second is larger
        if(!less(k, j)) break;          // parent >= larger child?
        exch(k, j);
        k = j;
    }
}
Delete max: Exchange root with node at end, then sink it down
Cost: At most \(2 \lg N\) compares
public Key delMax() {
    Key max = pq[1];
    exch(1, N);
    pq[N--] = null; // prevent loitering!
    sink(1);
    return max;
}
public class MaxPQ<Key extends Comparable<Key>> {
    private Key[] pq;
    private int N;
    public MaxPQ(int capacity) {
        pq = (Key[]) new Comparable[capacity+1];
    }
    public boolean isEmpty() { return N == 0; }
    public void insert(Key key) { /* see prev code */ }
    public Key delMax() { /* see prev code */ }
    private void swim(int k) { /* see prev code */ }
    private void sink(int k) { /* see prev code */ }
    private boolean less(int i, int j) {
        return pq[i].compareTo(pq[j]) < 0;
    }
    private void exch(int i, int j) {
        Key t = pq[i];
        pq[i] = pq[j];
        pq[j] = t;
    }
}
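For experimenting, the pieces above can be assembled into one runnable class. This is a sketch under a few assumptions: the class is renamed `MaxPQSketch`, the field is `n` instead of `N`, an underflow check is added to `delMax()`, and the capacity stays fixed (inserting beyond it would throw an `ArrayIndexOutOfBoundsException`).

```java
import java.util.NoSuchElementException;

// Assembled from the swim/sink/insert/delMax snippets; names are illustrative.
final class MaxPQSketch<Key extends Comparable<Key>> {
    private final Key[] pq;  // heap-ordered complete binary tree in pq[1..n]
    private int n;

    @SuppressWarnings("unchecked")
    MaxPQSketch(int capacity) { pq = (Key[]) new Comparable[capacity + 1]; }

    boolean isEmpty() { return n == 0; }

    void insert(Key key) {
        pq[++n] = key;  // add at the end
        swim(n);        // restore heap order upward
    }

    Key delMax() {
        if (isEmpty()) throw new NoSuchElementException("priority queue underflow");
        Key max = pq[1];
        exch(1, n);
        pq[n--] = null;  // prevent loitering
        sink(1);         // restore heap order downward
        return max;
    }

    private void swim(int k) {
        while (k > 1 && less(k / 2, k)) { exch(k, k / 2); k = k / 2; }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int j = 2 * k;                      // first child
            if (j < n && less(j, j + 1)) j++;   // pick the larger child
            if (!less(k, j)) break;             // heap order already holds
            exch(k, j);
            k = j;
        }
    }

    private boolean less(int i, int j) { return pq[i].compareTo(pq[j]) < 0; }

    private void exch(int i, int j) { Key t = pq[i]; pq[i] = pq[j]; pq[j] = t; }

    public static void main(String[] args) {
        MaxPQSketch<String> pq = new MaxPQSketch<>(16);
        for (String s : "P Q E X A M".split(" ")) pq.insert(s);
        while (!pq.isEmpty()) System.out.print(pq.delMax() + " ");  // keys come out in descending order
        System.out.println();
    }
}
```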
| implementation | insert | delMax | max |
|---|---|---|---|
| unordered array | \(1\) | \(N\) | \(N\) |
| ordered array | \(N\) | \(1\) | \(1\) |
| binary heap | \(\log N\) | \(\log N\) | \(1\) |
order-of-growth of running time for priority queue with \(N\) items
Challenge: Delete a random key from a binary heap in logarithmic time

Do "half-exchanges" in sink or swim
Multiway heaps
Fact: Height of complete \(d\)-way tree on \(N\) nodes is \(\sim \log_d N\)

How many compares (in the worst case) to insert in a \(d\)-way heap?
How many compares (in the worst case) to delete-max in a \(d\)-way heap?
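One way to explore these two questions is to write the heap for general \(d\). The sketch below is an illustration, not code from the lecture: it assumes `int` keys, 1-based indexing, and a fixed capacity. Swimming does one compare per level, while sinking does up to \(d\) compares per level to find the largest child and decide whether to exchange.

```java
// Hypothetical d-way max-heap on int keys, stored in pq[1..n].
final class DaryMaxHeapSketch {
    private final int d;
    private final int[] pq;
    private int n;

    DaryMaxHeapSketch(int d, int capacity) {
        this.d = d;
        this.pq = new int[capacity + 1];
    }

    boolean isEmpty() { return n == 0; }

    // Children of node k occupy indices d*(k-1)+2 through d*k+1.
    private int firstChild(int k) { return d * (k - 1) + 2; }
    private int parent(int k)     { return (k - 2) / d + 1; }

    void insert(int key) {
        pq[++n] = key;
        int k = n;
        // swim: one compare per level
        while (k > 1 && pq[parent(k)] < pq[k]) {
            exch(k, parent(k));
            k = parent(k);
        }
    }

    int delMax() {
        if (isEmpty()) throw new IllegalStateException("priority queue underflow");
        int max = pq[1];
        exch(1, n--);
        int k = 1;
        // sink: scan up to d children per level to find the largest
        while (firstChild(k) <= n) {
            int first = firstChild(k);
            int last = Math.min(first + d - 1, n);
            int largest = first;
            for (int j = first + 1; j <= last; j++)
                if (pq[largest] < pq[j]) largest = j;
            if (pq[k] >= pq[largest]) break;
            exch(k, largest);
            k = largest;
        }
        return max;
    }

    private void exch(int i, int j) { int t = pq[i]; pq[i] = pq[j]; pq[j] = t; }

    public static void main(String[] args) {
        DaryMaxHeapSketch heap = new DaryMaxHeapSketch(4, 16);
        for (int x : new int[] {5, 1, 9, 3, 7, 2, 8}) heap.insert(x);
        while (!heap.isEmpty()) System.out.print(heap.delMax() + " ");
        System.out.println();
    }
}
```

With \(d = 2\) the index formulas reduce to the binary-heap rules: children at \(2k\) and \(2k+1\), parent at \(k/2\).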
| implementation | insert | delMax | max |
|---|---|---|---|
| unordered array | \(1\) | \(N\) | \(N\) |
| ordered array | \(N\) | \(1\) | \(1\) |
| binary heap | \(\lg N\) | \(\lg N\) | \(1\) |
| \(d\)-ary heap | \(\log_d N\) | \(d \log_d N\) | \(1\) |
| Fibonacci | \(1\) | \(\log N\) \(^*\) | \(1\) |
| Brodal queue | \(1\) | \(\log N\) | \(1\) |
| impossible | \(1\) | \(1\) | \(1\) |
\(^*\) amortized
sweet spot for \(d\) is \(d=4\)
why is last line impossible?
order-of-growth of running time for priority queue with \(N\) items
Underflow and overflow
Minimum-oriented priority queue
Replace less() with greater()
Implement greater()
Binary heap is not cache friendly (ex: page size = 8 nodes)
Other operations
Can be implemented with sink() and swim() (stay tuned for Prim/Dijkstra)
Immutability of keys
Data type: set of values and operations on those values
Immutable data type: cannot change the data type value once created
public final class Vector {               // final class: instance methods can't be overridden
    private final int N;                  // instance variables are private and final
    private final double[] data;
    public Vector(double[] data) {
        this.N = data.length;
        this.data = new double[N];
        for(int i = 0; i < N; i++)        // defensive copy of
            this.data[i] = data[i];       // mutable instance variables
    }
    /* ... */                             // instance methods don't
                                          // change instance variables
}
Immutable: String, Integer, Double, Color, Vector, Transaction, Point2D
Mutable: StringBuilder, Stack, Counter, Java array
Advantages of immutability:
Disadvantage: Must create new object for each data type value
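The effect of the defensive copy can be seen in a short experiment. This is a sketch with a hypothetical class name (`ImmutableVectorSketch`, to avoid confusion with `java.util.Vector`); the `get` and `plus` methods are illustrative additions, and `plus` assumes both vectors have the same length.

```java
// Hypothetical immutable vector: state is fixed at construction time.
final class ImmutableVectorSketch {
    private final double[] data;

    ImmutableVectorSketch(double[] data) {
        this.data = data.clone();  // defensive copy of the caller's mutable array
    }

    int length() { return data.length; }

    double get(int i) { return data[i]; }

    // Returns a new vector rather than modifying this one
    // (this is the cost named above: one new object per value).
    ImmutableVectorSketch plus(ImmutableVectorSketch that) {
        double[] sum = new double[data.length];
        for (int i = 0; i < data.length; i++)
            sum[i] = data[i] + that.data[i];
        return new ImmutableVectorSketch(sum);
    }

    public static void main(String[] args) {
        double[] raw = {1.0, 2.0, 3.0};
        ImmutableVectorSketch v = new ImmutableVectorSketch(raw);
        raw[0] = 99.0;                    // the caller mutates its own array
        System.out.println(v.get(0));     // the vector still holds the original value
        System.out.println(v.plus(v).get(2));
    }
}
```

Without the copy, a client could change a key after inserting it into a priority queue and silently break heap order.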
“Classes should be immutable unless there's a very good reason to make them mutable. [...] If a class cannot be made immutable, you should still limit its mutability as much as possible.”
—Joshua Bloch (Java architect)

What are the properties of this sorting algorithm?
public void sort(String[] a) {
    int N = a.length;
    MaxPQ<String> pq = new MaxPQ<String>(N);  // MaxPQ constructor takes a capacity
    for(int i = 0; i < N; i++) pq.insert(a[i]);
    for(int i = N-1; i >= 0; i--) a[i] = pq.delMax();
}
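To try this sort without the `MaxPQ` class, a max-oriented `java.util.PriorityQueue` can stand in for it. That substitution, and the class name, are assumptions of this sketch. It sorts in \(N \log N\) time but needs extra space proportional to \(N\) for the queue, and nothing in it preserves the relative order of equal keys.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in: java.util.PriorityQueue with reversed order plays the role of MaxPQ.
final class PQSortSketch {

    static void sort(String[] a) {
        PriorityQueue<String> pq = new PriorityQueue<>(Comparator.reverseOrder());
        for (String s : a) pq.add(s);              // N inserts
        for (int i = a.length - 1; i >= 0; i--)    // N remove-the-maximum operations,
            a[i] = pq.poll();                      // filling the array from the right
    }

    public static void main(String[] args) {
        String[] a = "S O R T E X A M P L E".split(" ");
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```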
Basic plan for in-place sort



Heap construction: build max heap using bottom-up method (we assume array entries are indexed 1 to N)
Sortdown: Repeatedly delete the largest remaining item
Heap construction (first pass):
for(int k = N/2; k >= 1; k--)
    sink(a, k, N);
Sortdown (second pass):
while(N > 1) {
    exch(a, 1, N--);
    sink(a, 1, N);
}
public class Heap {
    public static void sort(Comparable[] a) {
        int N = a.length;
        // heap construction: sink every node that has at least one child
        for(int k = N/2; k >= 1; k--) sink(a, k, N);
        // sortdown: move the current maximum to the end, then restore heap order
        while(N > 1) {
            exch(a, 1, N);
            sink(a, 1, --N);
        }
    }
    private static void sink(Comparable[] a, int k, int N) {
        /* as before, but make static and pass arguments */
    }
    private static boolean less(Comparable[] a, int i, int j) {
        /* as before, but convert from 1-based indexing to 0-based */
    }
    private static void exch(Object[] a, int i, int j) {
        /* as before, but convert from 1-based indexing to 0-based */
    }
}
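Filling in the elided bodies gives a runnable version. This is one possible completion, not necessarily the original: the class is renamed `HeapSortSketch`, a generic signature replaces the raw `Comparable[]`, and the conversion to 0-based indexing is done by subtracting 1 inside `less` and `exch` only, so `sort` and `sink` keep the 1-based arithmetic.

```java
import java.util.Arrays;

// Hypothetical completion of the Heap class; heap node k lives at a[k-1].
final class HeapSortSketch {

    static <T extends Comparable<T>> void sort(T[] a) {
        int n = a.length;
        for (int k = n / 2; k >= 1; k--)  // heap construction, bottom-up
            sink(a, k, n);
        while (n > 1) {                   // sortdown
            exch(a, 1, n);                // put the current maximum in its final position
            sink(a, 1, --n);              // restore heap order on the remaining items
        }
    }

    private static <T extends Comparable<T>> void sink(T[] a, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && less(a, j, j + 1)) j++;  // pick the larger child
            if (!less(a, k, j)) break;
            exch(a, k, j);
            k = j;
        }
    }

    // 1-based heap indices are shifted to 0-based array indices here.
    private static <T extends Comparable<T>> boolean less(T[] a, int i, int j) {
        return a[i - 1].compareTo(a[j - 1]) < 0;
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i - 1];
        a[i - 1] = a[j - 1];
        a[j - 1] = t;
    }

    public static void main(String[] args) {
        String[] a = "S O R T E X A M P L E".split(" ");
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```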
N k 0 1 2 3 4 5 6 7 8 9 10 11
S O R T E X A M P L E initial values
11 5 . . . . L . . . . E E
11 4 . . . T . . . M P . .
11 3 . . X . . R A . . . .
11 2 . T . P L . . M O . .
11 1 X T S . . R A . . . .
X T S P L R A M O E E heap-ordered
10 1 T P S O L . . M E . X
9 1 S P R . . E A . . T .
8 1 R P E . . E A . S . .
7 1 P O E M L . . R . . .
6 1 O M E A L . P . . . .
5 1 M L . A . O . . . . .
4 1 L E E A M . . . . . .
3 1 E A E L . . . . . . .
2 1 E A E . . . . . . . .
1 1 A E . . . . . . . . .
A E E L M O P R S T X sorted result
Proposition: Heap construction uses \(\leq 2N\) compares and \(\leq N\) exchanges
Pf sketch (assume \(N = 2^{h+1}-1\)):

\[\begin{array}{rcl} h + 2(h-1) + 4(h-2) + 8(h-3) + \ldots + 2^h(0) & \leq & 2^{h+1}-1 \\ & = & N \end{array}\]
note: left side of \(\leq\) is a tricky sum (see Discrete Math)
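As a worked step (added here, not part of the original proof sketch), the left side has a closed form. Writing the sum with index \(i\) for the depth of a node, so that there are \(2^i\) nodes at depth \(i\) and each can sink at most \(h-i\) levels:

\[\sum_{i=0}^{h} 2^i (h-i) \;=\; 2^{h+1} - h - 2 \;\leq\; 2^{h+1} - 1 \;=\; N\]

For example, with \(h = 3\) the sum is \(3 + 2(2) + 4(1) + 8(0) = 11 = 2^4 - 3 - 2\), which is below \(N = 15\).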
Proposition: Heapsort uses \(\leq 2N \lg N\) compares and exchanges. The algorithm can be improved to \(\sim N \lg N\), but no such variant is known to be practical (see Adaptive heap sort)
Significance: In-place sorting algorithm with \(N \log N\) worst-case
Bottom line: Heapsort is optimal for both time and space, but...
Goal: as fast as quicksort in practice; \(N \log N\) worst case, in place
Introsort
In the wild: C++ STL, Microsoft .NET Framework, Go
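A rough sketch of the introsort idea follows. The specifics are assumptions for illustration rather than details from this section: quicksort with a recursion depth limit of \(2 \lfloor \lg N \rfloor\), a fallback to heapsort when the limit is reached, a cutoff to insertion sort for subarrays of at most 16 items, and a simple middle-element pivot. Production implementations differ in these choices.

```java
// Hypothetical introsort on int[]: quicksort + heapsort fallback + insertion sort cutoff.
final class IntrosortSketch {
    private static final int CUTOFF = 16;  // assumed small-subarray threshold

    static void sort(int[] a) {
        if (a.length < 2) return;
        int depthLimit = 2 * (31 - Integer.numberOfLeadingZeros(a.length));  // 2 * floor(lg N)
        sort(a, 0, a.length - 1, depthLimit);
    }

    private static void sort(int[] a, int lo, int hi, int depth) {
        if (hi - lo + 1 <= CUTOFF) { insertionSort(a, lo, hi); return; }
        if (depth == 0) { heapSort(a, lo, hi); return; }  // quicksort is degenerating
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1, depth - 1);
        sort(a, p + 1, hi, depth - 1);
    }

    // Partition around the middle element; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        return i;
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            for (int j = i; j > lo && a[j] < a[j - 1]; j--)
                swap(a, j, j - 1);
    }

    // Heapsort on a[lo..hi]; heap node k (1-based) lives at a[lo + k - 1].
    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int k = n / 2; k >= 1; k--) sink(a, lo, k, n);
        while (n > 1) {
            swap(a, lo, lo + n - 1);
            sink(a, lo, 1, --n);
        }
    }

    private static void sink(int[] a, int lo, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && a[lo + j - 1] < a[lo + j]) j++;  // larger of the two children
            if (a[lo + k - 1] >= a[lo + j - 1]) break;
            swap(a, lo + k - 1, lo + j - 1);
            k = j;
        }
    }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int[] a = {9, 3, 7, 1, 8, 2, 5, 4, 6, 0, 11, 15, 13, 12, 14, 10, 17, 16};
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

An input of all-equal keys makes this particular partitioning scheme maximally unbalanced, so it is a convenient way to exercise the heapsort fallback.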
| algorithm | in place? | stable? | best | avg | worst | remarks |
|---|---|---|---|---|---|---|
| selection | ✔ | | \(\tfrac{1}{2} N^2\) | \(\tfrac{1}{2} N^2\) | \(\tfrac{1}{2} N^2\) | \(N\) exchanges |
| insertion | ✔ | ✔ | \(N\) | \(\tfrac{1}{4} N^2\) | \(\tfrac{1}{2} N^2\) | use for small \(N\) or partially ordered |
| shell | ✔ | | \(N \log_3 N\) | ? | \(c N^a\) | tight code; subquadratic |
| merge | | ✔ | \(\tfrac{1}{2} N \lg N\) | \(N \lg N\) | \(N \lg N\) | \(N \log N\) guarantee; stable |
| timsort | | ✔ | \(N\) | \(N \lg N\) | \(N \lg N\) | improves mergesort when preexisting order |
| quick | ✔ | | \(N \lg N\) | \(2 N \ln N\) | \(\tfrac{1}{2} N^2\) | \(N \log N\) probabilistic guarantee; fastest in practice |
| 3-way qs | ✔ | | \(N\) | \(2 N \ln N\) | \(\tfrac{1}{2} N^2\) | improves quicksort when duplicate keys |
| heap | ✔ | | \(3N\) | \(2 N \lg N\) | \(2 N \lg N\) | \(N \log N\) guarantee; in-place |
| ? | ✔ | ✔ | \(N\) | \(N \lg N\) | \(N \lg N\) | holy grail of sorting |