“Smart data structures and dumb code works a lot better than the other way around.”
—Eric S. Raymond
Key-value pair abstraction
Ex: DNS Lookup
| domain name | IP address |
|---|---|
| cse.taylor.edu | 192.195.249.26 |
| gfx.cse.taylor.edu | 192.195.249.31 |
| taylor.edu | 192.195.250.21 |
| application | purpose of search | key | value |
|---|---|---|---|
| dictionary | find definition | word | definition |
| book index | find relevant pages | term | list of page numbers |
| file share | find song to download | name of song | computer ID |
| financial account | process transactions | account number | transaction details |
| web search | find relevant web pages | keyword | list of page names |
| compiler | find properties of variables | variable name | type and value |
| routing table | route Internet packets | destination | best route |
| DNS | find IP address | domain name | IP address |
| reverse DNS | find domain name | IP address | domain name |
| genomics | find markers | DNA string | known positions |
| file system | find file on disk | filename | location on disk |
Also known as: maps, dictionaries, associative arrays
Generalizes arrays: Keys need not be between \(0\) and \(N-1\)
Language support
PHP: every array is an associative array
JavaScript: every object is an associative array
Lua: table is the only primitive data structure
hasNiceSyntaxForAssociativeArrays['Python'] = True
hasNiceSyntaxForAssociativeArrays['Java'] = False  # legal Python code
Associative array abstraction: associate one value with each key

Conventions:
Values are not null (although Java allows null values)
get() returns null if key not present
put() overwrites old value with new value
Intended consequences:
Easy to implement contains():
public boolean contains(Key key) {
    return get(key) != null;
}
Can implement a lazy version of delete():
public void delete(Key key) {
    put(key, null);
}
Value type: any generic type
Key type: several natural assumptions
Keys are Comparable: use compareTo() (specify Comparable in API)
Keys are any generic type: use equals() to test equality
Keys are any generic type: use hashCode() to scramble key
Best practices: Use immutable types for symbol table keys
Immutable in Java: Integer, Double, String, ...
Mutable in Java: StringBuilder, arrays, ...
All Java classes inherit a method equals()
Java requirements: for any non-null references x, y, and z:
Reflexive: x.equals(x) is true
Symmetric: x.equals(y) iff y.equals(x)
Transitive: if x.equals(y) and y.equals(z), then x.equals(z)
Non-null: x.equals(null) is false
Equivalence relation: reflexive, symmetric, transitive
Default implementation: (x==y: do x and y refer to same object?)
Customized implementations: Integer, Double, String, ...
User-defined implementations: some care needed
Seems easy, but...
public class Date implements Comparable<Date> {
    private final int month;
    private final int day;
    private final int year;
    /* ... */
    // pitfall: the parameter type is Date, so this overloads equals()
    // rather than overriding Object.equals(Object)
    public boolean equals(Date that) {
        // check that all significant fields are the same
        if (this.day   != that.day  ) return false;
        if (this.month != that.month) return false;
        if (this.year  != that.year ) return false;
        return true;
    }
}
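The trouble with the `equals(Date that)` signature above can be seen in a small sketch. `NaiveDate` and `EqualsPitfallDemo` are hypothetical names used only for illustration. Because the method overloads `equals()` instead of overriding `equals(Object)`, any caller that holds the argument as an `Object` (as the Java collections do) falls back to the inherited reference comparison.

```java
import java.util.ArrayList;
import java.util.List;

class EqualsPitfallDemo {
    // Compares through an Object reference, as library code would.
    static boolean equalAsObjects(NaiveDate a, Object b) {
        return a.equals(b); // resolves to Object.equals(Object): reference comparison
    }

    public static void main(String[] args) {
        NaiveDate a = new NaiveDate(3, 14, 2015);
        NaiveDate b = new NaiveDate(3, 14, 2015);
        System.out.println(a.equals(b));          // true: the overload is chosen
        System.out.println(equalAsObjects(a, b)); // false: inherited Object.equals is chosen
        List<NaiveDate> dates = new ArrayList<>();
        dates.add(a);
        System.out.println(dates.contains(b));    // false: contains() calls equals(Object)
    }
}

// Hypothetical stand-in for the Date class, for illustration only.
final class NaiveDate {
    private final int month, day, year;

    NaiveDate(int month, int day, int year) {
        this.month = month;
        this.day = day;
        this.year = year;
    }

    // Overloads equals(); does NOT override Object.equals(Object).
    public boolean equals(NaiveDate that) {
        return this.day == that.day && this.month == that.month && this.year == that.year;
    }
}
```

Annotating the method with `@Override` would turn this mistake into a compile-time error.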
Seems easy, but requires some care
// typically unsafe to use equals() with inheritance
// (would violate symmetry)
public final class Date implements Comparable<Date> {
    private final int month, day, year;
    // ...
    // must be Object (why? experts still debate)
    public boolean equals(Object y) {
        if (y == this) return true;  // optimize for true object equality
        if (y == null) return false; // check for null
        // objects must be in the same class (religion: getClass() vs instanceof)
        if (y.getClass() != this.getClass()) return false;
        Date that = (Date) y;        // cast is guaranteed to succeed
        // check that all significant fields are the same
        if (this.day   != that.day  ) return false;
        if (this.month != that.month) return false;
        if (this.year  != that.year ) return false;
        return true;
    }
}
"Standard" recipe for user-defined types
Optimization for reference equality (i.e., this == that)
Check against null
Check that two objects are of the same type; cast
Compare each significant field:
if field is a primitive type, use == (but use Double.compare() with double to deal with -0.0 and NaN)
if field is an object, use equals() (apply rule recursively)
if field is an array, apply to each entry (can use Arrays.deepEquals(a,b) but not a.equals(b))
Best practices
No need to use calculated fields that depend on other fields
Compare fields most likely to differ first
Make compareTo() consistent with equals() (x.equals(y) iff x.compareTo(y) == 0)
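The recipe above can be sketched on a type that mixes the field kinds it mentions. `Reading` and its fields are hypothetical, chosen only to exercise each rule. The array field is an `int[]`, so the sketch uses `Arrays.equals()`; `Arrays.deepEquals()` applies to arrays of objects. A matching `hashCode()` is included on the assumption that the type may also be used as a hashed key.

```java
import java.util.Arrays;
import java.util.Objects;

class EqualsRecipeDemo {
    public static void main(String[] args) {
        Reading a = new Reading(1, Double.NaN, "C", new int[] {1, 2});
        Reading b = new Reading(1, Double.NaN, "C", new int[] {1, 2});
        Reading c = new Reading(1, -0.0, "C", new int[] {1, 2});
        Reading d = new Reading(1, 0.0, "C", new int[] {1, 2});
        System.out.println(a.equals(b)); // true: Double.compare treats NaN as equal to NaN
        System.out.println(c.equals(d)); // false: Double.compare distinguishes -0.0 from 0.0
    }
}

// Hypothetical value type, used only to illustrate the recipe.
final class Reading {
    private final int sensorId;  // primitive: compare with ==
    private final double value;  // double: compare with Double.compare()
    private final String unit;   // object (assumed non-null): compare with equals()
    private final int[] samples; // array: compare entry by entry

    Reading(int sensorId, double value, String unit, int[] samples) {
        this.sensorId = sensorId;
        this.value = value;
        this.unit = unit;
        this.samples = samples.clone(); // defensive copy keeps the type immutable
    }

    @Override
    public boolean equals(Object y) {
        if (y == this) return true;                        // reference equality
        if (y == null) return false;                       // null check
        if (y.getClass() != this.getClass()) return false; // same class
        Reading that = (Reading) y;                        // cast cannot fail here
        if (this.sensorId != that.sensorId) return false;
        if (Double.compare(this.value, that.value) != 0) return false;
        if (!this.unit.equals(that.unit)) return false;
        return Arrays.equals(this.samples, that.samples);
    }

    @Override
    public int hashCode() {
        // built from the same fields that equals() compares
        return Objects.hash(sensorId, value, unit, Arrays.hashCode(samples));
    }
}
```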
Build ST by associating value i with ith string from standard input
public static void main(String[] args) {
    ST<String, Integer> st = new ST<String, Integer>();
    for (int i = 0; !StdIn.isEmpty(); i++) {
        String key = StdIn.readString();
        st.put(key, i);
        StdOut.print(key + " " + i + ", "); // echo each pair as it is inserted
    }
    StdOut.println();
    for (String s : st.keys())
        StdOut.print(s + " " + st.get(s) + ", ");
    StdOut.println();
}
$ java STTestClient < searchexample.txt
S 0, E 1, A 2, R 3, C 4, H 5, E 6, X 7, A 8, M 9, P 10, L 11, E 12,
A 8, C 4, E 12, H 5, L 11, M 9, P 10, R 3, S 0, X 7,
Frequency counter: Read a sequence of strings from standard input and print out one that occurs with highest frequency
$ cat tinyTale.txt
it was the best of times it was the worst of times
it was the age of wisdom it was the age of foolishness
it was the epoch of belief it was the epoch of incredulity
it was the season of light it was the season of darkness
it was the spring of hope it was the winter of despair
$ # tiny example (60 words, 20 distinct)
$ java FrequencyCounter 1 < tinyTale.txt
it 10
$ # real example (135,635 words, 10,769 distinct)
$ java FrequencyCounter 8 < tale.txt
business 122
$ # real example (21,191,455 words, 534,580 distinct)
$ java FrequencyCounter 10 < leipzig1M.txt
government 24763
Argument: minimum length of words to consider
Sources: tale.txt, leipzig1m.txt
public class FrequencyCounter {
    public static void main(String[] args) {
        int minlen = Integer.parseInt(args[0]);
        ST<String, Integer> st = new ST<>(); // create ST
        while (!StdIn.isEmpty()) {
            // read string and update frequency
            String word = StdIn.readString();
            if (word.length() < minlen) continue; // ignore short strings
            if (!st.contains(word)) st.put(word, 1);
            else                    st.put(word, st.get(word) + 1);
        }
        // print a string with max freq
        String max = "";
        st.put(max, 0);
        for (String word : st.keys())
            if (st.get(word) > st.get(max)) max = word;
        StdOut.println(max + " " + st.get(max));
    }
}
Data structure: Maintain an (unordered) linked list of key-value pairs
Search: Scan through all keys until a match is found
Insert: Scan through all keys until a match is found; if there is no match, add to front
k  v   first
-  --  -----
S  0   -> S,0
E  1   -> E,1 > S,0
A  2   -> A,2 > E,1 > S,0
R  3   -> R,3 > A,2 > E,1 > S,0
C  4   -> C,4 > R,3 > A,2 > E,1 > S,0
H  5   -> H,5 > C,4 > R,3 > A,2 > E,1 > S,0
E  6   -> H,5 > C,4 > R,3 > A,2 > E,6 > S,0
X  7   -> X,7 > H,5 > C,4 > R,3 > A,2 > E,6 > S,0
A  8   -> X,7 > H,5 > C,4 > R,3 > A,8 > E,6 > S,0
M  9   -> M,9 > X,7 > H,5 > C,4 > R,3 > A,8 > E,6 > S,0
P 10   -> P,10 > M,9 > X,7 > H,5 > C,4 > R,3 > A,8 > E,6 > S,0
L 11   -> L,11 > P,10 > M,9 > X,7 > H,5 > C,4 > R,3 > A,8 > E,6 > S,0
E 12   -> L,11 > P,10 > M,9 > X,7 > H,5 > C,4 > R,3 > A,8 > E,12 > S,0
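The search and insert rules above can be sketched in Java roughly as follows. The class name `SequentialSearchST` and its inner `Node` class are illustrative choices, and the sketch covers only `get()`, `put()`, and `size()`.

```java
// A minimal sketch of an unordered linked-list symbol table.
class SequentialSearchST<Key, Value> {
    private Node first; // head of the linked list
    private int n;      // number of key-value pairs

    private class Node {
        final Key key;
        Value val;
        final Node next;

        Node(Key key, Value val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    public int size() { return n; }

    // Search: scan all nodes until a key matches.
    public Value get(Key key) {
        for (Node x = first; x != null; x = x.next)
            if (key.equals(x.key)) return x.val;
        return null; // convention: null means "not present"
    }

    // Insert: overwrite on a match, otherwise add a new node at the front.
    public void put(Key key, Value val) {
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) {
                x.val = val;
                return;
            }
        }
        first = new Node(key, val, first);
        n++;
    }

    public static void main(String[] args) {
        SequentialSearchST<String, Integer> st = new SequentialSearchST<>();
        String[] keys = "S E A R C H E X A M P L E".split(" ");
        for (int i = 0; i < keys.length; i++) st.put(keys[i], i);
        System.out.println(st.get("E") + " " + st.get("A") + " " + st.size()); // 12 8 10
    }
}
```

Both operations walk the list from the front, which is where the linear cost per operation comes from.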
| implementation | search\(^*\) | insert\(^*\) | search\(^\dagger\) | insert\(^\dagger\) | ops on keys |
|---|---|---|---|---|---|
| seq search (unordered list) | \(N\) | \(N\) | \(N\) | \(N\) | equals() |
\(^*\)guarantee, \(^\dagger\)average
Challenge: Efficient implementations of both search and insert
Data structure: Maintain an ordered array of key-value pairs
Helper function rank: How many keys < key?
            0 1 2 3 4 5 6 7 8 9
keys[] =    A C E H L M P R S X

successful search for P
lo hi  m    0 1 2 3 4 5 6 7 8 9
 0  9  4    A C E H L>M P R S X
 5  9  7    . . . . . M P<R S X
 5  6  5    . . . . . M>P . . .
 6  6  6    . . . . . . P . . .
loop exits with keys[m] = P: return 6

unsuccessful search for Q
lo hi  m    0 1 2 3 4 5 6 7 8 9
 0  9  4    A C E H L>M P R S X
 5  9  7    . . . . . M P<R S X
 5  6  5    . . . . . M>P . . .
 6  6  6    . . . . . . P>. . .
 7  6  -    . . . . . . .*. . .
loop exits with lo > hi: return 7
What is rank(H) if the array has the following keys?
            0 1 2 3 4 5 6 7 8 9
keys[] =    B C E H L M P R S X
rank(H) = 0
rank(H) = 3
rank(H) = 9
rank(H) = 10
What is rank(A) if the array has the following keys?
            0 1 2 3 4 5 6 7 8 9
keys[] =    B C E H L M P R S X
rank(A) = 0
rank(A) = 3
rank(A) = 9
rank(A) = 10
What is rank(Y) if the array has the following keys?
            0 1 2 3 4 5 6 7 8 9
keys[] =    B C E H L M P R S X
rank(Y) = 0
rank(Y) = 3
rank(Y) = 9
rank(Y) = 10
public Value get(Key key) {
    if (isEmpty()) return null;
    int i = rank(key);
    if (i < N && keys[i].compareTo(key) == 0) return vals[i];
    else return null;
}
// find number of keys < key
private int rank(Key key) {
    int lo = 0; int hi = N-1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = key.compareTo(keys[mid]);
        if      (cmp < 0) hi = mid - 1;
        else if (cmp > 0) lo = mid + 1;
        else return mid;
    }
    return lo;
}
Repeat the ST client example from before.
k  v   keys[]                 N   vals[]
-  --  -------------------    --- -----------------------------
S  0   S                      1   0
E  1   E S                    2   1 0
A  2   A E S                  3   2 1 0
R  3   . . R S                4   . . 3 0
C  4   . C E R S              5   . 4 1 3 0
H  5   . . . H R S            6   . . . 5 3 0
E  6   . . . . . .            6   . . 6 . . .
X  7   . . . . . . X          7   . . . . . . 7
A  8   . . . . . . .          7   8 . . . . . .
M  9   . . . . M R S X        8   . . . . 9 3 0 7
P 10   . . . . . P R S X      9   . . . . . 10 3 0 7
L 11   . . . . L M P R S X    10  . . . . 11 9 10 3 0 7
E 12   . . . . . . . . . .    10  . . 12 . . . . . . .
       A C E H L M P R S X        8 4 12 5 11 9 10 3 0 7
Problem: To insert, need to shift all greater keys over
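The shifting step can be sketched as a `put()` built on the same `rank()` helper. This is a minimal illustration, not a complete implementation: the class name `BinarySearchST`, the array doubling, and the `toString()` helper are assumptions added to keep the example self-contained.

```java
import java.util.Arrays;

// A minimal sketch of an ordered-array symbol table.
class BinarySearchST<Key extends Comparable<Key>, Value> {
    private Key[] keys;
    private Value[] vals;
    private int N; // number of key-value pairs

    @SuppressWarnings("unchecked")
    BinarySearchST() {
        keys = (Key[]) new Comparable[2];
        vals = (Value[]) new Object[2];
    }

    public int size() { return N; }

    public Value get(Key key) {
        int i = rank(key);
        if (i < N && keys[i].compareTo(key) == 0) return vals[i];
        return null;
    }

    // Number of keys strictly less than key (binary search).
    private int rank(Key key) {
        int lo = 0, hi = N - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = key.compareTo(keys[mid]);
            if      (cmp < 0) hi = mid - 1;
            else if (cmp > 0) lo = mid + 1;
            else return mid;
        }
        return lo;
    }

    public void put(Key key, Value val) {
        int i = rank(key);
        if (i < N && keys[i].compareTo(key) == 0) { // key present: overwrite
            vals[i] = val;
            return;
        }
        if (N == keys.length) {                     // array full: double it
            keys = Arrays.copyOf(keys, 2 * N);
            vals = Arrays.copyOf(vals, 2 * N);
        }
        for (int j = N; j > i; j--) {               // shift greater keys one slot right
            keys[j] = keys[j - 1];
            vals[j] = vals[j - 1];
        }
        keys[i] = key;
        vals[i] = val;
        N++;
    }

    @Override
    public String toString() { // keys in sorted order, space-separated
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < N; i++) sb.append(keys[i]).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        BinarySearchST<String, Integer> st = new BinarySearchST<>();
        String[] input = "S E A R C H E X A M P L E".split(" ");
        for (int i = 0; i < input.length; i++) st.put(input[i], i);
        System.out.println(st);          // A C E H L M P R S X
        System.out.println(st.get("E")); // 12
    }
}
```

Finding the insertion point takes logarithmic time, but the shift loop can move every entry, so insertion stays linear in the worst case.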
| implementation | search\(^*\) | insert\(^*\) | search\(^\dagger\) | insert\(^\dagger\) | ops on keys |
|---|---|---|---|---|---|
| seq search (unordered list) | \(N\) | \(N\) | \(N\) | \(N\) | equals() |
| binary search (ordered array) | \(\log N\) | \(N\) | \(\log N\) | \(N\) | compareTo() |
\(^*\)guarantee, \(^\dagger\)average
Challenge: Efficient implementations of both search and insert
Problem: Given an array with all 0s in the beginning and all 1s at the end, find the index in the array where the 1s start.
Input: 000000 ... 0000111111 ... 1111
Variant 1: You are given the length of the array
Variant 2: You are not given the length of the array
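One possible approach to both variants is sketched below. Variant 1 is a plain binary search. For Variant 2 the sketch assumes the array is reached through a hypothetical probe function `isOne(i)` that also answers true for any index past the end; it doubles an upper bound until it lands on a 1, then runs a binary search inside the last doubled interval. The names `FirstOne`, `firstOne`, and `firstOneUnbounded` are illustrative.

```java
import java.util.function.IntPredicate;

class FirstOne {
    // Variant 1: length known. Returns the index of the first 1,
    // or a.length if the array contains no 1s.
    static int firstOne(int[] a) {
        int lo = 0, hi = a.length; // invariant: a[0..lo) are 0s, a[hi..) are 1s
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == 1) hi = mid;
            else             lo = mid + 1;
        }
        return lo;
    }

    // Variant 2: length unknown. Assumes isOne(i) is true for some reachable
    // index (here: every index past the end of the data counts as a 1).
    static int firstOneUnbounded(IntPredicate isOne) {
        if (isOne.test(0)) return 0;
        int bound = 1;
        while (!isOne.test(bound)) bound *= 2; // probe 1, 2, 4, 8, ...
        int lo = bound / 2 + 1, hi = bound;    // index bound/2 is a 0, index bound is a 1
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (isOne.test(mid)) hi = mid;
            else                 lo = mid + 1;
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 0, 1, 1};
        System.out.println(firstOne(a));                                           // 3
        System.out.println(firstOneUnbounded(i -> i >= a.length || a[i] == 1));    // 3
    }
}
```

If the first 1 is at index k, the doubling phase makes about log k probes and the binary search another log k, so the second variant does not depend on the array length at all.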
| | keys | values | keys | values | |
|---|---|---|---|---|---|
| min | 09:00:00 | Chicago | 09:19:32 | Chicago | size, keys |
| | 09:00:03 | Phoenix | 09:19:46 | Chicago | size, keys |
| | 09:00:13 | Houston | 09:21:05 | Chicago | size, keys |
| | 09:00:59 | Chicago | 09:22:43 | Seattle | size, keys |
| | 09:01:10 | Houston | 09:22:54 | Seattle | size, keys |
| floor | 09:03:13 | Chicago | 09:25:52 | Chicago | |
| ceiling | 09:10:11 | Seattle | 09:35:21 | Chicago | |
| select, rank, get | 09:10:25 | Seattle | 09:36:14 | Seattle | |
| | 09:14:25 | Phoenix | 09:37:44 | Phoenix | max |
min() // 09:00:00
max() // 09:37:44
select(7) // 09:10:25
rank(09:10:25) // 7
get(09:10:25) // Seattle
floor(09:05:00) // 09:03:13
ceiling(09:05:00) // 09:10:11
size(09:15:00, 09:25:00) // 5
keys(09:15:00, 09:25:00) // [ 09:19:32, 09:19:46, 09:21:05, 09:22:43,
// 09:22:54 ]
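For comparison, the same queries can be tried against `java.util.TreeMap`, a standard-library ordered map. This is a swapped-in class with its own method names, not the ordered-array symbol table of these notes. The sketch stores times as strings, which sort correctly here because they share the fixed `HH:MM:SS` format; `OrderedOpsDemo` and `schedule()` are illustrative names.

```java
import java.util.TreeMap;

class OrderedOpsDemo {
    // Loads the example schedule into a TreeMap keyed by time.
    static TreeMap<String, String> schedule() {
        String[][] rows = {
            {"09:00:00", "Chicago"}, {"09:00:03", "Phoenix"}, {"09:00:13", "Houston"},
            {"09:00:59", "Chicago"}, {"09:01:10", "Houston"}, {"09:03:13", "Chicago"},
            {"09:10:11", "Seattle"}, {"09:10:25", "Seattle"}, {"09:14:25", "Phoenix"},
            {"09:19:32", "Chicago"}, {"09:19:46", "Chicago"}, {"09:21:05", "Chicago"},
            {"09:22:43", "Seattle"}, {"09:22:54", "Seattle"}, {"09:25:52", "Chicago"},
            {"09:35:21", "Chicago"}, {"09:36:14", "Seattle"}, {"09:37:44", "Phoenix"}
        };
        TreeMap<String, String> st = new TreeMap<>();
        for (String[] row : rows) st.put(row[0], row[1]);
        return st;
    }

    public static void main(String[] args) {
        TreeMap<String, String> st = schedule();
        System.out.println(st.firstKey());                    // min
        System.out.println(st.lastKey());                     // max
        System.out.println(st.get("09:10:25"));               // get
        System.out.println(st.floorKey("09:05:00"));          // floor
        System.out.println(st.ceilingKey("09:05:00"));        // ceiling
        System.out.println(st.headMap("09:10:25").size());    // rank: keys strictly less
        // size and keys over a closed range [lo, hi]
        System.out.println(st.subMap("09:15:00", true, "09:25:00", true).size());
        System.out.println(st.subMap("09:15:00", true, "09:25:00", true).keySet());
    }
}
```

`TreeMap` has no direct counterpart to `select(k)`, and `headMap(key).size()` is a convenient rather than efficient way to express rank, so the mapping to the operations listed above is approximate.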

| operation | sequential search | binary search |
|---|---|---|
| search | \(N\) | \(\log N\) |
| insert | \(N\) | \(N\) |
| min / max | \(N\) | \(1\) |
| floor / ceiling | \(N\) | \(\log N\) |
| rank | \(N\) | \(\log N\) |
| select | \(N\) | \(1\) |
| ordered iteration | \(N \log N\) | \(N\) |