Greedy: Build up a solution incrementally, myopically optimizing some local criterion.
Divide-and-Conquer: Break up a problem into independent subproblems, solve each subproblem, and combine solutions to the subproblems to form a solution to the original problem.
Dynamic Programming: Break up a problem into a series of overlapping subproblems, and build up solutions to larger and larger subproblems.
Note: "Dynamic Programming" is a fancy name for caching away intermediate results in a table for later reuse.
Bellman: Pioneered the systematic study of dynamic programming in the 1950s
Etymology: "Dynamic programming" = planning over time. Bellman reportedly chose the name in part because "dynamic" had positive connotations and is hard to use in a pejorative sense, which made the research agenda easier to defend.
Areas: Bioinformatics, control theory, information theory, operations research, and much of computer science (theory, graphics, AI, compilers, systems, ...)
Some famous dynamic programming algorithms: Unix diff for comparing two files; Viterbi for hidden Markov models; Smith-Waterman for sequence alignment; Bellman-Ford for shortest paths with negative weights; Cocke-Kasami-Younger for parsing context-free grammars
Weighted interval scheduling problem
Earliest finish-time first: Consider jobs in ascending order of finish time; add a job to the schedule if it is compatible with the jobs already chosen
Recall: Greedy algorithm is correct if all weights are 1
Observation: Greedy algorithm fails spectacularly for weighted version
Notation: Label jobs by finishing time: \(f_1 \leq f_2 \leq \ldots \leq f_n\)
Def: \(p(j)\) is largest index \(i<j\) such that job \(i\) is compatible with \(j\)
Ex: \(p(8) = 5, p(7) = 3, p(2) = 0\)
Notation: \(\mathrm{OPT}(j)\) is value of optimal solution to the problem consisting of job requests \(1, 2, \ldots, j\)
Case 1\(^*\): \(\mathrm{OPT}\) selects job \(j\): collect value \(v_j\); cannot use the incompatible jobs \(p(j)+1, \ldots, j-1\); must include an optimal solution to the problem on the remaining compatible jobs \(1, \ldots, p(j)\)
Case 2\(^*\): \(\mathrm{OPT}\) does not select job \(j\): must include an optimal solution to the problem on jobs \(1, \ldots, j-1\)
\[ \mathrm{OPT}(j) = \begin{cases} 0 & \text{if } j = 0 \\ \max \{ v_j + \mathrm{OPT}(p(j)), \mathrm{OPT}(j-1) \} & \text{otherwise} \end{cases} \]
\(^*\)optimal substructure property (proof via exchange argument)
    Brute-Force(n, s1, ..., sn, f1, ..., fn, v1, ..., vn):
        Sort jobs by finish time so that f[1] <= f[2] <= ... <= f[n]
        Compute p[1], p[2], ..., p[n]
        Return Compute-Opt(n)

    Compute-Opt(j):
        If j == 0
            Return 0
        Else
            Return max{ v[j] + Compute-Opt(p[j]), Compute-Opt(j-1) }
Observation: Recursive algorithm fails spectacularly because of redundant subproblems ⇒ exponential algorithm
Ex: Number of recursive calls for family of "layered" instances grows like Fibonacci sequence
Memoization: Cache results of each subproblem; lookup as needed
    Top-Down(n, s1, ..., sn, f1, ..., fn, v1, ..., vn):
        Sort jobs by finish time so that f[1] <= f[2] <= ... <= f[n]
        Compute p[1], p[2], ..., p[n]
        For j = 1 to n
            M[j] <- empty
        M[0] <- 0
        Return M-Compute-Opt(n)

    M-Compute-Opt(j):
        If M[j] is empty
            M[j] <- max{ v[j] + M-Compute-Opt(p[j]), M-Compute-Opt(j-1) }
        Return M[j]
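A minimal Python sketch of the top-down algorithm (the function name, the (start, finish, value) job representation, and the example instance are my own choices, not from the original):

```python
import bisect

def max_weight_schedule(jobs):
    """Top-down (memoized) DP for weighted interval scheduling.

    jobs: list of (start, finish, value) triples.
    """
    jobs = sorted(jobs, key=lambda job: job[1])      # sort by finish time
    finish = [f for _, f, _ in jobs]
    # p[j-1] = largest i < j with finish of job i <= start of job j (0 if none)
    p = [bisect.bisect_right(finish, s) for s, _, _ in jobs]

    memo = {0: 0}
    def opt(j):                                      # M-Compute-Opt(j)
        if j not in memo:
            memo[j] = max(jobs[j - 1][2] + opt(p[j - 1]), opt(j - 1))
        return memo[j]

    return opt(len(jobs))

jobs = [(0, 3, 2), (1, 5, 4), (4, 7, 4), (6, 9, 7)]
print(max_weight_schedule(jobs))                     # 11: jobs (1,5,4) and (6,9,7)
```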
Claim: Memoized version of algorithm takes \(O(n \log n)\) time
Pf: Sorting by finish time takes \(O(n \log n)\), as does computing \(p(\cdot)\). Each invocation of M-Compute-Opt(j) takes \(O(1)\) time and either (i) returns an already-initialized value M[j], or (ii) initializes M[j] and makes two recursive calls. Progress measure \(\Phi\) = number of initialized entries of M[]: initially \(\Phi = 0\) and throughout \(\Phi \leq n\); case (ii) increases \(\Phi\) by 1, so there are at most \(2n\) recursive calls. Hence the running time of M-Compute-Opt(n) is \(O(n)\). ∎
Remark: \(O(n)\) if jobs are presorted by finish times
Q: Dynamic Programming (DP) algorithm computes optimal value. How to find a solution itself?
A: Make a second pass
    Find-Solution(j):
        If j = 0
            Return {}
        Else If v[j] + M[p[j]] > M[j-1]
            Return Union({ j }, Find-Solution(p[j]))
        Else
            Return Find-Solution(j-1)
Analysis: number of recursive calls \(\leq n\) ⇒ \(O(n)\)
Bottom-up dynamic programming: Unwind recursion
    Bottom-Up(n, s1, ..., sn, f1, ..., fn, v1, ..., vn):
        Sort jobs by finish time so that f[1] <= f[2] <= ... <= f[n]
        Compute p[1], p[2], ..., p[n]
        M[0] <- 0
        For j = 1 to n
            M[j] <- max { v[j] + M[p[j]], M[j-1] }
Quiz: Use the bottom-up dynamic programming algorithm to solve an instance of weighted interval scheduling; see the sketch below.
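A Python sketch combining Bottom-Up with Find-Solution (names and the job representation are mine, matching the earlier sketch):

```python
import bisect

def schedule(jobs):
    """Bottom-up DP plus traceback: returns (optimal value, chosen jobs)."""
    jobs = sorted(jobs, key=lambda job: job[1])        # sort by finish time
    finish = [f for _, f, _ in jobs]
    p = [bisect.bisect_right(finish, s) for s, _, _ in jobs]

    n = len(jobs)
    M = [0] * (n + 1)
    for j in range(1, n + 1):                          # Bottom-Up
        M[j] = max(jobs[j - 1][2] + M[p[j - 1]], M[j - 1])

    chosen, j = [], n                                  # Find-Solution (second pass)
    while j > 0:
        if jobs[j - 1][2] + M[p[j - 1]] > M[j - 1]:
            chosen.append(jobs[j - 1])
            j = p[j - 1]
        else:
            j -= 1
    return M[n], chosen[::-1]

print(schedule([(0, 3, 2), (1, 5, 4), (4, 7, 4), (6, 9, 7)]))
# (11, [(1, 5, 4), (6, 9, 7)])
```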
Least squares: Foundational problem in statistics. Given \(n\) points in the plane \((x_1,y_1), \ldots, (x_n,y_n)\), find a line \(y = ax + b\) that minimizes the sum of the squared errors:
\[ \mathrm{SSE} = \sum_{i=1}^n (y_i - a x_i - b)^2 \]
Solution: Calculus ⇒ min error is achieved when
\[ a = \frac{n \sum_i x_i y_i - (\sum_i x_i) (\sum_i y_i)}{n \sum_i x_i^2 - (\sum_i x_i)^2}, b = \frac{\sum_i y_i - a \sum_i x_i}{n} \]
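The closed form translates directly into code; a plain-Python sketch (function name mine, assuming at least two distinct \(x\)-values so the denominator is nonzero):

```python
def least_squares(points):
    """Fit y = a*x + b to (x, y) pairs by minimizing SSE (closed form)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # nonzero if x's are distinct
    b = (sy - a * sx) / n
    return a, b
```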
Segmented least squares: Points lie roughly on a sequence of several line segments, so we want to fit a sequence of lines rather than a single one.
Q: What is a reasonable choice for \(f(x)\) to balance accuracy (goodness of fit) and parsimony (number of lines)?
Given \(n\) points in the plane: \((x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\) with \(x_1 < x_2 < \ldots < x_n\) and a constant \(c > 0\), find a sequence of lines that minimizes \(f(x) = E + c L\), where \(E\) is the sum of the sums of squared errors in each segment and \(L\) is the number of lines.
Notation: \(\mathrm{OPT}(j)\) = minimum cost for points \(p_1, p_2, \ldots, p_j\); \(e(i,j)\) = SSE for the points \(p_i, p_{i+1}, \ldots, p_j\)
To compute \(\mathrm{OPT}(j)\): the last segment uses points \(p_i, \ldots, p_j\) for some \(i \leq j\), contributing cost \(e(i,j) + c + \mathrm{OPT}(i-1)\). Hence
\[ \mathrm{OPT}(j) = \begin{cases} 0 & \text{if } j = 0 \\ \min_{1 \leq i \leq j} \{ e(i,j) + c + \mathrm{OPT}(i-1) \} & \text{otherwise} \end{cases} \]
    Segmented-Least-Squares(n, p1, ..., pn, c):
        For j = 1 to n
            For i = 1 to j
                Compute the least squares error e(i,j) for the segment pi, ..., pj
        M[0] <- 0
        For j = 1 to n
            Find i in [1, j] that minimizes e(i,j) + c + M[i-1]
            M[j] <- e(i,j) + c + M[i-1]
        Return M[n]
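A compact Python sketch (names mine; for brevity it evaluates \(e(i,j)\) on demand in \(O(j-i)\) time rather than precomputing a table, which preserves the \(O(n^3)\) bound):

```python
def segmented_least_squares(points, c):
    """Minimum cost E + c*L; points must be sorted by x-coordinate."""

    def e(i, j):                      # SSE for points i..j (1-indexed)
        pts = points[i - 1:j]
        m = len(pts)
        if m <= 2:
            return 0.0                # a line fits 1 or 2 points exactly
        sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
        sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
        a = (m * sxy - sx * sy) / (m * sxx - sx * sx)
        b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2 for x, y in pts)

    n = len(points)
    M = [0.0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = min(e(i, j) + c + M[i - 1] for i in range(1, j + 1))
    return M[n]
```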
Theorem: The dynamic programming algorithm solves the segmented least squares problem in \(O(n^3)\) time and \(O(n^2)\) space
Pf: The bottleneck is computing \(e(i,j)\) for the \(O(n^2)\) pairs: each takes \(O(n)\) time using the closed form for \(a\) and \(b\), for \(O(n^3)\) time in total; storing the \(e(i,j)\) values uses \(O(n^2)\) space. ∎
Remark: Can be improved to \(O(n^2)\) time and \(O(n)\) space
Knapsack problem: Given \(n\) items and a knapsack with weight limit \(W\), where item \(i\) has weight \(w_i > 0\) and value \(v_i > 0\), pack the knapsack so as to maximize the total value.
Example: Suppose \(W = 11\) and the weights and values are given in the table below. Ex: \(\{1,2,5\}\) has value 35. Ex: \(\{3,4\}\) has value 40. Ex: \(\{3,5\}\) has value 46, but exceeds the weight limit.

\(i\) | \(v_i\) | \(w_i\) |
---|---|---|
1 | 1 | 1 |
2 | 6 | 2 |
3 | 18 | 5 |
4 | 22 | 6 |
5 | 28 | 7 |
Greedy by value: Repeatedly add item with maximum \(v_i\)
Greedy by weight: Repeatedly add item with minimum \(w_i\)
Greedy by ratio: Repeatedly add item with maximum ratio \(v_i / w_i\)
Observation: None of these greedy algorithms is optimal. Ex: On the instance above, greedy by ratio picks items 5, 2, and 1 (total weight 10) for value 35, whereas \(\{3,4\}\) has value 40.
Def: \(\mathrm{OPT}(i)\) is max profit subset of items \(1, \ldots, i\)
Case 1: \(\mathrm{OPT}(i)\) does not select item \(i\): then \(\mathrm{OPT}(i)\) selects the best of items \(1, \ldots, i-1\)
Case 2: \(\mathrm{OPT}(i)\) selects item \(i\): accepting item \(i\) does not immediately yield a subproblem of the same form, since we must also know how much weight capacity remains for items \(1, \ldots, i-1\)
Conclusion: Need more subproblems!
Def: \(\mathrm{OPT}(i, w)\) is max profit subset of items \(1, \ldots, i\) with weight limit \(w\)
Case 1: \(\mathrm{OPT}(i,w)\) does not select item \(i\) (forced if \(w_i > w\)): selects the best of items \(1, \ldots, i-1\) with weight limit \(w\)
Case 2: \(\mathrm{OPT}(i,w)\) selects item \(i\): collect value \(v_i\); the new weight limit is \(w - w_i\); selects the best of items \(1, \ldots, i-1\) with weight limit \(w - w_i\)
\[ \mathrm{OPT}(i,w) = \begin{cases} 0 & \text{if } i=0 \\ \mathrm{OPT}(i-1, w) & \text{if } w_i > w \\ \max \{ \mathrm{OPT}(i-1,w), v_i + \mathrm{OPT}(i-1,w-w_i)\} & \text{otherwise} \end{cases} \]
    Knapsack(n, W, w1, ..., wn, v1, ..., vn):
        For w = 0 to W
            M[0, w] <- 0
        For i = 1 to n
            For w = 1 to W
                If w[i] > w
                    M[i, w] <- M[i-1, w]
                Else
                    M[i, w] <- max { M[i-1, w], v[i] + M[i-1, w-w[i]] }
        Return M[n, W]
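A direct Python transcription (names mine), run on the example instance above:

```python
def knapsack(values, weights, W):
    """Max total value of a subset of items with total weight <= W."""
    n = len(values)
    # M[i][w] = OPT(i, w): best value using items 1..i with weight limit w
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, wt = values[i - 1], weights[i - 1]
        for w in range(1, W + 1):
            if wt > w:
                M[i][w] = M[i - 1][w]
            else:
                M[i][w] = max(M[i - 1][w], v + M[i - 1][w - wt])
    return M[n][W]

print(knapsack([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11))   # 40, from items {3,4}
```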
Ex: Filling in the table \(M[i,w] = \mathrm{OPT}(i,w)\), the max profit of a subset of items \(1, \ldots, i\) with weight limit \(w\), for the instance above (\(W = 11\)):
\(i\) | subset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \(\{ \}\) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | \(\{ 1 \}\) | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | \(\{ 1,2 \}\) | 0 | 1 | 6 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
3 | \(\{ 1,2,3 \}\) | 0 | 1 | 6 | 7 | 7 | 18 | 19 | 24 | 25 | 25 | 25 | 25 |
4 | \(\{ 1,2,3,4 \}\) | 0 | 1 | 6 | 7 | 7 | 18 | 22 | 24 | 28 | 29 | 29 | 40 |
5 | \(\{ 1,2,3,4,5 \}\) | 0 | 1 | 6 | 7 | 7 | 18 | 22 | 28 | 29 | 34 | 35 | 40 |
Running time: There exists an algorithm to solve the knapsack problem with \(n\) items and maximum weight \(W\) in \(\Theta(nW)\) time (weights are integers between \(1\) and \(W\))
RNA: String \(B = b_1 b_2 \ldots b_n\) over alphabet \(\{ A, C, G, U \}\)
Secondary structure: RNA is single-stranded so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.
Secondary Structure: A set of pairs \(S = \{(b_i, b_j)\}\) that satisfy:
[Watson-Crick] each pair in \(S\) is a valid Watson-Crick complement: \(A{-}U\), \(U{-}A\), \(C{-}G\), or \(G{-}C\);
[No sharp turns] the ends of each pair are separated by at least 4 intervening bases: if \((b_i, b_j) \in S\), then \(i < j - 4\);
[Non-crossing] if \((b_i, b_j)\) and \((b_k, b_l)\) are two pairs in \(S\), then we cannot have \(i < k < j < l\).
Ex (figures omitted): a set \(S\) containing a \(C{-}A\) pair is not a secondary structure, because \(C{-}A\) is not a valid Watson-Crick pair; a set pairing \(b_3\) with \(b_7\) is not a secondary structure, because \(b_3\) and \(b_7\) are separated by fewer than 4 intervening bases; a set in which a \(G{-}C\) pair and a \(U{-}A\) pair cross is not a secondary structure; a set satisfying all three conditions is a secondary structure.
Free-energy hypothesis: RNA molecule will form the secondary structure with the minimum total free energy (approximate by number of base pairs; more base pairs → lower free energy)
Goal: Given an RNA molecule \(B=b_1b_2 \ldots b_n\), find a secondary structure \(S\) that maximizes the number of base pairs.
Quiz: Is the following a secondary structure? (figure omitted)
Which subproblems?
First attempt: \(OPT(j)\) = maximum number of base pairs in a secondary structure of the substring \(b_1 b_2 \ldots b_j\)
Goal: \(OPT(n)\)
Choice: Match bases \(b_t\) and \(b_j\)
Difficulty: Matching \(b_t\) with \(b_j\) results in two subproblems: a secondary structure of \(b_1 \ldots b_{t-1}\) (same form) and one of \(b_{t+1} \ldots b_{j-1}\) (wrong form, since it does not begin at \(b_1\))
Def: \(OPT(i,j)\) = maximum number of base pairs in a secondary structure of the substring \(b_i b_{i+1} \ldots b_j\)
Q: In which order to compute the subproblems \(OPT(i,j)\)?
A: Do shortest intervals first—increasing order of \(|j-i|\)
    RNA-Secondary-Structure(n, b1, ..., bn):
        For k = 5 to n-1
            For i = 1 to n-k
                j <- i + k
                Compute M[i, j] using the recurrence:
                    M[i, j] = 0                                      if i >= j-4
                    M[i, j] = max { M[i, j-1],                       (bj not paired)
                                    max over t of
                                      1 + M[i, t-1] + M[t+1, j-1] }  (bt paired with bj)
                    where t ranges over i <= t < j-4 with bt-bj a valid
                    Watson-Crick pair; all values needed are already computed
        Return M[1, n]
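A Python sketch of the algorithm (names and the example string are my own; wobble pairs like \(G{-}U\) are excluded, per the Watson-Crick condition):

```python
def rna_secondary_structure(b):
    """Max number of base pairs in a secondary structure of the RNA string b."""
    n = len(b)
    if n <= 5:
        return 0
    complements = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    # M[i][j] = OPT(i, j) for the substring b_i ... b_j (1-indexed);
    # entries with j - i <= 4 stay 0 by the no-sharp-turns condition
    M = [[0] * (n + 2) for _ in range(n + 2)]
    for k in range(5, n):                 # interval length j - i
        for i in range(1, n - k + 1):
            j = i + k
            best = M[i][j - 1]            # case: b_j not in a pair
            for t in range(i, j - 4):     # case: pair b_t with b_j (t < j - 4)
                if (b[t - 1], b[j - 1]) in complements:
                    best = max(best, 1 + M[i][t - 1] + M[t + 1][j - 1])
            M[i][j] = best
    return M[1][n]

print(rna_secondary_structure("ACCGGUAGU"))   # 2, e.g. pairs A1-U9 and C2-G8
```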
Theorem: The DP algorithm solves the RNA secondary structure problem in \(O(n^3)\) time and \(O(n^2)\) space.
Outline: polynomial number of subproblems; solution to the original problem can be computed from solutions to the subproblems; natural ordering of subproblems from "smallest" to "largest", with an easy-to-compute recurrence.
Techniques: binary choice (weighted interval scheduling); multiway choice (segmented least squares); adding a new variable (knapsack); intervals (RNA secondary structure).
Top-down vs Bottom-up dynamic programming: Opinions differ
Goal: Given an array of \(n\) integers (positive or negative), find a contiguous subarray whose sum is maximum. \[\begin{array}{rrrrrrrrrrrrrrr} 12 & 5 & -1 & 31 & -61 & 59 & 26 & -53 & 58 & 97 & -93 & -23 & 84 & -15 & 6 \end{array}\]
Applications: Computer vision, data mining, genomic sequence analysis, technical job interviews, ...
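The standard linear-time DP for this problem (often attributed to Kadane) maintains, for each position, the best sum of a subarray ending there; a sketch (names mine):

```python
def max_subarray(a):
    """Maximum sum of a nonempty contiguous subarray (linear-time DP)."""
    best_ending_here = best = a[0]
    for x in a[1:]:
        best_ending_here = max(x, best_ending_here + x)   # extend or restart
        best = max(best, best_ending_here)
    return best

a = [12, 5, -1, 31, -61, 59, 26, -53, 58, 97, -93, -23, 84, -15, 6]
print(max_subarray(a))   # 187 = 59 + 26 - 53 + 58 + 97
```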
Goal: Given an \(n\)-by-\(n\) matrix, find a rectangle whose sum is maximum.
\[ A = \left[\begin{matrix} -2 & 5 & 0 & -5 & -2 & 2 & -3 \\ 4 & -3 & -1 & 3 & 2 & 1 & -1 \\ -5 & 6 & 3 & -5 & -1 & -4 & -2 \\ -1 & -1 & 3 & -1 & 4 & 1 & 1 \\ 3 & -3 & 2 & 0 & 3 & -3 & -2 \\ -2 & 1 & -2 & 1 & 1 & 3 & -1 \\ 2 & -4 & 0 & 1 & 0 & -3 & -1 \end{matrix}\right] \]
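One standard way to attack the 2-D version (a sketch reusing max_subarray from above): fix a pair of top and bottom rows, collapse each column in that band to a single sum, and run the 1-D DP on the collapsed array, for \(O(n^3)\) total time:

```python
def max_rectangle(A):
    """Maximum sum of a rectangle in square matrix A, via the 1-D DP above."""
    n = len(A)
    best = float("-inf")
    for top in range(n):
        col = [0] * n
        for bottom in range(top, n):
            for j in range(n):
                col[j] += A[bottom][j]            # column sums for rows top..bottom
            best = max(best, max_subarray(col))   # best rectangle in this band
    return best
```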
Problem: Given \(n\) coin denominations \(\{c_1, c_2, \ldots, c_n\}\) and a target value \(v\), find the fewest coins needed to make change for \(v\) (or report impossible).
Recall: Greedy cashier's algorithm is optimal for U.S. coin denominations, but not for arbitrary coin denominations.
Ex: Denominations \(\{1,10,21,34,70,100,350,1295,1500\}\) and target \(v = 140\).
Cashier's (greedy) algorithm: \(140 = 100 + 34 + 1 + 1 + 1 + 1 + 1 + 1\) (8 coins).
Optimal: \(140 = 70 + 70\) (2 coins).
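Dynamic programming handles arbitrary denominations: with \(D(t)\) = fewest coins making change for \(t\), we have \(D(t) = 1 + \min_i \{ D(t - c_i) : c_i \leq t \}\). A sketch (names mine):

```python
def min_coins(denoms, v):
    """Fewest coins summing to v, or None if impossible; O(n*v) time."""
    INF = float("inf")
    D = [0] + [INF] * v                   # D[t] = fewest coins for value t
    for t in range(1, v + 1):
        for c in denoms:
            if c <= t and D[t - c] + 1 < D[t]:
                D[t] = D[t - c] + 1
    return None if D[v] == INF else D[v]

print(min_coins([1, 10, 21, 34, 70, 100, 350, 1295, 1500], 140))   # 2 (70 + 70)
```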
Q: How similar are two strings?
Ex: ocurrance vs. occurrence
Edit distance [Levenshtein 1966, Needleman-Wunsch 1970]: Gap penalty \(\delta\); mismatch penalty \(\alpha_{pq}\); cost of an alignment = sum of gap and mismatch penalties
\[ \text{cost} = \delta + \alpha_{\text{CG}} + \alpha_{\text{TA}} \]
(assuming: \(\alpha_\text{AA} = \alpha_\text{CC} = \alpha_\text{GG} = \alpha_\text{TT} = 0\))
Applications: Bioinformatics, spell correction, machine translation, speech recognition, information extraction, ...
Ex (sentence alignment): "Spokesperson confirms senior government adviser was found" vs. "Spokesperson said the senior adviser was found"
“The BLOSUM (BLOcks SUbstitution Matrix) matrix is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences.”
What is the edit distance between these two strings?
    P A L E T T E
    P A L A T E
Assume gap penalty \(\delta = 2\) and mismatch penalty \(\alpha = 1\)
Goal: given two strings \(x_1 x_2 \ldots x_m\) and \(y_1 y_2 \ldots y_n\), find a min-cost alignment
Def: An alignment \(M\) is a set of ordered pairs \((x_i,y_j)\) such that each character appears in at most one pair and there are no crossings. The pairs \((x_i,y_j)\) and \((x_{i'}, y_{j'})\) cross if \(i < i'\) but \(j > j'\).
Def: The cost of an alignment \(M\) is:
\[ \mathrm{cost}(M) = \underbrace{\sum_{(x_i,y_j) \in M} \alpha_{x_iy_j}}_{\text{mismatch}} + \underbrace{ \sum_{i:x_i \text{unmatched}} \delta + \sum_{j:y_j \text{unmatched}} \delta}_{\text{gap}} \]
Ex: An alignment of CTACCG and TACATG:
\[ M = \{ (x_2,y_1), (x_3,y_2), (x_4,y_3), (x_5,y_4), (x_6,y_6) \} \]
\(x_1\) | \(x_2\) | \(x_3\) | \(x_4\) | \(x_5\) | | \(x_6\) |
C | T | A | C | C | – | G |
– | T | A | C | A | T | G |
 | \(y_1\) | \(y_2\) | \(y_3\) | \(y_4\) | \(y_5\) | \(y_6\) |
Def: \(\mathrm{OPT}(i,j)\) is min cost of aligning prefix strings \(x_1 x_2 \ldots x_i\) and \(y_1 y_2 \ldots y_j\)
Goal: \(\mathrm{OPT}(m,n)\)
Case 1: \(\mathrm{OPT}(i,j)\) matches \(x_i\) with \(y_j\): pay mismatch cost \(\alpha_{x_iy_j}\) plus the min cost of aligning \(x_1 \ldots x_{i-1}\) and \(y_1 \ldots y_{j-1}\)
Case 2a: \(\mathrm{OPT}(i,j)\) leaves \(x_i\) unmatched: pay gap cost \(\delta\) plus the min cost of aligning \(x_1 \ldots x_{i-1}\) and \(y_1 \ldots y_j\)
Case 2b: \(\mathrm{OPT}(i,j)\) leaves \(y_j\) unmatched: pay gap cost \(\delta\) plus the min cost of aligning \(x_1 \ldots x_i\) and \(y_1 \ldots y_{j-1}\)
(optimal substructure property for each case; proof via exchange argument)
\[ \mathrm{OPT}(i,j) = \begin{cases} j\delta & \text{if } i = 0 \\ i\delta & \text{if } j = 0 \\ \min \left\{\begin{array}{lll} \alpha_{x_iy_j} & + & \mathrm{OPT}(i-1,j-1) \\ \delta & + & \mathrm{OPT}(i-1, j) \\ \delta & + & \mathrm{OPT}(i, j-1) \end{array}\right. & \text{otherwise} \end{cases} \]
    Sequence-Alignment(m, n, x1, ..., xm, y1, ..., yn, delta, alpha):
        For i = 0 to m
            M[i, 0] <- i * delta
        For j = 0 to n
            M[0, j] <- j * delta
        For i = 1 to m
            For j = 1 to n
                M[i, j] <- min { alpha(xi, yj) + M[i-1, j-1],
                                 delta + M[i-1, j],
                                 delta + M[i, j-1] }   // all three already computed
        Return M[m, n]
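A direct Python transcription (names mine; alpha is passed as a mismatch-cost function). Running it with the demo's parameters below reproduces the table's final entry:

```python
def alignment_cost(x, y, delta, alpha):
    """Min cost of an alignment of strings x and y (value only)."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta
    for j in range(n + 1):
        M[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i - 1][j],
                          delta + M[i][j - 1])
    return M[m][n]

mismatch = lambda p, q: 0 if p == q else 1    # delta = 1, alpha_pp = 0, alpha_pq = 1
print(alignment_cost("IDENTITY", "SIMILARITY", 1, mismatch))   # 6, as in the table
```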
\(\delta = 1, \alpha_{pp} = 0, \alpha_{pq} = 1\)
 | | S | I | M | I | L | A | R | I | T | Y |
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
I | 1 | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
D | 2 | 2 | 2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
E | 3 | 3 | 3 | 3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
N | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 6 | 7 | 8 | 9 |
T | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 7 | 7 | 8 |
I | 6 | 6 | 5 | 6 | 5 | 6 | 6 | 6 | 6 | 7 | 8 |
T | 7 | 7 | 6 | 6 | 6 | 6 | 7 | 7 | 7 | 6 | 7 |
Y | 8 | 8 | 7 | 7 | 7 | 7 | 7 | 8 | 8 | 7 | 6 |
Theorem: The DP algorithm computes the edit distance (and an optimal alignment) of two strings of length \(m\) and \(n\) in \(\Theta(mn)\) time and space.
Pf: The algorithm fills \(\Theta(mn)\) table entries, each in \(O(1)\) time; an optimal alignment is then recovered by tracing back through the table in \(O(m+n)\) steps. ∎
Theorem [Backurs-Indyk 2015]: If can compute edit distance of two strings of length \(n\) in \(O(n^{2-\epsilon})\) time for some constant \(\epsilon > 0\), then can solve SAT with \(n\) variables and \(m\) clauses in \(\mathrm{poly}(m) 2^{(1-\delta)n}\) time for some constant \(\delta > 0\) (which would disprove SETH)
Quiz: It is easy to modify the DP algorithm for edit distance to...
A. Compute edit distance in \(O(mn)\) time and \(O(m+n)\) space
B. Compute an optimal alignment in \(O(mn)\) time and \(O(m+n)\) space
C. Both A and B
D. Neither A nor B
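Option A is the easy part: since row \(i\) of the table depends only on row \(i-1\), keeping just two rows yields the optimal value in \(O(m+n)\) space; recovering the alignment itself in linear space is the hard part (Hirschberg's algorithm, next). A sketch (names mine):

```python
def alignment_value_linear_space(x, y, delta, alpha):
    """Alignment value in O(mn) time and O(m+n) space (two DP rows)."""
    m, n = len(x), len(y)
    prev = [j * delta for j in range(n + 1)]       # row i-1 of the table
    for i in range(1, m + 1):
        cur = [i * delta] + [0] * n                # row i
        for j in range(1, n + 1):
            cur[j] = min(alpha(x[i - 1], y[j - 1]) + prev[j - 1],
                         delta + prev[j],
                         delta + cur[j - 1])
        prev = cur
    return prev[n]
```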
Theorem [Hirschberg]: There exists an algorithm to find an optimal alignment in \(O(mn)\) time and \(O(m+n)\) space
Edit distance graph: Node \((i,j)\) for each \(0 \leq i \leq m\), \(0 \leq j \leq n\); edge \((i-1,j-1) \to (i,j)\) of weight \(\alpha_{x_iy_j}\), and edges \((i-1,j) \to (i,j)\) and \((i,j-1) \to (i,j)\) of weight \(\delta\). Let \(f(i,j)\) denote the length of a shortest path from \((0,0)\) to \((i,j)\).
Lemma: \(f(i,j) = \mathrm{OPT}(i,j)\) for all \(i\) and \(j\)
Pf of Lemma (by strong induction on \(i+j\)): Base case: \(f(0,0) = \mathrm{OPT}(0,0) = 0\). Inductive hypothesis: assume \(f(i',j') = \mathrm{OPT}(i',j')\) for all \((i',j')\) with \(i'+j' < i+j\).
Pf of Lemma (cont'd): The last edge on a shortest path to \((i,j)\) comes from \((i-1,j-1)\), \((i-1,j)\), or \((i,j-1)\). Thus \[ f(i,j) = \min \{ \alpha_{x_iy_j} + f(i-1,j-1),\ \delta + f(i-1,j),\ \delta + f(i,j-1) \} = \mathrm{OPT}(i,j), \] using the inductive hypothesis and the Bellman equation. ∎
Edit distance graph (reversed): Let \(g(i,j)\) denote the length of a shortest path from \((i,j)\) to \((m,n)\). By symmetry, \(g\) satisfies an analogous recurrence and can be computed by dynamic programming in the reverse direction; both \(f(\cdot, n/2)\) and \(g(\cdot, n/2)\) can be computed in \(O(mn)\) time and \(O(m+n)\) space.
Observation 1: The length of a shortest path that uses \((i,j)\) is \(f(i,j) + g(i,j)\)
Observation 2: Let \(q\) be an index that minimizes \(f(q,n/2) + g(q,n/2)\). Then, there exists a shortest path from \((0,0)\) to \((m,n)\) that uses \((q,n/2)\)
Divide: Find index \(q\) that minimizes \(f(q,n/2) + g(q,n/2)\); save node \((i,j)=(q,n/2)\) as part of solution.
Conquer: Recursively compute optimal alignment in each piece.
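A compact Python sketch of the whole scheme (the structuring and names are mine; alpha is a mismatch-cost function as before). last_col computes a column of \(f\), or of \(g\) when run on reversed strings, in linear space; strips of width at most 1 fall back to the quadratic DP with traceback:

```python
def hirschberg(x, y, delta, alpha):
    """Optimal alignment as a sorted list of matched 1-indexed pairs (i, j)."""

    def last_col(xs, ys):
        # cost of aligning xs[:i] with ALL of ys, for i = 0..len(xs),
        # in linear space (two DP rows at a time)
        prev = [j * delta for j in range(len(ys) + 1)]
        out = [prev[-1]]
        for i in range(1, len(xs) + 1):
            cur = [i * delta] + [0] * len(ys)
            for j in range(1, len(ys) + 1):
                cur[j] = min(alpha(xs[i - 1], ys[j - 1]) + prev[j - 1],
                             delta + prev[j], delta + cur[j - 1])
            prev = cur
            out.append(prev[-1])
        return out

    def full_dp(xs, ys, oi, oj):
        # quadratic DP with traceback; used only on strips of width <= 1
        m, n = len(xs), len(ys)
        M = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1): M[i][0] = i * delta
        for j in range(n + 1): M[0][j] = j * delta
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                M[i][j] = min(alpha(xs[i - 1], ys[j - 1]) + M[i - 1][j - 1],
                              delta + M[i - 1][j], delta + M[i][j - 1])
        pairs, i, j = [], m, n
        while i > 0 and j > 0:
            if M[i][j] == alpha(xs[i - 1], ys[j - 1]) + M[i - 1][j - 1]:
                pairs.append((oi + i, oj + j)); i -= 1; j -= 1
            elif M[i][j] == delta + M[i - 1][j]:
                i -= 1
            else:
                j -= 1
        return pairs

    def solve(xs, ys, oi, oj):
        if len(xs) <= 1 or len(ys) <= 1:
            return full_dp(xs, ys, oi, oj)
        half = len(ys) // 2
        f = last_col(xs, ys[:half])                    # f(q, n/2) for all q
        g = last_col(xs[::-1], ys[half:][::-1])[::-1]  # g(q, n/2) for all q
        q = min(range(len(xs) + 1), key=lambda i: f[i] + g[i])
        return (solve(xs[:q], ys[:half], oi, oj) +
                solve(xs[q:], ys[half:], oi + q, oj + half))

    return sorted(solve(x, y, 0, 0))
```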
Theorem: Hirschberg's algorithm uses \(\Theta(m+n)\) space
Pf: Each call computes \(f(\cdot, n/2)\) and \(g(\cdot, n/2)\) using \(\Theta(m+n)\) space, which can be reused across recursive calls; beyond that, only the \(O(m+n)\) saved nodes of the solution persist. ∎
What is the tightest worst-case running time of Hirschberg's algorithm?
\(O(mn)\)
\(O(mn \log m)\)
\(O(mn \log n)\)
\(O(mn \log m \log n)\)
Theorem: Let \(T(m,n)\) be max running time of Hirschberg's algorithm on strings of lengths at most \(m\) and \(n\). Then, \(T(m,n) = O(mn \log n)\)
Pf: \(T(m,n)\) satisfies \(T(m,n) \leq 2\,T(m, n/2) + O(mn)\), which unrolls over \(O(\log n)\) levels of recursion to \(O(mn \log n)\).
Remark: Analysis is not tight because two subproblems are of size \((q,n/2)\) and \((m-q,n/2)\). Next, we prove \(T(m,n) = O(mn)\)
Theorem: Let \(T(m,n)\) be max running time of Hirschberg's algorithm on strings of lengths at most \(m\) and \(n\). Then, \(T(m,n) = O(mn)\)
Pf (by strong induction on \(m+n\)): Claim: \(T(m,n) \leq 2cmn\) for some constant \(c\). Base cases \(m \leq 2\) or \(n \leq 2\): \(T(m,n) \leq cmn\), since the alignment is computed directly. Inductive hypothesis: assume the claim holds for all \((m', n')\) with \(m'+n' < m+n\).
Pf (cont'd): The two subproblems have sizes \((q, n/2)\) and \((m-q, n/2)\), so \[ T(m,n) \leq T(q, n/2) + T(m-q, n/2) + cmn \leq 2cq\tfrac{n}{2} + 2c(m-q)\tfrac{n}{2} + cmn = cmn + cmn = 2cmn. \;\blacksquare \]
Problem: Given two strings \(x_1 x_2 \ldots x_m\) and \(y_1 y_2 \ldots y_n\), find a common subsequence that is as long as possible
Alternative viewpoint: Delete some characters from \(x\) and some characters from \(y\); if the two deletions result in the same string, that string is a common subsequence.
Ex: LCS("GGCACCACG", "ACGGCGGATACG") = "GGCAACG"

    - - G G C - - A C C - A C G
    A C G G C G G A - - T A C G
    ---------------------------
    G G C A A C G
Applications: Unix diff, git, bioinformatics
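Equivalently, LCS is sequence alignment where matches are maximized instead of cost being minimized; a direct Python sketch with traceback (names mine):

```python
def lcs(x, y):
    """One longest common subsequence of strings x and y."""
    m, n = len(x), len(y)
    # L[i][j] = length of an LCS of x[:i] and y[:j]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = 1 + L[i - 1][j - 1]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    out, i, j = [], m, n                      # trace back to recover one LCS
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("GGCACCACG", "ACGGCGGATACG"))   # a length-7 LCS such as "GGCAACG"
```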