
Dynamic Programming (6)

COS 320 - Algorithm Design

Dynamic Programming (6)

Introduction to Dynamic Programming

algorithmic paradigms

Greedy: build up a solution incrementally, myopically optimizing some local criterion.

Divide-and-Conquer: Break up a problem into independent subproblems, solve each subproblem, and combine solutions to subproblems to form a solution to the original problem.

Dynamic Programming: Break up problem into a series of overlapping subproblems, and build up solutions to larger and larger subproblems.

Note: "Dynamic Programming" is a fancy name for caching away intermediate results in a table for later reuse.

dynamic programming history

Bellman: Pioneered the systematic study of dynamic programming in the 1950s

Etymology:

  • Dynamic programming: planning over time
  • Secretary of Defense was hostile to mathematical research
  • Bellman sought an impressive name to avoid confrontation

dynamic programming applications

Areas:

  • bioinformatics
  • control theory
  • information theory
  • operations research
  • computer science: theory, graphics, AI, compilers, systems, ...
  • ...

dynamic programming applications

Some famous dynamic programming algorithms:

  • Unix diff for comparing two files
  • Viterbi for hidden Markov models
  • De Boor for evaluating spline curves
  • Smith-Waterman for genetic sequence alignment
  • Bellman-Ford for shortest path routing in networks
  • Cocke-Kasami-Younger for parsing context-free grammars
  • ...

Dynamic Programming (6)

weighted interval scheduling

weighted interval scheduling

Weighted interval scheduling problem

  • Job \(j\) starts at \(s_j\), finishes at \(f_j\), and has weight or value \(v_j\)
  • Two jobs compatible if they don't overlap
  • Goal: find maximum weight subset of mutually compatible jobs
[figure: eight jobs a-h drawn on a time axis from 0 to 11]

earliest-finish-time first algorithm

Earliest finish-time first:

  • Consider jobs in ascending order of finish time
    • Tie-breaking does not matter
  • Add job to subset if it is compatible with previously chosen jobs

Recall: Greedy algorithm is correct if all weights are 1

Observation: Greedy algorithm fails spectacularly for weighted version

[figure: jobs a (weight 1) and h (weight 1) at the ends of the time axis, with job b (weight 999) overlapping both, so earliest-finish-time first collects value 2 instead of 999]

weighted interval scheduling

Notation: Label jobs by finishing time: \(f_1 \leq f_2 \leq \ldots \leq f_n\)

Def: \(p(j)\) is the largest index \(i<j\) such that job \(i\) is compatible with \(j\) (set \(p(j) = 0\) if no such job exists)

Ex: \(p(8) = 5, p(7) = 3, p(2) = 0\)

[figure: jobs 1-8, labeled in order of finish time, on a time axis from 0 to 11, illustrating the p(j) values above]

dynamic programming: binary choice

Notation: \(\mathrm{OPT}(j)\) is value of optimal solution to the problem consisting of job requests \(1, 2, \ldots, j\)

Case 1\(^*\): \(\mathrm{OPT}\) selects job \(j\)

  • Collect profit \(v_j\)
  • Can't use incompatible jobs \(\{ p(j)+1, p(j)+2, \ldots, j-1 \}\)
  • Must include optimal solution to problem consisting of remaining compatible jobs \(1, 2, \ldots, p(j)\)

Case 2\(^*\): \(\mathrm{OPT}\) does not select job \(j\)

  • Must include optimal solution to problem consisting of remaining compatible jobs \(1, 2, \ldots, j-1\)

\[ \mathrm{OPT}(j) = \begin{cases} 0 & \text{if } j = 0 \\ \max \{ v_j + \mathrm{OPT}(p(j)), \mathrm{OPT}(j-1) \} & \text{otherwise} \end{cases} \]

\(^*\)optimal substructure property (proof via exchange argument)

weighted interval scheduling: brute force

\[ \mathrm{OPT}(j) = \begin{cases} 0 & \text{if } j = 0 \\ \max \{ v_j + \mathrm{OPT}(p(j)), \mathrm{OPT}(j-1) \} & \text{otherwise} \end{cases} \]


Brute-Force(n, s1, ..., sn, f1, ..., fn, v1, ..., vn):
    Sort jobs by finish time so that f[1] <= f[2] <= ... <= f[n]
    Compute p[1], p[2], ..., p[n]
    Return Compute-Opt(n)

Compute-Opt(j):
    If j == 0
        Return 0
    Else
        Return max{
            v[j] + Compute-Opt(p[j]),
            Compute-Opt(j-1)
        }

weighted interval scheduling: brute force

Observation: Recursive algorithm fails spectacularly because of redundant subproblems ⇒ exponential algorithm

Ex: Number of recursive calls for family of "layered" instances grows like Fibonacci sequence

[figure: a "layered" instance with jobs 1-5 on a time axis from 0 to 11]

\[p(1)=0, p(j)=j-2\] \[\mathrm{OPT}(j) = \max \left\{ \begin{array}{l} v_j + \mathrm{OPT}(p(j)), \\ \mathrm{OPT}(j-1) \end{array} \right\} \]

weighted interval scheduling: memoization

Memoization: Cache results of each subproblem; lookup as needed

Top-Down(n, s1, ..., sn, f1, ..., fn, v1, ..., vn):
    Sort jobs by finish time so that f[1] <= f[2] <= ... <= f[n]
    Compute p[1], p[2], ..., p[n]
    For j = 1 to n
        M[j] <- empty
    M[0] <- 0
    Return M-Compute-Opt(n)

M-Compute-Opt(j):
    If M[j] is empty
        M[j] <- max{
            v[j] + M-Compute-Opt(p[j]),
            M-Compute-Opt(j-1)
        }
    Return M[j]
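
The memoized recursion translates directly into Python. The sketch below is illustrative rather than the course's reference code: the function name, the (start, finish, value) tuple representation of jobs, and the use of bisect and lru_cache are my choices.

from bisect import bisect_right
from functools import lru_cache

def max_weight_schedule(jobs):
    """jobs: list of (start, finish, value) tuples; returns the optimal value."""
    jobs = sorted(jobs, key=lambda job: job[1])      # sort by finish time
    n = len(jobs)
    finish = [f for _, f, _ in jobs]
    # p[j] = largest i < j whose finish time is <= start of job j (0 if none)
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        p[j] = bisect_right(finish, jobs[j - 1][0], 0, j - 1)

    @lru_cache(maxsize=None)                         # the memo table M[]
    def opt(j):
        if j == 0:
            return 0
        return max(jobs[j - 1][2] + opt(p[j]), opt(j - 1))   # binary choice

    return opt(n)

# Example: max_weight_schedule([(0, 3, 1), (1, 5, 6), (4, 7, 10)]) returns 11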

weighted interval scheduling: running time

Claim: Memoized version of algorithm takes \(O(n \log n)\) time

  • Sort by finish time: \(O(n \log n)\)
  • Computing \(p(\cdot)\): \(O(n \log n)\) via sorting by start time
  • M-Compute-Opt(j)
    • each invocation takes \(O(1)\) time and either
      1. returns an existing value M[j], or
      2. fills in one new entry M[j] and makes two recursive calls
  • Progress measure \(\Phi\) = number of nonempty entries of M[]
    • Initially \(\Phi = 0\), throughout \(\Phi \leq n\)
    • Step 2 above increases \(\Phi\) by \(1\) ⇒ at most \(2n\) recursive calls
  • Overall running time of M-Compute-Opt(n) is \(O(n)\) ∎

Remark: \(O(n)\) if jobs are presorted by finish times

weighted interval scheduling: finding a soln

Q: The dynamic programming (DP) algorithm computes the optimal value. How do we find the solution itself?

A: Make a second pass

Find-Solution(j)
    If j = 0
        Return {}
    Else If v[j] + M[p[j]] > M[j-1]
        Return Union({ j }, Find-Solution(p[j]))
    Else
        Return Find-Solution(j-1)

Analysis: number of recursive calls \(\leq n\) ⇒ \(O(n)\)

weighted interval scheduling: bottom-up

Bottom-up dynamic programming: Unwind recursion

Bottom-Up(n, s1, ..., sn, f1, ..., fn, v1, ..., vn):
    Sort jobs by finish time so that f[1]<=f[2]<=...<=f[n]
    Compute p[1], p[2], ..., p[n]
    M[0] <- 0
    For j = 1 to n
        M[j] <- max { v[j] + M[p[j]], M[j-1] }
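
The bottom-up table plus the Find-Solution traceback, again as an illustrative Python sketch with the same assumed (start, finish, value) representation:

from bisect import bisect_right

def bottom_up_schedule(jobs):
    """Returns (optimal value, list of chosen jobs)."""
    jobs = sorted(jobs, key=lambda job: job[1])      # sort by finish time
    n = len(jobs)
    finish = [f for _, f, _ in jobs]
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        p[j] = bisect_right(finish, jobs[j - 1][0], 0, j - 1)

    M = [0] * (n + 1)
    for j in range(1, n + 1):                        # unwound recursion
        M[j] = max(jobs[j - 1][2] + M[p[j]], M[j - 1])

    # second pass (Find-Solution): walk back through the table
    chosen, j = [], n
    while j > 0:
        if jobs[j - 1][2] + M[p[j]] > M[j - 1]:
            chosen.append(jobs[j - 1])
            j = p[j]
        else:
            j -= 1
    return M[n], list(reversed(chosen))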

group: find the weighted interval schedule

Use the dynamic programming solution to solve the weighted interval scheduling instance below.

Bottom-Up(n, s1, ..., sn, f1, ..., fn, v1, ..., vn):
    Sort jobs by finish time so that f[1]<=f[2]<=...<=f[n]
    Compute p[1], p[2], ..., p[n]
    M[0] <- 0
    For j = 1 to n
        M[j] <- max { v[j] + M[p[j]], M[j-1] }
[figure for the exercise: ten jobs with values a=29, b=29, c=28, d=68, e=76, f=33, g=48, h=26, i=3, j=71 drawn on a time axis]

Dynamic Programming (6)

segmented least squares

least squares

Least squares: Foundational problem in statistics

  • Given \(n\) points in the plane: \((x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\)
  • Find a line \(y=ax+b\) that minimizes the sum of the squared error:

\[ \mathrm{SSE} = \sum_{i=1}^n (y_i - a x_i - b)^2 \]

Solution: Calculus ⇒ min error is achieved when

\[ a = \frac{n \sum_i x_i y_i - (\sum_i x_i) (\sum_i y_i)}{n \sum_i x_i^2 - (\sum_i x_i)^2}, b = \frac{\sum_i y_i - a \sum_i x_i}{n} \]
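The closed-form expressions compute a single-line fit in one pass. A minimal Python sketch (the function name and point representation are assumptions, not from the slides):

def fit_line(points):
    """Least-squares line y = a*x + b through points [(x1, y1), ...].
    Assumes the x values are not all equal."""
    n = len(points)
    sx  = sum(x for x, _ in points)
    sy  = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b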

segmented least squares

Segmented least squares:

  • Points lie roughly on a sequence of several line segments
  • Given \(n\) points in plane: \((x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\) with \(x_1 < x_2 < \ldots x_n\), find sequence of lines that minimizes \(f(x)\)

Q: What is a reasonable choice for \(f(x)\) to balance accuracy (goodness of fit) and parsimony (number of lines)?

segmented least squares

Given \(n\) points in the plane: \((x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\) with \(x_1 < x_2 < \ldots < x_n\) and a constant \(c > 0\), find a sequence of lines that minimizes \(f(x) = E + c L\):

  • \(E\) is the sum of the sums of the squared errors in each segment
  • \(L\) is the number of lines

dynamic programming: multiway choice

Notation

  • \(\mathrm{OPT}(j)\) is minimum cost for points \(p_1, p_2, \ldots, p_j\)
  • \(e(i,j)\) is minimum sum of squares for points \(p_i, p_{i+1}, \ldots, p_j\)

To compute \(\mathrm{OPT}(j)\):

  • Last segment uses points \(p_i, p_{i+1}, \ldots, p_j\) for some \(i\)
  • Cost is \(e(i,j) + c + \mathrm{OPT}(i-1)\) (optimal substructure property; proof via exchange argument) \[ \mathrm{OPT}(j) = \begin{cases} 0 & \text{if } j=0 \\ \min\limits_{1 \leq i \leq j} \{ e(i,j) + c + \mathrm{OPT}(i-1) \} & \text{otherwise} \end{cases} \]

segmented least squares algorithm

Segmented-Least-Squares(n, p1, ..., pn, c):
    For j = 1 to n
        For i = 1 to j
            Compute the least squares e(i,j) for the segment pi--pj

    M[0] <- 0
    For j = 1 to n
        Find i in [1,j] that minimizes e(i,j) + c + M[i-1]
        M[j] <- e(i,j) + c + M[i-1]

    Return M[n]
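
A direct Python rendering of the pseudocode (names are mine). The segment errors e(i, j) are evaluated with the closed-form least-squares fit, each in O(n) time, so this sketch matches the O(n^3) bound in the theorem below.

def segmented_least_squares(points, c):
    """points: list of (x, y) with distinct x values; returns min total cost E + c*L."""
    points = sorted(points)
    n = len(points)

    def e(i, j):                                  # SSE of one segment p_i..p_j (1-indexed)
        seg = points[i - 1:j]
        m = len(seg)
        if m < 2:
            return 0.0                            # a single point has zero error
        sx  = sum(x for x, _ in seg)
        sy  = sum(y for _, y in seg)
        sxx = sum(x * x for x, _ in seg)
        sxy = sum(x * y for x, y in seg)
        a = (m * sxy - sx * sy) / (m * sxx - sx * sx)
        b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2 for x, y in seg)

    err = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for i in range(1, j + 1):
            err[i][j] = e(i, j)

    M = [0.0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = min(err[i][j] + c + M[i - 1] for i in range(1, j + 1))
    return M[n]

A traceback analogous to Find-Solution (remembering the minimizing i for each j) recovers the segments themselves.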

segmented least squares analysis

Theorem: The dynamic programming algorithm solves the segmented least squares problem in \(O(n^3)\) time and \(O(n^2)\) space

\[ a = \frac{n \sum_i x_i y_i - (\sum_i x_i) (\sum_i y_i)}{n \sum_i x_i^2 - (\sum_i x_i)^2}, b = \frac{\sum_i y_i - a \sum_i x_i}{n} \]

Pf:

  • Bottleneck is computing \(e(i,j)\) for \(O(n^2)\) pairs
  • \(O(n)\) per pair using previous formula ∎

segmented least squares analysis

Theorem: The dynamic programming algorithm solves the segmented least squares problem in \(O(n^3)\) time and \(O(n^2)\) space

\[ a = \frac{n \sum_i x_i y_i - (\sum_i x_i) (\sum_i y_i)}{n \sum_i x_i^2 - (\sum_i x_i)^2}, b = \frac{\sum_i y_i - a \sum_i x_i}{n} \]

Remark: Can be improved to \(O(n^2)\) time and \(O(n)\) space

  • For each \(i\): precompute cumulative sums
    • \(\sum_{k=1}^i x_k\), \(\sum_{k=1}^i y_k\), \(\sum_{k=1}^i x_k^2\), \(\sum_{k=1}^i x_k y_k\)
  • Using cumulative sums, can compute \(e(i,j)\) in \(O(1)\) time (sketch below)
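
A sketch of that precomputation. One detail the slide leaves implicit: expanding the squared error also needs the cumulative sum of \(y_k^2\), so five prefix-sum arrays are kept; with them each \(e(i,j)\) is a constant number of arithmetic operations. Names are mine.

from itertools import accumulate

def make_segment_error(points):
    """Return a function e(i, j) evaluating the segment SSE in O(1) time (1-indexed)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def prefix(vals):
        return [0.0] + list(accumulate(vals))

    Sx, Sy = prefix(xs), prefix(ys)
    Sxx = prefix(x * x for x in xs)
    Syy = prefix(y * y for y in ys)
    Sxy = prefix(x * y for x, y in zip(xs, ys))

    def e(i, j):
        m = j - i + 1
        if m < 2:
            return 0.0
        sx, sy   = Sx[j] - Sx[i - 1],  Sy[j] - Sy[i - 1]
        sxx, syy = Sxx[j] - Sxx[i - 1], Syy[j] - Syy[i - 1]
        sxy = Sxy[j] - Sxy[i - 1]
        a = (m * sxy - sx * sy) / (m * sxx - sx * sx)
        b = (sy - a * sx) / m
        # expansion of sum((y - a*x - b)^2) in terms of the prefix sums
        return syy - 2*a*sxy - 2*b*sy + a*a*sxx + 2*a*b*sx + m*b*b

    return e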

Dynamic Programming (6)

knapsack problem

knapsack problem

Knapsack problem

  • Given \(n\) objects and a "knapsack"
  • Item \(i\) weighs \(w_i > 0\) and has value \(v_i > 0\)
  • Knapsack has capacity \(W\)
  • Goal: fill knapsack so as to maximize total value

Example: Suppose \(W = 11\) and the weights and values are given in the table below.

Ex: \(\{1,2,5\}\) has value 35

Ex: \(\{3,4\}\) has value 40.

Ex: \(\{3,5\}\) has value 46, but exceeds weight limit.

\(i\) \(v_i\) \(w_i\)
1 1 1
2 6 2
3 18 5
4 22 6
5 28 7

knapsack problem

Greedy by value: Repeatedly add item with maximum \(v_i\)

Greedy by weight: Repeatedly add item with minimum \(w_i\)

Greedy by ratio: Repeatedly add item with maximum ratio \(v_i / w_i\)


Observation: None of these greedy algorithms is optimal

dynamic programming: false start

Def: \(\mathrm{OPT}(i)\) is max profit subset of items \(1, \ldots, i\)

Case 1: \(\mathrm{OPT}(i)\) does not select item \(i\)

  • \(\mathrm{OPT}\) selects best of \(\{ 1, 2, \ldots, i-1 \}\) (optimal substructure property; proof via exchange argument)

Case 2: \(\mathrm{OPT}(i)\) selects item \(i\)

  • Selecting item \(i\) does not immediately imply that we will have to reject other items
  • Without knowing what other items were selected before \(i\), we don't even know if we have enough room for \(i\)

Conclusion: Need more subproblems!

dynamic programming: adding a new var

Def: \(\mathrm{OPT}(i, w)\) is max profit subset of items \(1, \ldots, i\) with weight limit \(w\)

Case 1: \(\mathrm{OPT}(i,w)\) does not select item \(i\) (this is forced when \(w_i > w\))

  • Select best of \(\{ 1, 2, \ldots, i-1 \}\) using weight limit \(w\)

Case 2: \(\mathrm{OPT}(i,w)\) selects item \(i\)

  • Collect value \(v_i\)
  • New weight limit: \(w - w_i\)
  • Select best of \(\{1,2,\ldots,i-1\}\) using this new weight limit

\[ \mathrm{OPT}(i,w) = \begin{cases} 0 & \text{if } i=0 \\ \mathrm{OPT}(i-1, w) & \text{if } w_i > w \\ \max \{ \mathrm{OPT}(i-1,w), v_i + \mathrm{OPT}(i-1,w-w_i)\} & \text{otherwise} \end{cases} \]

knapsack problem: bottom-up

Knapsack(n, W, w1, ..., wn, v1, ..., vn):
    For w = 0 to W
        M[0, w] <- 0

    For i = 1 to n
        For w = 1 to W
            If wi > w
                M[i, w] <- M[i-1, w]
            Else
                M[i, w] <- Max { M[i-1, w], vi+M[i-1, w-wi] }

    Return M[n, W]

\[ \mathrm{OPT}(i,w) = \begin{cases} 0 & \text{if } i=0 \\ \mathrm{OPT}(i-1, w) & \text{if } w_i > w \\ \max \{ \mathrm{OPT}(i-1,w), v_i + \mathrm{OPT}(i-1,w-w_i)\} & \text{otherwise} \end{cases} \]
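
The table-filling pseudocode plus a traceback, as a Python sketch (function name and return format are assumptions):

def knapsack(values, weights, W):
    """0/1 knapsack with integer capacity W; returns (optimal value, chosen item indices)."""
    n = len(values)
    # M[i][w] = max value achievable with items 1..i and weight limit w
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        vi, wi = values[i - 1], weights[i - 1]
        for w in range(W + 1):
            if wi > w:
                M[i][w] = M[i - 1][w]
            else:
                M[i][w] = max(M[i - 1][w], vi + M[i - 1][w - wi])

    # trace back: item i was taken exactly when the value changed from row i-1
    chosen, w = [], W
    for i in range(n, 0, -1):
        if M[i][w] != M[i - 1][w]:
            chosen.append(i)
            w -= weights[i - 1]
    return M[n][W], sorted(chosen)

# With the example items from the lecture and W = 11:
# knapsack([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11) returns (40, [3, 4])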

group: knapsack algorithm

\(i\) \(v_i\) \(w_i\)
1 1 1
2 6 2
3 18 5
4 22 6
5 28 7

\[ \mathrm{OPT}(i,w) = \begin{cases} 0 & \text{if } i=0 \\ \mathrm{OPT}(i-1, w) & \text{if } w_i > w \\ \max \left\{ \begin{array}{l} \mathrm{OPT}(i-1,w), \\ v_i + \mathrm{OPT}(i-1,w-w_i) \end{array} \right\} & \text{otherwise} \end{cases} \]

\(\mathrm{OPT}(i,w) =\) max profit subset of
items \(1, \ldots, i\) with weight limit \(w\)

\(i\) subset 0 1 2 3 4 5 6 7 8 9 10 11
0 \(\{ \}\) 0 0 0 0 0 0 0 0 0 0 0 0
1 \(\{ 1 \}\) 0
2 \(\{ 1,2 \}\) 0
3 \(\{ 1,2,3 \}\) 0
4 \(\{ 1,2,3,4 \}\) 0
5 \(\{ 1,2,3,4,5 \}\) 0

knapsack problem: running time

Running time: There exists an algorithm to solve the knapsack problem with \(n\) items and maximum weight \(W\) in \(\Theta(nW)\) time (weights are integers between \(1\) and \(W\))

  • Not polynomial in input size!
    • \(W\) is a number in the input, not the input size; the input encodes \(W\) using only \(O(\log W)\) bits
    • "Pseudo-polynomial"
  • Solving the optimization version of the knapsack problem is NP-hard, while the decision version is NP-complete (Chapter 8)
  • There exists a poly-time algorithm that produces a feasible / approximate solution that has value within 0.01% of optimum (see 11.8)
  • See Wikipedia for more details

Dynamic Programming (6)

RNA secondary structure

RNA secondary structure

RNA: String \(B = b_1 b_2 \ldots b_n\) over alphabet \(\{ A, C, G, U \}\)

Secondary structure: RNA is single-stranded so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.

RNA secondary structure

Secondary Structure: A set of pairs \(S = \{(b_i, b_j)\}\) that satisfy:

  1. Watson-Crick: \(S\) is a matching and each pair in \(S\) is a Watson-Crick complement: \(A{-}U\), \(U{-}A\), \(C{-}G\), \(G{-}C\)
  2. ...




\(S\) is not a secondary structure, because \(C{-}A\) is not a valid Watson-Crick pair

RNA secondary structure

Secondary Structure: A set of pairs \(S = \{(b_i, b_j)\}\) that satisfy:

  1. Watson-Crick: \(S\) is a matching and each pair in \(S\) is a Watson-Crick complement: \(A{-}U\), \(U{-}A\), \(C{-}G\), \(G{-}C\)
  2. No sharp turns: The ends of each pair are separated by at least 4 intervening bases. If \((b_i,b_j) \in S\), then \(i < j-4\)
  3. ...


\(S\) is not a secondary structure, because \(b_3\) and \(b_7\) have fewer than 4 intervening bases

RNA secondary structure

Secondary Structure: A set of pairs \(S = \{(b_i, b_j)\}\) that satisfy:

  1. Watson-Crick: \(S\) is a matching and each pair in \(S\) is a Watson-Crick complement: \(A{-}U\), \(U{-}A\), \(C{-}G\), \(G{-}C\)
  2. No sharp turns: The ends of each pair are separated by at least 4 intervening bases. If \((b_i,b_j) \in S\), then \(i < j-4\)
  3. Non-crossing: If \((b_i,b_j)\) and \((b_k,b_l)\) are two pairs in \(S\), then we cannot have \(i<k<j<l\)

\(S\) is not a secondary structure, because \(G{-}C\) and \(U{-}A\) cross

RNA secondary structure

Secondary Structure: A set of pairs \(S = \{(b_i, b_j)\}\) that satisfy:

  1. Watson-Crick: \(S\) is a matching and each pair in \(S\) is a Watson-Crick complement: \(A{-}U\), \(U{-}A\), \(C{-}G\), \(G{-}C\)
  2. No sharp turns: The ends of each pair are separated by at least 4 intervening bases. If \((b_i,b_j) \in S\), then \(i < j-4\)
  3. Non-crossing: If \((b_i,b_j)\) and \((b_k,b_l)\) are two pairs in \(S\), then we cannot have \(i<k<j<l\)

\(S\) is a secondary structure
(with 3 base pairs)

RNA secondary structure

Secondary Structure: A set of pairs \(S = \{(b_i, b_j)\}\) that satisfy:

  1. Watson-Crick: \(S\) is a matching and each pair in \(S\) is a Watson-Crick complement: \(A{-}U\), \(U{-}A\), \(C{-}G\), \(G{-}C\)
  2. No sharp turns: The ends of each pair are separated by at least 4 intervening bases. If \((b_i,b_j) \in S\), then \(i < j-4\)
  3. Non-crossing: If \((b_i,b_j)\) and \((b_k,b_l)\) are two pairs in \(S\), then we cannot have \(i<k<j<l\)

Free-energy hypothesis: RNA molecule will form the secondary structure with the minimum total free energy (approximate by number of base pairs; more base pairs → lower free energy)

Goal: Given an RNA molecule \(B=b_1b_2 \ldots b_n\), find a secondary structure \(S\) that maximizes the number of base pairs.

quiz: dynamic programming


Is the following a secondary structure?


  1. Yes
  2. No, because it violates
    Watson-Crick condition
  3. No, because it violates
    no-sharp-turns condition
  4. No, because it violates
    no-crossing condition

quiz: dynamic programming


Which subproblems?


  1. \(OPT(j)\) = max number of base pairs in secondary structure of the substring \(b_1 b_2 \ldots b_j\)
  2. \(OPT(j)\) = max number of base pairs in secondary structure of the substring \(b_j b_{j+1} \ldots b_n\)
  3. Either A or B
  4. Neither A nor B

RNA secondary structure: subproblems

First attempt: \(OPT(j)\) = maximum number of base pairs in a secondary structure of the substring \(b_1 b_2 \ldots b_j\)

Goal: \(OPT(n)\)

Choice: Match bases \(b_t\) and \(b_j\)

Difficulty: Results in two subproblems (but one of wrong form)

  • Find secondary structure in \(b_1 b_2 \ldots b_{t-1}\)
    • \(OPT(t-1)\)
  • Find secondary structure in \(b_{t+1} b_{t+2} \ldots b_{j-1}\)
    • need more subproblems (first base no longer \(b_1\))

Dynamic programming over intervals

Def: \(OPT(i,j)\) = maximum number of base pairs in a secondary structure of the substring \(b_i b_{i+1} \ldots b_j\)

  • Case 1: If \(i \geq j - 4\)
    • \(OPT(i,j) = 0\) by no-sharp-turns condition
  • Case 2: Base \(b_j\) is not involved in a pair
    • \(OPT(i,j) = OPT(i,j-1)\)
  • Case 3: Base \(b_j\) pairs with \(b_t\) for some \(i \leq t < j - 4\)
    • Non-crossing condition decouples resulting two subproblems
    • \(OPT(i,j) = 1{+}\max_t \{ OPT(i,t{-}1){+}OPT(t{+}1,j{-}1) \}\)
    • take \(\max\) over \(t\) such that \(i \leq t < j-4\) and \(b_t\) and \(b_j\) are Watson-Crick complements

quiz: dynamic programming


In which order to compute \(OPT(i,j)\)?


  1. Increasing \(i\), then increasing \(j\)
  2. Increasing \(j\), then increasing \(i\)
  3. Either A or B
  4. Neither A nor B

bottom-up dynamic programming over intervals

Q: In which order to solve the subproblems?

A: Do shortest intervals first—increasing order of \(|j-i|\)

RNA-Secondary-Structure(n, b1, ..., bn):
  For k = 5 to n - 1
    For i = 1 to n - k
      j <- i + k
      Compute M[i, j] using formula
      // 3 cases
      // = 0         if i >= j-4
      // = M[i,j-1]  if bj not paired
      // = 1 + max_t( M[i,t-1] + M[t+1,j-1] )
      // (all needed vals are already computed)
  Return M[1, n]
[table sketch: M[i, j] is filled in order of increasing interval length k = j - i; entries with i >= j - 4 are 0]


Theorem: The DP algorithm solves the RNA secondary structure problem in \(O(n^3)\) time and \(O(n^2)\) space.
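
A Python sketch of the interval DP (names are mine; it returns only the optimal number of pairs, not the pairing itself):

def rna_secondary_structure(b):
    """Max number of base pairs in a secondary structure of RNA string b."""
    n = len(b)
    wc = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}   # Watson-Crick pairs
    # M[i][j] = OPT(i, j) for substring b_i..b_j (1-indexed); 0 by default
    M = [[0] * (n + 2) for _ in range(n + 2)]
    for k in range(5, n):                     # interval length, shortest first
        for i in range(1, n - k + 1):
            j = i + k
            best = M[i][j - 1]                # case: b_j not in a pair
            for t in range(i, j - 4):         # case: b_j pairs with b_t, t < j - 4
                if (b[t - 1], b[j - 1]) in wc:
                    best = max(best, 1 + M[i][t - 1] + M[t + 1][j - 1])
            M[i][j] = best
    return M[1][n] if n > 0 else 0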

dynamic programming summary

Outline

  • Define a collection of subproblems (typically, only a polynomial number of subproblems)
  • Solution to the original problem can be computed from the subproblems
  • Natural ordering of subproblems from "smallest" to "largest" that enables determining a solution to a subproblem from solutions to smaller subproblems

Techniques

  • Binary choice: weighted interval scheduling
  • Multiway choice: segmented least squares
  • Adding a new variable: knapsack problem
  • Intervals: RNA secondary structure

Top-down vs Bottom-up dynamic programming: Opinions differ

Dynamic Programming (6)

Dynamic Programming Example Problems

group: maximum subarray problem

Goal: Given an array of \(n\) integers (positive or negative), find a contiguous subarray whose sum is maximum. \[\begin{matrix} 12 & 5 & -1 & 31 & -61 & 59 & 26 & -53 & 58 & 97 & -93 & -23 & 84 & -15 & 6 \end{matrix}\]









Applications: Computer vision, data mining, genomic sequence analysis, technical job interviews, ...

group: maximum rectangle problem

Goal: Given an \(n\)-by-\(n\) matrix, find a rectangle whose sum is maximum.

\[ A = \left[\begin{matrix} -2 & 5 & 0 & -5 & -2 & 2 & -3 \\ 4 & -3 & -1 & 3 & 2 & 1 & -1 \\ -5 & 6 & 3 & -5 & -1 & -4 & -2 \\ -1 & -1 & 3 & -1 & 4 & 1 & 1 \\ 3 & -3 & 2 & 0 & 3 & -3 & -2 \\ -2 & 1 & -2 & 1 & 1 & 3 & -1 \\ 2 & -4 & 0 & 1 & 0 & -3 & -1 \end{matrix}\right] \]

Applications: Computer vision, data mining, genomic sequence analysis, technical job interviews, ...

group: coin changing

Problem: Given \(n\) coin denominations \(\{c_1, c_2, \ldots, c_n\}\) and a target value \(v\), find the fewest coins needed to make change for \(v\) (or report impossible).

Recall: Greedy cashier's algorithm is optimal for U.S. coin denominations, but not for arbitrary coin denominations.

Ex: \(\{1,10,21,34,70,100,350,1295,1500\}\)

Optimal: \(140 = 70 + 70\)

Dynamic Programming (6)

Sequence Alignment

String similarity

Q: How similar are two strings?

Ex: ocurrance and occurrence

edit distance

Edit distance [Levenshtein 1966, Needleman-Wunsch 1970]

  • Define gap penalty \(\delta\) and mismatch penalty \(\alpha_{{p}{q}}\)
  • Cost is minimum sum of gap and mismatch penalties
1 gap, 2 mismatches

\[ \text{cost} = \delta + \alpha_{\text{CG}} + \alpha_{\text{TA}} \]

(assuming: \(\alpha_\text{AA} = \alpha_\text{CC} = \alpha_\text{GG} = \alpha_\text{TT} = 0\))

Applications: Bioinformatics, spell correction, machine translation, speech recognition, information extraction, ...

Spokesperson confirms     senior government adviser was found
Spokesperson said     the senior            adviser was found

blosum matrix for proteins

The BLOSUM (BLOcks SUbstitution Matrix) matrix is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences.

(source: Wikipedia)

quiz: dynamic programming


What is the edit distance between these two strings?

P A L E T T E     P A L A T E

Assume gap penalty \(\delta = 2\) and mismatch penalty \(\alpha = 1\)

  1. 1
  2. 2
  3. 3
  4. 4

sequence alignment

Goal: given two strings \(x_1 x_2 \ldots x_m\) and \(y_1 y_2 \ldots y_n\), find a min-cost alignment

Def: An alignment \(M\) is a set of ordered pairs \((x_i,y_j)\) such that each character appears in at most one pair and there are no crossings. The pairs \((x_i,y_j)\) and \((x_{i'}, y_{j'})\) cross if \(i < i'\) but \(j > j'\).

Def: The cost of an alignment \(M\) is:

\[ \mathrm{cost}(M) = \underbrace{\sum_{(x_i,y_j) \in M} \alpha_{x_iy_j}}_{\text{mismatch}} + \underbrace{ \sum_{i \,:\, x_i \text{ unmatched}} \delta + \sum_{j \,:\, y_j \text{ unmatched}} \delta}_{\text{gap}} \]

sequence alignment

Alignment of CTACCG and TACATG

\[ M = \{ (x_2,y_1), (x_3,y_2), (x_4,y_3), (x_5,y_4), (x_6,y_6) \} \]

\(x_1\) \(x_2\) \(x_3\) \(x_4\) \(x_5\) \(x_6\)
C T A C C G
T A C A T G
\(y_1\) \(y_2\) \(y_3\) \(y_4\) \(y_5\) \(y_6\)

sequence alignment: problem structure

Def: \(\mathrm{OPT}(i,j)\) is min cost of aligning prefix strings \(x_1 x_2 \ldots x_i\) and \(y_1 y_2 \ldots y_j\)

Goal: \(\mathrm{OPT}(m,n)\)

  • Case 1: \(\mathrm{OPT}(i,j)\) matches \((x_i,y_j)\)
    Pay mismatch for \((x_i,y_j)\) + min cost of aligning \(x_1 x_2 \ldots x_{i-1}\) and \(y_1 y_2 \ldots y_{j-1}\)
  • Case 2a: \(\mathrm{OPT}(i,j)\) leaves \(x_i\) unmatched
    Pay gap for \(x_i\) + min cost of aligning \(x_1 x_2 \ldots x_{i-1}\) and \(y_1 y_2 \ldots y_j\)
  • Case 2b: \(\mathrm{OPT}(i,j)\) leaves \(y_j\) unmatched
    Pay gap for \(y_j\) + min cost of aligning \(x_1 x_2 \ldots x_i\) and \(y_1 y_2 \ldots y_{j-1}\)

(optimal substructure property for each case; proof via exchange argument)

Sequence alignment: Bellman equation

\[ \mathrm{OPT}(i,j) = \begin{cases} j\delta & \text{if } i = 0 \\ i\delta & \text{if } j = 0 \\ \min \left\{\begin{array}{lll} \alpha_{x_iy_j} & + & \mathrm{OPT}(i-1,j-1) \\ \delta & + & \mathrm{OPT}(i-1, j) \\ \delta & + & \mathrm{OPT}(i, j-1) \end{array}\right. & \text{otherwise} \end{cases} \]

sequence alignment: bottom-up algorithm

\[ \mathrm{OPT}(i,j) = \begin{cases} j\delta & \text{if } i = 0 \\ i\delta & \text{if } j = 0 \\ \min \left\{\begin{array}{lll} \alpha_{x_iy_j} & + & \mathrm{OPT}(i-1,j-1) \\ \delta & + & \mathrm{OPT}(i-1, j) \\ \delta & + & \mathrm{OPT}(i, j-1) \end{array}\right. & \text{otherwise} \end{cases} \]

Sequence-Alignment(m, n, x1, ..., xm, y1, ..., yn, delta, alpha)
    For i = 0 to m
        M[i,0] <- i * delta
    For j = 0 to n
        M[0,j] <- j * delta
    For i = 1 to m
        For j = 1 to n
            M[i,j] <- min {
                alpha(xi, yj) + M[i-1,j-1], // already computed M[i-1,j-1]
                delta + M[i-1,j],           // already computed M[i-1,j]
                delta + M[i,j-1]            // already computed M[i,j-1]
            }
    Return M[m,n]
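
The same bottom-up computation in Python; passing the mismatch costs as a function alpha(p, q) is an interface choice of this sketch, not the lecture's.

def sequence_alignment(x, y, delta, alpha):
    """Min-cost alignment (edit distance) of strings x and y.
    delta: gap penalty; alpha(p, q): mismatch penalty for characters p, q."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta
    for j in range(n + 1):
        M[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i - 1][j],
                          delta + M[i][j - 1])
    return M[m][n]

# With delta = 1 and alpha = 0/1 as on the next slide:
# sequence_alignment("IDENTITY", "SIMILARITY", 1, lambda p, q: 0 if p == q else 1) returns 6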

sequence alignment: traceback

\(\delta = 1, \alpha_{pp} = 0, \alpha_{pq} = 1\)

       S  I  M  I  L  A  R  I  T  Y
    0  1  2  3  4  5  6  7  8  9 10
 I  1  1  1  2  3  4  5  6  7  8  9
 D  2  2  2  2  3  4  5  6  7  8  9
 E  3  3  3  3  3  4  5  6  7  8  9
 N  4  4  4  4  4  4  5  6  7  8  9
 T  5  5  5  5  5  5  5  6  7  7  8
 I  6  6  5  6  5  6  6  6  6  7  8
 T  7  7  6  6  6  6  7  7  7  6  7
 Y  8  8  7  7  7  7  7  8  8  7  6

(rows: prefixes of IDENTITY; columns: prefixes of SIMILARITY)

4 mismatches, 2 gaps

sequence alignment: analysis

Theorem: The DP algorithm computes the edit distance (and an optimal alignment) of two strings of length \(m\) and \(n\) in \(\Theta(mn)\) time and space.

Pf:

  • Algorithm computes edit distance
  • Can trace back to extract optimal alignment itself. ∎


Theorem [Backurs-Indyk 2015]: If can compute edit distance of two strings of length \(n\) in \(O(n^{2-\epsilon})\) time for some constant \(\epsilon > 0\), then can solve SAT with \(n\) variables and \(m\) clauses in \(\mathrm{poly}(m) 2^{(1-\delta)n}\) time for some constant \(\delta > 0\) (which would disprove SETH)

quiz: dynamic programming


It is easy to modify the DP algorithm for edit distance to...

  1. Compute edit distance in \(O(mn)\) time and \(O(m+n)\) space

  2. Compute an optimal alignment in \(O(mn)\) time and \(O(m+n)\) space

  3. Both A and B

  4. Neither A nor B

\[ \mathrm{OPT}(i,j) = \begin{cases} j\delta & \text{if } i = 0 \\ i\delta & \text{if } j = 0 \\ \min \left\{\begin{array}{lll} \alpha_{x_iy_j} & + & \mathrm{OPT}(i-1,j-1) \\ \delta & + & \mathrm{OPT}(i-1, j) \\ \delta & + & \mathrm{OPT}(i, j-1) \end{array}\right. & \text{otherwise} \end{cases} \]

Dynamic Programming 2 (6)

Hirschberg's algorithm

sequence alignment in linear space

Theorem [Hirschberg]: There exists an algorithm to find an optimal alignment in \(O(mn)\) time and \(O(m+n)\) space

  • Clever combination of divide-and-conquer and dynamic programming
  • Inspired by idea of Savitch from complexity theory

hirschberg's algorithm

Edit distance graph:

  • Let \(f(i,j)\) denote length of shortest path from \((0,0)\) to \((i,j)\)
  • Lemma: \(f(i,j) = \mathrm{OPT}(i,j)\) for all \(i\) and \(j\)

hirschberg's algorithm

Edit distance graph:

  • Let \(f(i,j)\) denote length of shortest path from \((0,0)\) to \((i,j)\)
  • Lemma: \(f(i,j) = \mathrm{OPT}(i,j)\) for all \(i\) and \(j\)

Pf of Lemma (by strong induction on \(i+j\)):

  • Base case: \(f(0,0) = \mathrm{OPT}(0,0) = 0\)
  • Inductive hypothesis: assume \(f(i',j') = \mathrm{OPT}(i',j')\) is true for all \((i',j')\) with \(i'+j' < i+j\)
  • Last edge on shortest path to \((i,j)\) is from \((i-1,j-1)\), \((i-1,j)\), or \((i,j-1)\)
  • Thus...

hirschberg's algorithm

Pf of Lemma (cont'd):

  • ...
  • (assume \(f(i',j') = \mathrm{OPT}(i',j')\) is true for all \((i',j')\) with \(i'+j'<i+j\))
  • Thus, \[\begin{eqnarray} f(i,j) & = & \min \left\{ \begin{array}{l} \alpha_{x_iy_j} + f(i-1,j-1), \\ \delta + f(i-1,j), \\ \delta + f(i,j-1) \end{array} \right\} \\ & = & \min \left\{ \begin{array}{l} \alpha_{x_iy_j} + \mathrm{OPT}(i-1,j-1), \\ \delta + \mathrm{OPT}(i-1,j), \\ \delta + \mathrm{OPT}(i,j-1) \end{array} \right\} \qquad\text{(inductive hypothesis)} \\ & = & \mathrm{OPT}(i,j) \qquad\text{(Bellman equation)} \quad\blacksquare \end{eqnarray}\]

hirschberg's algorithm

Edit distance graph

  • Let \(f(i,j)\) denote length of shortest path from \((0,0)\) to \((i,j)\)
  • Lemma: \(f(i,j) = \mathrm{OPT}(i,j)\) for all \(i\) and \(j\)
  • Can compute \(f(\cdot,j)\) for any \(j\) in \(O(mn)\) time and \(O(m+n)\) space (sketch below)
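
A sketch of that space-bounded computation: sweep the columns left to right, keeping only the previous column. The column orientation (i indexes x) follows the slides; the function name is mine.

def forward_column(x, y, j_star, delta, alpha):
    """f(., j_star): for every i, the min cost of aligning x[1..i] with y[1..j_star].
    Uses O(len(x)) space by storing only one column at a time."""
    m = len(x)
    col = [i * delta for i in range(m + 1)]          # column j = 0
    for j in range(1, j_star + 1):
        new = [j * delta] + [0] * m                  # f(0, j) = j * delta
        for i in range(1, m + 1):
            new[i] = min(alpha(x[i - 1], y[j - 1]) + col[i - 1],  # match x_i with y_j
                         delta + new[i - 1],                      # x_i unmatched
                         delta + col[i])                          # y_j unmatched
        col = new
    return col

The backward quantities g(., j) come from the same routine applied to the reversed strings.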

hirschberg's algorithm

Edit distance graph

  • Let \(g(i,j)\) denote length of shortest path from \((i,j)\) to \((m,n)\)

 

hirschberg's algorithm

Edit distance graph

  • Let \(g(i,j)\) denote length of shortest path from \((i,j)\) to \((m,n)\)
  • Can compute \(g(i,j)\) by reversing the edge orientations and inverting the roles of \((0,0)\) and \((m,n)\)

hirschberg's algorithm

Edit distance graph

  • Let \(g(i,j)\) denote length of shortest path from \((i,j)\) to \((m,n)\)
  • Can compute \(g(\cdot,j)\) for any \(j\) in \(O(mn)\) time and \(O(m+n)\) space

hirschberg's algorithm

Observation 1: The length of a shortest path that uses \((i,j)\) is \(f(i,j) + g(i,j)\)

hirschberg's algorithm

Observation 2: Let \(q\) be an index that minimizes \(f(q,n/2) + g(q,n/2)\). Then, there exists a shortest path from \((0,0)\) to \((m,n)\) that uses \((q,n/2)\)

hirschberg's algorithm

Divide: Find index \(q\) that minimizes \(f(q,n/2) + g(q,n/2)\); save node \((i,j)=(q,n/2)\) as part of solution.

Conquer: Recursively compute optimal alignment in each piece.
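
A Python sketch of the recursion, reusing forward_column from the sketch above; it returns the saved split nodes (q, n/2), which by Observation 2 all lie on one shortest path. Base-case handling and reconstruction of the full alignment from the split nodes are left schematic.

def hirschberg_split_nodes(x, y, delta, alpha):
    """Nodes (i, j) guaranteed to lie on some shortest (0,0)-(m,n) path."""
    m, n = len(x), len(y)
    if m == 0 or n <= 1:
        return []                                    # base case: piece is trivial
    half = n // 2
    f = forward_column(x, y, half, delta, alpha)                          # f(., n/2)
    g = forward_column(x[::-1], y[::-1], n - half, delta, alpha)[::-1]    # g(., n/2)
    q = min(range(m + 1), key=lambda i: f[i] + g[i])                      # best crossing row
    left = hirschberg_split_nodes(x[:q], y[:half], delta, alpha)
    right = [(i + q, j + half)
             for i, j in hirschberg_split_nodes(x[q:], y[half:], delta, alpha)]
    return left + [(q, half)] + right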

Hirschberg's algorithm: space analysis

Theorem: Hirschberg's algorithm uses \(\Theta(m+n)\) space

Pf:

  • Each recursive call uses \(\Theta(m)\) space to compute \(f(\cdot,n/2)\) and \(g(\cdot,n/2)\)
  • Only \(\Theta(1)\) space needs to be maintained per recursive call
  • Number of recursive calls \(\leq n\). ∎

quiz: dynamic programming


What is the tightest worst-case running time of Hirschberg's algorithm?

  1. \(O(mn)\)

  2. \(O(mn \log m)\)

  3. \(O(mn \log n)\)

  4. \(O(mn \log m \log n)\)

hirschberg's alg: runtime analysis warmup

Theorem: Let \(T(m,n)\) be max running time of Hirschberg's algorithm on strings of lengths at most \(m\) and \(n\). Then, \(T(m,n) = O(mn \log n)\)

Pf:

  • \(T(m,n)\) is monotone nondecreasing in both \(m\) and \(n\)
  • \(T(m,n) \leq 2 T(m,n/2) + O(mn)\)
    \(\Rightarrow T(m,n) = O(mn \log n)\)

Remark: Analysis is not tight because two subproblems are of size \((q,n/2)\) and \((m-q,n/2)\). Next, we prove \(T(m,n) = O(mn)\)

hirschberg's alg: runtime analysis

Theorem: Let \(T(m,n)\) be max running time of Hirschberg's algorithm on strings of lengths at most \(m\) and \(n\). Then, \(T(m,n) = O(mn)\)

Pf (by strong induction on \(m+n\)):

  • \(O(mn)\) time to compute \(f(\cdot,n/2)\) and \(g(\cdot,n/2)\) and find index \(q\)
  • \(T(q,n/2) + T(m-q,n/2)\) time for two recursive calls
  • Choose constant \(c\) so that:
    • \(T(m,2) \leq cm\),
    • \(T(2,n) \leq cn\),
    • \(T(m,n) \leq cmn + T(q,n/2) + T(m-q,n/2)\)
  • Claim: \(T(m,n) \leq 2cmn\)
  • ...

hirschberg's alg: runtime analysis

Pf (cont'd):

  • Claim: \(T(m,n) \leq 2cmn\)
  • Base cases: \(m=2\) and \(n=2\)
  • Inductive hypothesis: \(T(m',n') \leq 2cm'n'\) for all \((m',n')\) with \(m'+n' < m+n\) \[\begin{eqnarray} T(m,n) & \leq & T(q,n/2) + T(m-q,n/2) + cmn \\ & \leq & 2cq(n/2) + 2c(m-q)(n/2) + cmn \\ & = & cqn + (cmn - cqn) + cmn \\ & = & 2cmn \quad\blacksquare \end{eqnarray}\]

group: longest common subsequence

Problem: Given two strings \(x_1 x_2 \ldots x_m\) and \(y_1 y_2 \ldots y_n\), find a common subsequence that is as long as possible

Alternative viewpoint: Delete some characters from \(x\) and delete some characters from \(y\); the deletions yield a common subsequence if the two resulting strings are identical

Ex:

LCS("GGCACCACG", "ACGGCGGATACG") = "GGCAACG"

- - G G C - - A C C - A C G
A C G G C G G A - - T A C G
---------------------------
    G G C     A       A C G

Applications: Unix diff, git, bioinformatics
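
A Python sketch of the standard LCS table with a traceback; this is one common solution to the exercise, not the only one, and the function name is mine.

def lcs(x, y):
    """One longest common subsequence of strings x and y."""
    m, n = len(x), len(y)
    # L[i][j] = length of an LCS of x[:i] and y[:j]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = 1 + L[i - 1][j - 1]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # trace back through the table to recover one LCS
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

# lcs("GGCACCACG", "ACGGCGGATACG") returns a length-7 subsequence such as "GGCAACG"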
