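Least squares: a foundational problem in statistics. Given \(n\) points in the plane \((x_1,y_1), \ldots, (x_n,y_n)\), find a line \(y = ax + b\) that minimizes the sum of squared errors
\[ \mathrm{SSE} = \sum_{i=1}^n (y_i - a x_i - b)^2 \]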
Solution: setting \(\partial\,\mathrm{SSE}/\partial a = \partial\,\mathrm{SSE}/\partial b = 0\) shows that the minimum error is achieved when
\[ a = \frac{n \sum_i x_i y_i - (\sum_i x_i) (\sum_i y_i)}{n \sum_i x_i^2 - (\sum_i x_i)^2}, b = \frac{\sum_i y_i - a \sum_i x_i}{n} \]
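As a minimal Python sketch, the closed form above can be evaluated directly from the four sums \(\sum x_i\), \(\sum y_i\), \(\sum x_i^2\), \(\sum x_i y_i\). The function name `fit_line` is hypothetical, and the single-point case (where the denominator is zero) is handled by an assumed horizontal line through that point.

```python
def fit_line(xs, ys):
    """Return (a, b, sse) for the least-squares line y = a*x + b (closed form)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    # Assumption: with a single point (denom == 0), use the horizontal line through it.
    a = (n * sxy - sx * sy) / denom if denom != 0 else 0.0
    b = (sy - a * sx) / n
    sse = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


a, b, sse = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b, sse)  # 2.0 1.0 0.0
```

The four points lie exactly on \(y = 2x + 1\), so the fitted line recovers that slope and intercept with zero error.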
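Segmented least squares: given \(n\) points in the plane \((x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\) with \(x_1 < x_2 < \ldots < x_n\), find a sequence of lines that minimizes some cost \(f(x)\).
Q: What is a reasonable choice for \(f(x)\) to balance accuracy (goodness of fit) and parsimony (number of lines)?
A: Fix a constant \(c > 0\) and minimize \(f(x) = E + c L\), where \(E\) is the sum, over all segments, of the squared errors of each segment's least-squares line, and \(L\) is the number of lines.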
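A small sketch of the objective, assuming \(E\) is the sum over segments of each segment's least-squares error and \(L\) is the number of segments. The helper names `line_sse` and `segmentation_cost` and the encoding of a segmentation as 1-based inclusive `(i, j)` index pairs are illustrative assumptions, not from the source.

```python
def line_sse(pts):
    """Least-squares error of the best single line through pts (closed form)."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    denom = n * sxx - sx * sx
    # Assumption: a single point is fit exactly by a horizontal line.
    a = (n * sxy - sx * sy) / denom if denom != 0 else 0.0
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def segmentation_cost(points, segments, c):
    """f = E + c*L; segments are 1-based inclusive (i, j) pairs partitioning points."""
    E = sum(line_sse(points[i - 1:j]) for i, j in segments)
    L = len(segments)
    return E + c * L


points = [(0, 0), (1, 1), (2, 2), (3, 5), (4, 5), (5, 5)]
print(segmentation_cost(points, [(1, 3), (4, 6)], 1))  # two lines
print(segmentation_cost(points, [(1, 6)], 1))          # one line
```

On these six points the segments \(p_1..p_3\) and \(p_4..p_6\) are each fit exactly, so with \(c = 1\) two lines cost 2 while one line costs about 4.14. With \(c = 10\) the preference flips: one line costs about 13.14 and two lines cost 20.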
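Notation: \(p_i = (x_i, y_i)\); \(\mathrm{OPT}(j)\) = minimum cost for points \(p_1, p_2, \ldots, p_j\), with \(\mathrm{OPT}(0) = 0\); \(e(i,j)\) = minimum sum of squared errors for points \(p_i, p_{i+1}, \ldots, p_j\).
To compute \(\mathrm{OPT}(j)\): the last segment uses points \(p_i, p_{i+1}, \ldots, p_j\) for some \(i \le j\), at cost \(e(i,j) + c + \mathrm{OPT}(i-1)\). Hence
\[ \mathrm{OPT}(j) = \begin{cases} 0 & \text{if } j = 0 \\ \min_{1 \le i \le j} \{ e(i,j) + c + \mathrm{OPT}(i-1) \} & \text{otherwise} \end{cases} \]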
Segmented-Least-Squares(n, p1, ..., pn, c):
    For j = 1 to n
        For i = 1 to j
            Compute the least-squares error e(i, j) for the segment pi, ..., pj
    M[0] <- 0
    For j = 1 to n
        Find i in [1, j] that minimizes e(i, j) + c + M[i-1]
        M[j] <- e(i, j) + c + M[i-1]
    Return M[n]
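The pseudocode above can be sketched in Python as follows. This is a hedged, unoptimized transcription: the function names are hypothetical, indices are 1-based to match the slides, a single-point segment is assumed to have zero error, and the traceback that recovers the segments themselves is an addition not present in the pseudocode.

```python
def least_squares_error(pts):
    """e(i, j): SSE of the best-fit line y = a*x + b through pts, via the closed form."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    denom = n * sxx - sx * sx
    # Assumption: a single point (denom == 0) is fit exactly by a horizontal line.
    a = (n * sxy - sx * sy) / denom if denom != 0 else 0.0
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def segmented_least_squares(points, c):
    """Return (OPT(n), segments) for points sorted by x; segments are 1-based (i, j)."""
    n = len(points)
    # First loop: e[i][j] for all 1 <= i <= j <= n, O(n) each -> O(n^3) total.
    e = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for i in range(1, j + 1):
            e[i][j] = least_squares_error(points[i - 1:j])
    # Second loop: M[j] = min over i of e(i, j) + c + M[i - 1].
    M = [0.0] * (n + 1)
    best_i = [0] * (n + 1)  # minimizing i for each j, kept for the traceback
    for j in range(1, n + 1):
        M[j], best_i[j] = min((e[i][j] + c + M[i - 1], i) for i in range(1, j + 1))
    # Traceback (not in the pseudocode): walk back from n to recover the segments.
    segments, j = [], n
    while j > 0:
        segments.append((best_i[j], j))
        j = best_i[j] - 1
    return M[n], segments[::-1]


points = [(0, 0), (1, 1), (2, 2), (3, 5), (4, 5), (5, 5)]
print(segmented_least_squares(points, 1))  # (2.0, [(1, 3), (4, 6)])
```

Taking `min` over `(cost, i)` tuples breaks ties in favor of the smallest `i`; any minimizer gives the same optimal value `M[n]`.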
Theorem: The dynamic programming algorithm solves the segmented least squares problem in \(O(n^3)\) time and \(O(n^2)\) space
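Pf: The bottleneck is computing \(e(i,j)\) for the \(O(n^2)\) pairs \(i \le j\). Each pair takes \(O(n)\) time using the closed-form solution (with the sums taken over the points of the segment)
\[ a = \frac{n \sum_i x_i y_i - (\sum_i x_i) (\sum_i y_i)}{n \sum_i x_i^2 - (\sum_i x_i)^2}, b = \frac{\sum_i y_i - a \sum_i x_i}{n} \]
followed by one pass to sum the squared residuals, and the table of all \(e(i,j)\) values takes \(O(n^2)\) space. The second loop does \(O(j)\) work for each \(j\), i.e. \(O(n^2)\) in total.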
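Remark: Can be improved to \(O(n^2)\) time and \(O(n)\) space by precomputing running sums, so that each \(e(i,j)\) is computed in \(O(1)\) time when needed instead of being stored in an \(n \times n\) table.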
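One way to realize this improvement, sketched under assumptions (the name `segmented_least_squares_fast` is hypothetical): precompute prefix sums of \(x_i\), \(y_i\), \(x_i^2\), \(x_i y_i\) and \(y_i^2\) in \(O(n)\) space. Any window sum over \(p_i..p_j\) is then a difference of two prefix entries, \(a\) and \(b\) follow from the closed form, and the SSE is obtained by expanding \(\sum (y - ax - b)^2\) in terms of those five sums. The DP takes \(O(n^2)\) time and keeps only `M` and the argmin array.

```python
from itertools import accumulate


def segmented_least_squares_fast(points, c):
    """O(n^2)-time, O(n)-space variant; points must be sorted by x."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Prefix sums with a leading 0, so S[j] - S[i - 1] sums over points i..j (1-based).
    Sx = list(accumulate(xs, initial=0))
    Sy = list(accumulate(ys, initial=0))
    Sxx = list(accumulate((x * x for x in xs), initial=0))
    Sxy = list(accumulate((x * y for x, y in zip(xs, ys)), initial=0))
    Syy = list(accumulate((y * y for y in ys), initial=0))

    def e(i, j):
        """Least-squares error for p_i..p_j in O(1) time from the window sums."""
        m = j - i + 1
        sx, sy = Sx[j] - Sx[i - 1], Sy[j] - Sy[i - 1]
        sxx, sxy, syy = Sxx[j] - Sxx[i - 1], Sxy[j] - Sxy[i - 1], Syy[j] - Syy[i - 1]
        denom = m * sxx - sx * sx
        # Assumption: a single point is fit exactly by a horizontal line.
        a = (m * sxy - sx * sy) / denom if denom != 0 else 0.0
        b = (sy - a * sx) / m
        # Expand sum (y - a*x - b)^2 in terms of the five window sums.
        sse = syy + a * a * sxx + m * b * b - 2 * a * sxy - 2 * b * sy + 2 * a * b * sx
        return max(sse, 0.0)  # clamp tiny negative values caused by round-off

    M = [0.0] * (n + 1)     # M[j] = OPT(j)
    best_i = [0] * (n + 1)  # start index of the last segment for p_1..p_j
    for j in range(1, n + 1):
        M[j], best_i[j] = min((e(i, j) + c + M[i - 1], i) for i in range(1, j + 1))

    segments, j = [], n
    while j > 0:
        segments.append((best_i[j], j))
        j = best_i[j] - 1
    return M[n], segments[::-1]
```

The expanded SSE formula trades memory for numerical robustness: with large coordinates the subtraction can lose precision, which is why the sketch clamps small negative results to zero. Centering the data first is one common mitigation.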