Project 05: kD-Trees
Small Group, Due: 2025 Nov 18, 11:59pm
In this assignment, you will implement a practical and efficient algorithm to find the nearest neighbors in a 2D plane.
Note
Not only should your implementation work correctly, but it will need to use space and time efficiently to receive full credit.
This assignment is based on an assignment by Kevin Wayne, an author of the Algorithms 4/e book.
You may work in a small group of no more than 3 people.
The Nearest Neighbor Problem
Create a symbol table data type whose keys are two-dimensional points and values are of a generic type. Use a 2d tree to support efficient nearest neighbor search (find a closest point to a query point). 2d trees have numerous applications, ranging from classifying astronomical objects to computer animation to speeding up neural networks to mining data to image retrieval.
Geometric Primitives
Use the immutable data type Point for points in the plane.
Here is a subset of its API that you may use.

The Point class contains several useful static functions and methods for working with points.
Below is a brief summary list.
- Generate random points in the unit square (\((x,y) \in [0,1]^2\)) with
uniformandgaussian. Theuniformfunction chooses the point uniformly at random. Thegaussianfunction chooses the point with a gaussian distribution with mean of \(0.5\) and standard deviation of \(0.12\) (any points outside unit square are discarded and a new point is chosen). - Compute distance between two points with
dist. ThedistSquaredmethod computes the squared distance. PointisComparableand it contains several functions/methods for generating comparators to comparePointobjects in different ways.xyComparator(): compares x first, then yyxComparator(): compares y first, then xpolarRadiusComparator(): compares polar radius (dist from \((0,0)\))distanceToComparator(): compares dist betweenthisand anotherPointPoint.compareTo(): compares y first, then x (same asyxComparator())
Important
Although you may use any of these comparison methods, you should NOT trust that it will do what you expect it to do for this project.

The immutable data type PointDist stores a Point and a distance (double).
The PointDist class is Comparable based on the distance.
There are several constructors for creating a new PointDist.
The immutable data type Partition stores two points and a partitioning direction (Compare).
The Compare is either X or Y, indicating it is partitioning based on x values or y values, respectively.
- X-partitions divide spaces into left and right half-spaces (compares based on x).
- Y-partitions divide spaces into down and up half-spaces (compares based on y).
You may not modify these data types.
PointSearch Interface
You will write two implementations to solve the nearest neighbor problem.
Both implementations will implement the generic PointSearch interface, which is a symbol table with Point as key and the generic type Value as value.
The API is shown below.

Note
We are NOT implementing a method for removing a point from the data structure, because this is a much harder problem to solve.
Important
The min() and max() functions do not return the "smallest" and "largest" points in the collection, but instead return a Point that has the smallest x and the smallest y or largest x and largest y.
In other words, min() returns the bottom-left corner of bounding box containing all the points, and max() returns the top-right corner.
PSBruteForce Implementation

Write a mutable data type PSBruteForce that implements the PointSearch interface using a red-black BST.
Use the RedBlackBST from algs4.jar; do not implement your own red-black BST!
For details on the red-black BST implementation, see the RedBlackBST documentation.
Important notes:
- You do not need to understand how a red-black BST works in order to use it. In fact, you can use any BST for the brute force implementation, but a red-black BST will guarantee log performance on
put(),get(), andcontains(). - The red-black BST does not have "partitions", so your implementation can return either an empty
Iterableornull. - The red-black BST cannot solve the nearest problems (
getNearest()andnearest()). You will need to create a brute force solution for these features.
PSKDTree Implementation

Write a mutable data type PSKDTree that uses a 2d tree to implement the same API.
A 2d tree is a generalization of a BST to two-dimensional keys.
The idea is to build a BST with points in the nodes, using the \(x\)- and \(y\)-coordinates of the points as keys in strictly alternating sequence, starting with the \(x\)-coordinates.
Search and Insert. The algorithms for search and insert are similar to those for BSTs, but at the root we use the \(x\)-coordinate (if the point to be inserted has a smaller \(x\)-coordinate than the point at the root, go left; otherwise go right); then at the next level, we use the \(y\)-coordinate (if the point to be inserted has a smaller \(y\)-coordinate than the point in the node, go left; otherwise go right); then at the next level the \(x\)-coordinate, and so forth.
PointSearch<Character> ps = new PSKDTree<>(); ps.put(new Point(0.8,0.1), 'A'); ps.put(new Point(0.5,0.5), 'B'); ps.put(new Point(0.2,0.4), 'C'); ps.put(new Point(0.3,0.9), 'D'); ps.put(new Point(0.9,0.6), 'E');






The primary advantage of a 2d tree over a BST is that it supports efficient implementation of nearest neighbor search. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its subtree. The root corresponds to the infinitely large square from \((-\infty,+\infty)^2\); the left and right children of the root correspond to the two rectangles split by the \(x\)-coordinate of the point at the root; and so forth.
Partitions.
The partitions() method returns the partitions used to divide up the 2d space.
You are not required to implement this function, but it will count as extra credit and help assist in debugging.
Nearest Neighbor Search. To find a closest point to a given query point, start at the root and recursively search in both subtrees using the following pruning rule: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, you should search a node only if it might contain a point that is closer than the best one found so far. The effectiveness of the pruning rule depends on quickly finding a nearby point. To do this, organize your recursive method so that when there are two possible subtrees to go down, you choose first the subtree that is on the same side of the partitioning line and the query point; the closest point found while exploring the first subtree may enable pruning of the second subtree.
Implementation Details and Requirements
\(k\)-Nearest Neighbor Search.
To find the \(k\) closest points, use a maximum priority queue (MaxPQ) of PointDist.
As you find a point that is closer to the query point than the maximum on the priority queue, add that point to the PQ.
When the number of points no the PQ is greater than \(k\), delete the maximum.
Edge Cases.
get() should return null if the given Point is not in the collection.
More generally, if the collection is empty, all Point-related functions (min(), get(), etc.) should return null, while isEmpty() and size() should return true and 0.
Throw a java.lang.NullPointerException if any Point argument is null.
Runtimes. Your implementation should have the following runtimes.
PSBruteForce |
PSKDTree |
PSKDTree |
|
|---|---|---|---|
isEmpty |
\(O(1)\) | \(O(1)\) | \(O(1)\) |
size |
\(O(1)\) | \(O(1)\) | \(O(1)\) |
min |
\(O(1)\) | \(O(1)\) | \(O(1)\) |
max |
\(O(1)\) | \(O(1)\) | \(O(1)\) |
put |
\(O(\lg N)\) | \(O(\lg N)\) | \(O(N)\) |
points |
\(O(1)\) | \(O(1)\) | \(O(1)\) |
contains |
\(O(\lg N)\) | \(O(\lg N)\) | \(O(N)\) |
get |
\(O(\lg N)\) | \(O(\lg N)\) | \(O(N)\) |
nearest |
\(O(N)\) | \(O(\lg N)\) | \(O(N)\) |
getNearest |
\(O(N)\) | \(O(\lg N)\) | \(O(N)\) |
k-nearest |
\(O(N \lg k)\) | \(O(\lg N \lg k)\) | \(O(N \lg k)\) |
| worst | average | worst |
Notes:
- The
keys()function inRedBlackTreewill take time proportional to the number of points in the symbol table (\(O(N)\)), so you will need to do "extra work" in theput()method to meet the requirement above. - You can guarantee \(O(\lg N)\) runtime on
containsandgetinPSKDTreeby using aRedBlackTreealong with a 2D-Tree, however it would basically double your memory usage andputwould still take \(O(N)\) in worst case.
Hint
The Point class has a xy method that returns either the x or y value based on the passed Partition.Direction.
This can simplify your code considerably.
Clients
You may use the following two interactive client programs to test and debug your code.

Nearest Neighber Client.
The NearestNeighborVisualizer will insert random points into the PointSearch symbol table, then visualize the points as black boxes in a window.
As you move your mouse over the window, light blue lines are drawn to the \(k\) nearest neighbors to the location under the mouse cursor, and a dark blue line is drawn to the closest point.
Clicking with the mouse will insert a new point under the mouse cursor.
Pressing c will create a new symbol table (effectively clearing out all point and value data).
Pressing the up and down arrow keys will increase and decrease \(k\), while the n key toggles drawing the closest point.
Pressing the p and b toggles drawing the partitions (if applicable) and bounding box.
X-partitions are drawn in red; Y-partitions are drawn in green.
The bounding box is drawn in gray.
Fast Food Visualizer Client.
The FastFoodVisualizer loads, inserts, and visualizes 7084 Burger King locations in the United States.
As you move your mouse, the client will find the nearest BK restaurant to the point under the cursor, draw a line to it, and print out its information at the top.
The data file was retrieved from the Algorithms 4e book website.
If you want to test with other data, you may download and test the visualizer with other location files, such as McDonald's, Starbucks, and Walmart.
The client code uses other helper classes contained in the project.
The KeyPress class helps determine when a key is pressed (not just held down).
Mouse helps determine when a button is pressed (not just held down) and in getting the cursor position within an area in the window.
Keyboard Shortcuts
Listed below are some useful keyboard shortcuts for the NearestNeighborVisualizer.
| Key | Action |
|---|---|
C |
Clear all points |
LeftMouse |
Insert point at mouse position |
UpArrow |
Increase \(k\) for k-nearest neighbors |
DownArrow |
Decrease \(k\) for k-nearest neighbors |
N |
Toggle drawing line from mouse to closest point |
P |
Toggle drawing space partitioning lines |
B |
Toggle drawing bounding box |
Requirements and Submission
You will need to implement and analyze PSBruteForce and PSKDTree classes.
Analysis
Give the total memory usage in bytes using tilde notation of your 2d tree data structure as a function of the number of points \(N\), using the memory-cost model from lecture and Section 1.4 of the textbook. Count all memory that is used by your 2d tree, including memory for the nodes, points, partitions, etc.
How many nearest neighbor calculations can your 2d tree implementation perform per second for input100K.txt (100 thousand points) and input1M.txt (1 million points), where the query points are uniformly random points in the unit square (Point.uniform())? (Do not count the time to read or to build the 2d tree).
Repeat the question but with the brute-force implementation.
Important
The visualizers are extremely slow when compared to the insert and nearest methods, even with PSBruteForce.
When comparing the two implementations, be sure to measure them not using the visualizers!
Here is some sample starter code.
// place your timing code or unit testing here
public static void main(String[] args) {
// load point data into flat array
In in = new In("input100k.txt");
double[] d = in.readAllDoubles();
// insert points into PointSearch
PointSearch<Integer> ps = new PSKDTree<>();
for(int i = 0; i < d.length; i+=2) {
ps.put(new Point(d[i], d[i+1]), i);
}
// call nearest 1 million queries! (pinky to corner of mouth)
Stopwatch sw = new Stopwatch();
for(int i = 0; i < 1_000_000; i++) {
Point q = Point.gaussian();
ps.nearest(q);
}
double time = sw.elapsedTime();
double queries_per_sec = 1_000_000 / time;
}
Readme
Do all of the problems above in the appropriate Java file, and then write about your findings in readme.md.
Show your work and be thoughtful.
Your job is to convince me that you have done the work and that you have either learned something new or verified an already understood concept.
Include screenshots!
Files
The diagram below indicates which files you can edit, which files/folders you should not edit, and the rest are files/folders that might be changed by IntelliJ or git but you should not change directly.
P05_KDTrees/... ├─ .idea/... ├─ .git/... <- **DO NOT EDIT** ├─ .gitignore <- **DO NOT EDIT** ├─ .gitmodules <- **DO NOT EDIT** ├─ out/... ├─ libs/... ├─ src/... │ ├─ PSBruteForce.java <- Java file that you might need to modify │ ├─ PSKDTree.java <- Java file that you might need to modify │ │ The following files should not need edited │ ├─ PointSearch.java │ ├─ Point.java │ ├─ PointDist.java │ ├─ Partition.java │ ├─ Visualizer.java │ ├─ FastFoodVisualizer.java │ └─ NearestNeighborVisualizer.java ├─ data/... <- Additional test input files you create ├─ data_ref/.. <- **DO NOT EDIT** └─ readme.md <- Fill this out before submission!
We will supply an IntelliJ project with stdlib.jar and algs4.jar libraries.
These libraries contain many useful classes.
On this assignment, the only library functions you may call are those in java.util, stdlib.jar, and algs4.jar.
Do not import any other library!
You may create additional test files under data/ folder.
Submission
Once you have completed the assignment requirements, simply stage, commit, and push your changes to the CSE GitLab, and then close the issues.
Do NOT submit anything to the course Brightspace page.
Grading Rubric
| Implementation | |
| 3pts | Correct implementation of PSBruteForce methods |
| 3pts | Efficient PSBruteForce methods |
| 3pts | Correct implementation of PSKDTree methods |
| 3pts | Efficient PSKDTree methods |
| Readme | |
| 3pts | Completed readme.html |
| 3pts | Runtime analysis of PSBruteForce and PSKDTree |
| 3pts | Memory analysis of PSKDTree |
| Extra Credit | |
| +3pts | Extra Credit: partitions() returns reasonable results |