dyslexic
edit
qrcode
+=-

Project 06: Hollywood Center


Small Group, Due: 2024 Dec 6, 5:00pm

In this assignment, you will implement an algorithm to assist in determining the "Center of the Hollywood Universe".

NOTE: Not only should your implementation work correctly, but it will need to use space and time efficiently to receive full credit.

This assignment is based on web exercises from the Algorithms 4/e book website. The definitions are from Six Degrees of Kevin Bacon Wikipedia article.

You may work in a small group of no more than 3 people. However, each member must write their own readme.html and submit it along with the entire project.

The Hollywood Number and the Center of the Hollywood Universe


Six Degrees of Kevin Bacon and the Bacon Number: A popular parlor game among movie buffs is to challenge each other to find the shortest path of connections between an arbitrary actor and the prolific Hollywood character actor Kevin Bacon. A connection exists between two actors if both actors appear in the same film. This game is based on the concept of "six degrees of separation", which posits that any two people on Earth are six or fewer acquaintance links apart.

The Bacon Number of an actor is the number of degrees of separation he or she has from Kevin Bacon as defined by the game. The higher the Bacon number, the farther away from Kevin Bacon the actor is. The Bacon Number is similar to the Erdős Number.

Kevin Bacon himself has a Bacon Number of 0. Actors who have worked directly with Kevin Bacon have a Bacon Number of 1. If the lowest Bacon Number of any actor with whom an arbitrary actor X has appeared in any movie is \(N\), then X's Bacon Number is \(N+1\). For example, Elvis Presley has a Bacon Number of 2, because Presley never appeared in a film with Bacon but was in Change of Habit (1969) with Edward Asner who was in JFK (1991) with Kevin Bacon. Note: some actors have a Bacon Number of \(\infty\), because there does not exist a movie that connects the actor and Kevin Bacon. For example, Beatrice Crumb (IMDB) has a Bacon Number of \(\infty\) according to The Oracle of Bacon as of Nov 2018 (although since then, Beatrice Crumb has been removed from the Oracle of Bacon database).

Hollywood Number and the Center of the Hollywood Universe: We can measure how good of a center that Kevin Bacon is by computing their Hollywood Number. The Hollywood Number of Kevin Bacon is the average Bacon Number of all the actors. The Hollywood Number of another actor is computed the same way, but with the other actor the source instead of Kevin Bacon.

The Center of the Hollywood Universe would be the set of actors that have the minimum average distance to all other actors in the universe.

The Hollywood Center Problem: Determine if Kevin Bacon is at the Center of the Hollywood Universe by first computing Kevin Bacon's Hollywood Number, and then see if you can find an actor with better Hollywood Number.

Modeling the Problem


A natural way to model this problem is to turn the Hollywood information into a undirected graph, where the nodes of the graph represent the actors of Hollywood, and edges connect two actor nodes if the actors appeared together in the same film. Under this graph model, we can compute the distance between two actors by finding the minimum number of edges along the path that connects the two. Specifically, we compute the minimum path length by performing a Breadth-First Search (BFS). If there does not exist a path between two actors (distance of \(\infty\)), they would belong to separate connected components.

Implement the HollywoodGraph class. The constructor will load the data stored in the the specified text file (format detailed below). The connectedComponents() and connectedComponentsCount() functions returns an Iterable of representative actor names for each connected component in the graph and the count of connected components. A representative actor for a connected component is any actor that is part of the connected component. The connectedActorsCount() function returns the number of actors that are connected to the given actor, including the given actor. The hollywoodNumber() function returns the Hollywood Number of the specified actor. The getActorDetails() function returns an object that contains detailed information about the given actor. To do this, you will need to create a class that implements the HollywoodActor interface.

The HollywoodActor implementation should perform a breadth-first search starting from a given actor. The name() function will return the name of the given actor, and movies() will return an iterable of movies the actor played in. The distanceAverage(), distanceMaximum(), and actorMaximum() functions should return the average distance to all actors (counting only the connecting movies) connected to the given actor, the maximum distance to a connected actor (counting only the connecting movies), and the name of the actor with the maximum distance, respectively. The actorPath() and actorPathLength() functions take another actor's name as input and returns the shortest path connecting the two actors and that path's length, respectively. If the two actors are not connected, actorPathLength() should return infinity (Double.POSITIVE_INFINITY;). The moviePath() function is extra credit (see below).

Data Structures and Algorithms.
You may use the Graph, BreadthFirstPaths, and SymbolGraph classes provided by algs4.jar, or you can implement your own graph class and processing functions. Use the CCBFS class available in the framework rather than CC in the algs4.jar, since CC uses recursion and could hit the recursion depth limit set by Java. You may not use any source code from the Internet (including Stack Overflow). See the book's website for more detailed information on undirected graphs at Algorithms 4e Undirected Graphs (4.1).

Hint: You may need to create helper functions and classes to help you solve this problem. All of the additional classes should be private inner classes to HollywoodGraph. Use the structures of the previous assignments as a guide, and also take a look at the SymbolGraph implementation (SymbolGraph implementation).

Performance Requirements.
Your implementations should be efficient to receive full credit. For HollywoodGraph: the connectedComponentsCount() and connectedActorsCount() functions should return in log time or better; connectedComponents() should run in time proportional to the number of connected components or better; getActorDetails() and hollywoodNumber() should run in time proportional to \(V+E\) (the number of nodes and edges in graph).

For HollywoodActor: name(), distanceAverage(), distanceMaximum(), actorMaximum() should return in constant time; movies() should run it time proportional to the number of movies the actor played in; actorPath(), actorPathLength(), and moviePath() should run in time proportional to the number of actors and movies along the path.

Edge Cases.
Throw a NoSuchElementException if the actor name specified is null or does not exist in the data file. If two actors are not connected, actorPath() and moviePath() should return null (not an empty Iterable) and actorPathLength() should return infinity (Double.POSITIVE_INFINITY definition). If a movie is passed to a function when an actor is expected, throw a NoSuchElementException.

movies.txt File Format


The file movies.txt contains a small subset of movies and actors (1757 movies, 59168 actors) in a simple format. A line in the file (\n separated) contains a single movie and set of actors that played in the movie (/ separated). Use the In class and the readAllLines() function to read in string array of lines from the file. Then, for each line, split the text by the / character to read the movie (first part) and each of the actors (all remaining parts). Below is sample code.

String[] lines = (new In("movies.txt")).readAllLines();
for(String line : lines) {
    String[] parts = line.split("/");
    String movie = null;
    for(String part : parts) {
        if(movie == null) {
            movie = part;
        } else {
            String actor = part;
            /* do stuff with actor */
        }
    }
}

Example


Below is the contents of the example file, example.txt.

Movie 1/Actor A/Actor B/Actor C
Movie 2/Actor C/Actor D
Movie 3/Actor E/Actor F
Movie 4/Actor E/Actor F
Movie 5/Actor F/Actor G

HINT: use the included example.txt when you begin your project and test your implementation, so that you can step through the debugger and actually reason about the results.

The example above can be interpreted as the bipartite graph below.

If we collapse the movie nodes and simply connect each actor to the other actors who were in the same movie together, we would produce the graph below. Each of the edges is labeled with a movie that both actors appeared in. Note: some actors appeared in multiple movies together, but we only need one.

According to the previous graph, the pairwise, average, and maximum distances and maximum actor for each actor are shown in the table below.

A B C D E F G
Actor A \(0\) \(1\) \(1\) \(2\) \(\infty\) \(\infty\) \(\infty\)
Actor B \(1\) \(0\) \(1\) \(2\) \(\infty\) \(\infty\) \(\infty\)
Actor C \(1\) \(1\) \(0\) \(1\) \(\infty\) \(\infty\) \(\infty\)
Actor D \(2\) \(2\) \(1\) \(0\) \(\infty\) \(\infty\) \(\infty\)
Actor E \(\infty\) \(\infty\) \(\infty\) \(\infty\) \(0\) \(1\) \(2\)
Actor F \(\infty\) \(\infty\) \(\infty\) \(\infty\) \(1\) \(0\) \(1\)
Actor G \(\infty\) \(\infty\) \(\infty\) \(\infty\) \(2\) \(1\) \(0\)








Avg Dist \(1.0\) \(1.0\) \(0.75\) \(1.25\) \(1.0\) \(0.67\) \(1.0\)
Max Dist \(2\) \(2\) \(1\) \(2\) \(2\) \(1\) \(2\)
Max Actor D D A B G E E

An actor path from A to D is ACD. A movie path from E to G is EM3FM5G.

Notes:

Here are some possible results:

HollywoodGraph g = new HollywoodGraph("example.txt");
g.connectedComponents();            // ["Actor A", "Actor F"]
g.connectedComponentsCount();       // 2
g.connectedActorsCount("Actor A");  // 4
g.hollywoodNumber("Actor A");       // 1.0  (avg. dist)

HollywoodActor a = g.getActorDetails("Actor A");
a.name();                      // "Actor A"
a.movies();                    // ["Movie 1"]
a.distanceAverage();           // 1.0
a.distanceMaximum();           // 2.0
a.actorPathLength("Actor D");  // 2.0
a.actorPath("Actor D");        // ["Actor A", "Actor C", "Actor D"]
a.actorPathLength("Actor G");  // Double.POSITIVE_INFINITY
a.actorPath("Actor G");        // null  (edge case)

// extra credit
a.moviePath("Actor D");
// ["Actor A", "Movie 1", "Actor C", "Movie 2", "Actor D"]

Extra Credit


You can receive extra credit by providing an efficient implementation for the moviePath() function (and actorPath()). This function will return a path between the two actors as an iterable of strings similar to path(), but alternating the actor and connecting movie. In other words, the oddly indexed strings will be the movie that connects the actors on either side (evenly indexed strings). For example, Harrison Ford has a Bacon Number of 2, so a possible path between Harrison Ford and Kevin Bacon would be Ford, Harrison (I), Air Force One (1997), Berkeley, Xander, Few Good Men, A (1992), Bacon, Kevin.

One way to implement this is to modify your implementation to have a node for every actor and for every movie, and create an edge between an actor node and movie node if the actor played in the movie. Modeling the Hollywood data this way would provide a simple and efficient way to find out how many actors played in a movie or how many movies an actor played in (degree of the node), however it slightly complicates the actorPath and actorPathLength implementations.

Additional Data Sets


The book site has additional data sets that you can download and test.

Deliverables


Submit a zip file containing the entire project IntelliJ project, including all of the files listed below. The files you are required to modify are marked with <<<<<.

P06_HollywoodCenter/
    .idea/...
    .log/...
    tests/
        movies.txt
    src/
        algs4.jar
        stdlib.jar
        CCBFS.java
        HollywoodActor.java
        HollywoodGraph.java                     <<<<<
    .cos265
    P06_HollywoodCenter.iml
    readme.html                                 <<<<<

We will supply an IntelliJ project with stdlib.jar and algs4.jar libraries. These libraries contain many useful classes.

On this assignment, the only library functions you may call are those in java.util, stdlib.jar, and algs4.jar.

Grading Rubric


Submission
3pts Submission contains all files
Implementation
3pts Correct implementation behind HollywoodGraph API
3pts Correct implementation behind HollywoodActor API
Efficiency
3pts Efficient code
Readme
3pts Completed readme.html
Extra Credit
+3pts Extra credit: correct implementation for moviePath()

More Reading


See also The Oracle of Bacon.

API and source code for classes in algs4.jar:

Graph api source
SymbolGraph api source
BreadthFirstPaths api source
CC (use CCBFS instead) api source