Directed Graphs

COS 265 - Data Structures & Algorithms

introduction

directed graphs

Digraph: Set of vertices connected pairwise by directed edges.

directed path from 8 to 1: red lines
directed cycle: blue lines
vertex 6 (green) has outdegree=4 and indegree=2

road network

vertex = intersection; edge = one-way street

political blogosphere graph

[ The Political Blogosphere and the 2004 U.S. Election: Divided They Blog, Adamic and Glance, 2005 ]

Overnight interbank loan graph

[ The Topology of the Federal Funds Market, Bech and Atalay, 2008 ]

uber taxi graph

[ http://blog.uber.com/2012/01/09/uberdata-san-franciscomics/ ]

implication graph

vertex = variable; edge = logical implication

combinational circuit

wordnet graph

vertex = synset; edge = hypernym relationship

[ WordNet ]

digraph applications

digraph	vertex	directed edge
transportation	street intersection	one-way street
web	web page	hyperlink
food web	species	predator-prey relationship
WordNet	synset	hypernym
scheduling	task	precedence constraint
financial	bank	transaction
cell phone	person	placed call
infectious disease	person	infection
game	board position	legal move
citation	journal article	citation
object graph	object	pointer
inheritance hierarchy	class	inherits from
control flow	code block	jump

some digraph problems

problem	description
s→t path	Is there a path from \(s\) to \(t\)?
shortest s→t path	What is the shortest path from \(s\) to \(t\)?
directed cycle	Is there a directed cycle in the graph?
topological sort	Can the digraph be drawn so that all edges point upwards?
strong connectivity	Is there a directed path between all pairs of vertices?
transitive closure	For which vertices \(v\) and \(w\) is there a directed path from \(v\) to \(w\)?
PageRank	What is the importance of a web page?

directed graphs

digraph API

digraph api

Almost identical to Graph API

digraph api

// read digraph from input stream
In in = new In(args[0]);
Digraph G = new Digraph(in);

// print out each edge once
for(int v = 0; v < G.V(); v++)
    for(int w : G.adj(v))
        StdOut.println(v + "->" + w);

$ cat tinyDG.txt
13
22
4 2
2 3
 ⋮

$ java Digraph tinyDG.txt
0->5
0->1
2->0
2->3
3->5
3->2
 ⋮

1st line: vertex count (\(V\))
2nd line: edge count (\(E\))
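
The file layout described above (vertex count, edge count, then one `v w` pair per edge) can be read with a few lines of plain Java. This is only a sketch under assumptions: `DigraphParser` and `parse` are hypothetical names, `java.util.Scanner` stands in for the `In` class used above, and the input is assumed to be well-formed whitespace-separated integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper (not part of the course code): parses the
// "V, E, then E pairs" text format into adjacency lists.
class DigraphParser {
    static List<List<Integer>> parse(String text) {
        Scanner in = new Scanner(text);
        int V = in.nextInt();                  // 1st number: vertex count
        int E = in.nextInt();                  // 2nd number: edge count
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int i = 0; i < E; i++) {
            int v = in.nextInt();
            int w = in.nextInt();
            adj.get(v).add(w);                 // directed: only v's list records v->w
        }
        return adj;
    }
}
```

For example, parsing the text `"3\n2\n0 1\n2 0\n"` should give a three-vertex digraph with edges 0->1 and 2->0.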

digraph representation: adjacency lists

Maintain vertex-indexed array of lists

digraph representations

In practice: use adjacency-lists representations

Algorithms based on iterating over vertices pointing from \(v\)
Real-world digraphs tend to be sparse (huge number of vertices, small average vertex degree)

representation	space	insert edge from \(v\) to \(w\)	edge from \(v\) to \(w\)?	iterate over vertices pointing from \(v\)
list of edges	\(E\)	\(1\)	\(E\)	\(E\)
adjacency matrix	\(V^2\)	\(1^\dagger\)	\(1\)	\(V\)
adjacency lists	\(E + V\)	\(1\)	\(\mathrm{outdegree}(v)\)	\(\mathrm{outdegree}(v)\)

\(^\dagger\)disallows parallel edges
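
To make the adjacency-matrix row of the table concrete, here is a minimal sketch. `MatrixDigraph` and its method names are hypothetical, chosen only for illustration; the comments restate the costs listed in the table.

```java
// Hypothetical adjacency-matrix digraph, for comparison with adjacency lists.
class MatrixDigraph {
    private final boolean[][] edge;   // edge[v][w] is true iff v->w exists; V^2 space

    MatrixDigraph(int V) {
        edge = new boolean[V][V];
    }

    // constant time; a second v->w just overwrites true, so no parallel edges
    void addEdge(int v, int w) {
        edge[v][w] = true;
    }

    // constant time
    boolean hasEdge(int v, int w) {
        return edge[v][w];
    }

    // time proportional to V, no matter how small outdegree(v) is
    int outdegree(int v) {
        int count = 0;
        for (int w = 0; w < edge.length; w++)
            if (edge[v][w]) count++;
        return count;
    }
}
```

The scan over a full row in `outdegree` is the reason the matrix is a poor fit for sparse digraphs.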

adj-lists graph repr (review): java implementation

public class Graph {
    private final int V;
    private final Bag<Integer>[] adj;     // adj lists

    public Graph(int V) {
        // create empty graph with V vertices
        this.V = V;
        adj = (Bag<Integer>[]) new Bag[V];
        for(int v = 0; v < V; v++)
            adj[v] = new Bag<Integer>();
    }

    public void addEdge(int v, int w) {
        // add edge v-w (parallel edges and self-loops allowed)
        adj[v].add(w);
        adj[w].add(v);
    }

    // iterator for vertices adjacent to v
    public Iterable<Integer> adj(int v) {
        return adj[v];
    }
}

adj-lists digraph repr: java implementation

public class Digraph {
    private final int V;
    private final Bag<Integer>[] adj;     // adj lists

    public Digraph(int V) {
        // create empty digraph with V vertices
        this.V = V;
        adj = (Bag<Integer>[]) new Bag[V];
        for(int v = 0; v < V; v++)
            adj[v] = new Bag<Integer>();
    }

    public void addEdge(int v, int w) {
        // add edge v->w (only v's list changes)
        adj[v].add(w);
    }

    // iterator for vertices pointing from v
    public Iterable<Integer> adj(int v) {
        return adj[v];
    }
}
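
Later slides call `G.V()` and `G.reverse()`, which the class above does not show. The following sketch is one plausible way to provide them, along with an indegree count. `ListDigraph` is a hypothetical stand-in that uses `java.util` lists instead of `Bag` so that it compiles on its own; it is not the course's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Digraph class above.
class ListDigraph {
    private final int V;
    private final List<List<Integer>> adj = new ArrayList<>();

    ListDigraph(int V) {
        this.V = V;
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
    }

    int V() { return V; }

    void addEdge(int v, int w) { adj.get(v).add(w); }

    Iterable<Integer> adj(int v) { return adj.get(v); }

    // number of edges pointing to v; scans every list, so time proportional to E + V
    int indegree(int v) {
        int count = 0;
        for (List<Integer> out : adj)
            for (int w : out)
                if (w == v) count++;
        return count;
    }

    // new digraph in which every edge v->w becomes w->v
    ListDigraph reverse() {
        ListDigraph R = new ListDigraph(V);
        for (int v = 0; v < V; v++)
            for (int w : adj.get(v))
                R.addEdge(w, v);
        return R;
    }
}
```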

directed graphs

digraph search

reachability

Problem: Find all vertices reachable from \(s\) along a directed path

depth-first search in digraphs

Same method as for undirected graphs

Every undirected graph is a digraph with edges in both directions
DFS is a digraph algorithm

DFS (to visit a vertex v)
    Mark v as visited
    Recursively visit all unmarked vertices w pointing from v

depth-first search demo

To visit a vertex \(v\):

Mark vertex \(v\) as visited
Recursively visit all unmarked vertices pointing from \(v\)

[ DFS demo figure: starting from vertex 0, the call stack (bottom to top) goes 0; 0 1; 0 5; 0 5 4; 0 5 4 3; 0 5 4 3 2; then unwinds 0 5 4 3; 0 5 4; 0 5; 0; empty ]

depth-first search (in undirected graphs)

Recall code for undirected graphs.

public class DepthFirstPaths {
    private int s;
    private boolean[] marked;  // true if v connected to s
    private int[] edgeTo;      // prev vertex on path from s to v

    public DepthFirstPaths(Graph G, int s) {
        //  initialize data structures
        /* ...  */

        // find vertices connected to s
        dfs(G, s);
    }

    // recursive DFS does the work
    private void dfs(Graph G, int v) {
        marked[v] = true;
        for(int w : G.adj(v)) {
            if(!marked[w]) {
                edgeTo[w] = v;
                dfs(G, w);
            }
        }
    }
}

depth-first search (in directed graphs)

Code for digraphs is identical to the undirected version (substitute Digraph for Graph)

public class DirectedDFP {
    private int s;
    private boolean[] marked;  // true if v reachable from s
    private int[] edgeTo;      // prev vertex on path from s to v

    public DirectedDFP(Digraph G, int s) {
        //  initialize data structures
        /* ...  */

        // find vertices reachable from s
        dfs(G, s);
    }

    // recursive DFS does the work
    private void dfs(Digraph G, int v) {
        marked[v] = true;
        for(int w : G.adj(v)) {
            if(!marked[w]) {
                edgeTo[w] = v;
                dfs(G, w);
            }
        }
    }
}
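
The class above records `edgeTo[]` but does not show how a path is read back out of it. A minimal sketch of that step follows, under assumptions: `DirectedPathsSketch`, `hasPathTo`, and `pathTo` are names chosen for illustration, and the digraph is passed as plain `java.util` adjacency lists so the example is self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical self-contained variant of the class above.
class DirectedPathsSketch {
    private final int s;
    private final boolean[] marked;   // marked[v]: is v reachable from s?
    private final int[] edgeTo;       // edgeTo[v]: previous vertex on the path from s to v

    DirectedPathsSketch(List<List<Integer>> adj, int s) {
        this.s = s;
        marked = new boolean[adj.size()];
        edgeTo = new int[adj.size()];
        dfs(adj, s);
    }

    private void dfs(List<List<Integer>> adj, int v) {
        marked[v] = true;
        for (int w : adj.get(v)) {
            if (!marked[w]) {
                edgeTo[w] = v;
                dfs(adj, w);
            }
        }
    }

    boolean hasPathTo(int v) { return marked[v]; }

    // walk edgeTo[] backwards from v to s; the stack reverses it into s-to-v order
    List<Integer> pathTo(int v) {
        if (!marked[v]) return null;
        Deque<Integer> stack = new ArrayDeque<>();
        for (int x = v; x != s; x = edgeTo[x]) stack.push(x);
        stack.push(s);
        return new ArrayList<>(stack);
    }
}
```

Note that the path found by DFS is a directed path, not necessarily a shortest one.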

reachability application: program control-flow analysis

Every program is a digraph

Vertex = basic block of instructions (straight-line program)
Edge = jump

Dead-code elimination

Find (and remove) unreachable code

Infinite-loop detection

Determine whether exit is unreachable
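
Both analyses reduce to one reachability pass from the entry block. The sketch below assumes basic blocks are numbered 0..n-1 and that `jumps.get(b)` lists the blocks that block `b` can jump to; `ControlFlowSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical control-flow helper: blocks are vertices, jumps are edges.
class ControlFlowSketch {
    // DFS from b, marking every block reachable along jumps
    private static void dfs(List<List<Integer>> jumps, int b, boolean[] marked) {
        marked[b] = true;
        for (int next : jumps.get(b))
            if (!marked[next]) dfs(jumps, next, marked);
    }

    // dead code: blocks that no directed path from the entry block reaches
    static List<Integer> deadBlocks(List<List<Integer>> jumps, int entry) {
        boolean[] marked = new boolean[jumps.size()];
        dfs(jumps, entry, marked);
        List<Integer> dead = new ArrayList<>();
        for (int b = 0; b < jumps.size(); b++)
            if (!marked[b]) dead.add(b);
        return dead;
    }

    // infinite loop: the exit block cannot be reached from the entry block
    static boolean exitUnreachable(List<List<Integer>> jumps, int entry, int exit) {
        boolean[] marked = new boolean[jumps.size()];
        dfs(jumps, entry, marked);
        return !marked[exit];
    }
}
```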

reachability application: mark-sweep garbage collector

Every data structure is a digraph

Vertex = object
Edge = reference

Roots

Objects known to be directly accessible by program (ex: stack)

Reachable objects

Objects indirectly accessible by program (starting at a root and following a chain of pointers)

reachability application: mark-sweep garbage collector

Mark-sweep algorithm [McCarthy, 1960]

Mark: mark all reachable objects
Sweep: if object is unmarked, it is garbage (so add to free list)

Memory Cost: Uses 1 extra bit per object (plus DFS stack).
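
A toy version of the two phases, to show how they map onto DFS. Everything here is an assumption for illustration: `Obj` is a hypothetical heap object with one mark bit, the heap is a plain list, and the "free list" is simply the list of unmarked objects. A real collector works on raw memory and would avoid deep recursion.

```java
import java.util.ArrayList;
import java.util.List;

class MarkSweepSketch {
    // Hypothetical heap object: a mark bit plus outgoing references.
    static class Obj {
        final String name;
        boolean marked;                          // the 1 extra bit per object
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Mark phase: DFS from a root, following references.
    static void mark(Obj o) {
        if (o.marked) return;
        o.marked = true;
        for (Obj r : o.refs) mark(r);
    }

    // Sweep phase: unmarked objects are garbage; survivors get their
    // mark bit cleared so the next collection starts fresh.
    static List<Obj> collect(List<Obj> heap, List<Obj> roots) {
        for (Obj root : roots) mark(root);
        List<Obj> freeList = new ArrayList<>();
        for (Obj o : heap) {
            if (!o.marked) freeList.add(o);
            else o.marked = false;
        }
        return freeList;
    }
}
```

Because marking is driven by reachability from the roots, two objects that only reference each other are still collected.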

depth-first search in digraphs summary

DFS enables direct solution of simple digraph problems

Reachability
Path finding
Topological sort
Directed cycle detection

Basis for solving difficult digraph problems

2-satisfiability
Directed Euler path
Strongly-connected components

Breadth-first search in digraphs

Same method as for undirected graphs

Every undirected graph is a digraph (with edges in both directions)
BFS is a digraph algorithm

BFS (from source vertex s):
    Put s onto a FIFO queue, and mark s as visited
    Repeat until the queue is empty:
        remove the least recently added vertex v
        for each unmarked vertex pointing from v:
            add to queue and mark as visited

Proposition: BFS computes shortest paths (fewest number of edges) from \(s\) to all other vertices in a digraph in time proportional to \(E+V\)
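
The pseudocode above translates almost line for line into Java. In this sketch the names `DirectedBfsSketch` and `distances` are hypothetical, adjacency lists are plain `java.util` lists, and a distance of -1 doubles as the "unmarked" flag.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class DirectedBfsSketch {
    // distTo[v] = fewest edges on a directed path from s to v, or -1 if none exists
    static int[] distances(List<List<Integer>> adj, int s) {
        int[] distTo = new int[adj.size()];
        Arrays.fill(distTo, -1);                 // -1 means "not yet visited"
        Queue<Integer> queue = new ArrayDeque<>();
        distTo[s] = 0;                           // mark s and enqueue it
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();              // least recently added vertex
            for (int w : adj.get(v)) {
                if (distTo[w] == -1) {           // unmarked vertex pointing from v
                    distTo[w] = distTo[v] + 1;
                    queue.add(w);
                }
            }
        }
        return distTo;
    }
}
```

Each vertex is enqueued at most once and each edge is examined at most once, which is where the \(E+V\) bound comes from.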

multiple-source shortest paths

Multiple-source shortest paths: Given a digraph and a set of source vertices, find shortest path from any vertex in the set to each other vertex

Ex: \(S = \{ 1, 7, 10 \}\)

Shortest paths to...

\(4\): \(7 \rightarrow 6 \rightarrow 4\)
\(5\): \(7 \rightarrow 6 \rightarrow 0 \rightarrow 5 \)
\(12\): \(10 \rightarrow 12\)

Q. How to implement multi-source shortest paths algorithm?

Use BFS, but initialize by enqueueing all source vertices
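
A sketch of that answer: the only change from single-source BFS is the initialization loop. `MultiSourceBfsSketch` and `distances` are hypothetical names, and the test graph in the usage note is made up for illustration rather than taken from the figure.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class MultiSourceBfsSketch {
    // distTo[v] = fewest edges from the nearest source to v, or -1 if no source reaches v
    static int[] distances(List<List<Integer>> adj, Iterable<Integer> sources) {
        int[] distTo = new int[adj.size()];
        Arrays.fill(distTo, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        for (int s : sources) {                  // enqueue every source at distance 0
            if (distTo[s] == -1) {
                distTo[s] = 0;
                queue.add(s);
            }
        }
        while (!queue.isEmpty()) {               // from here on, ordinary BFS
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (distTo[w] == -1) {
                    distTo[w] = distTo[v] + 1;
                    queue.add(w);
                }
            }
        }
        return distTo;
    }
}
```

On a made-up digraph with edges 0->1, 1->2, 2->3, 4->3 and sources {0, 4}, vertex 3 ends up at distance 1 (via 4) rather than 3 (via 0).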

breadth-first search in digraphs application: web crawler

Goal: Crawl web, starting from some root web page, say taylor.edu

Solution: (BFS with implicit digraph)

Choose root web page as source \(s\)
Maintain a queue of websites to explore
Maintain a set of discovered websites
Dequeue the next website and enqueue websites to which it links (provided you haven't done so before)
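
The four steps above can be sketched without any networking by treating "fetch a page and extract its links" as a function argument. In this sketch `CrawlerSketch`, `crawl`, `linksOf`, and `limit` are all hypothetical; a real crawler would implement `linksOf` with an HTTP client and an HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

class CrawlerSketch {
    // linksOf: hypothetical stand-in for downloading a page and extracting its links.
    // limit: cap on pages visited, since the implicit digraph is effectively unbounded.
    static List<String> crawl(String root, Function<String, List<String>> linksOf, int limit) {
        Queue<String> queue = new ArrayDeque<>();     // pages still to explore
        Set<String> discovered = new HashSet<>();     // pages already seen
        List<String> visited = new ArrayList<>();     // pages in the order they were explored
        queue.add(root);
        discovered.add(root);
        while (!queue.isEmpty() && visited.size() < limit) {
            String page = queue.remove();
            visited.add(page);
            for (String link : linksOf.apply(page)) {
                if (discovered.add(link))             // add() returns false if already seen
                    queue.add(link);
            }
        }
        return visited;
    }
}
```

The digraph is never built explicitly: its edges are discovered one page at a time through `linksOf`.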

Q. Why not use DFS?

directed graphs

topological sort

precedence scheduling

Goal: Given a set of tasks to be completed with precedence constraints, in which order should we schedule the tasks?

Digraph model: vertex = task; edge = precedence constraint

Tasks: COS120, COS121, COS143, COS243, COS265, COS284, COS320, MAT151, MAT215

precedence scheduling

DAG: Directed Acyclic Graph
Topological Sort: Redraw DAG so all edges point upwards

Tasks: COS120, COS121, COS143, COS243, COS265, COS284, COS320, MAT151, MAT215

topological sort demo

Topological sort:

Run depth-first search
Return vertices in reverse postorder

Postorder	4 1 2 5 0 6 3
Topological Order	3 6 0 5 2 1 4
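
The two orders above can be reproduced with a short self-contained program. The edge list in the usage note is read off the dfs() call trace shown with the correctness proof further on, so treat it as a reconstruction of the demo DAG rather than the original figure; `TopologicalSketch` and `order` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TopologicalSketch {
    // returns vertices in reverse DFS postorder; a topological order when the digraph is a DAG
    static List<Integer> order(List<List<Integer>> adj) {
        boolean[] marked = new boolean[adj.size()];
        Deque<Integer> reversePostorder = new ArrayDeque<>();
        for (int v = 0; v < adj.size(); v++)
            if (!marked[v]) dfs(adj, v, marked, reversePostorder);
        return new ArrayList<>(reversePostorder);   // iterates from the most recent push
    }

    private static void dfs(List<List<Integer>> adj, int v,
                            boolean[] marked, Deque<Integer> reversePostorder) {
        marked[v] = true;
        for (int w : adj.get(v))
            if (!marked[w]) dfs(adj, w, marked, reversePostorder);
        reversePostorder.push(v);                   // v is done: everything it points to is done
    }
}
```

With adjacency lists 0: 1 2 5, 1: 4, 3: 2 4 5 6, 5: 2, 6: 0 4 (vertices 2 and 4 have no outgoing edges), `order` returns 3 6 0 5 2 1 4.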

depth-first search order

public class DepthFirstOrder {
    private boolean[] marked;
    private Stack<Integer> reversePostorder;

    public DepthFirstOrder(Digraph G) {
        reversePostorder = new Stack<Integer>();
        marked = new boolean[G.V()];
        for(int v = 0; v < G.V(); v++)
            if(!marked[v]) dfs(G, v);
    }

    private void dfs(Digraph G, int v) {
        marked[v] = true;
        for(int w : G.adj(v))
            if(!marked[w]) dfs(G, w);
        reversePostorder.push(v);
    }

    // returns all vertices in "reverse DFS postorder"
    public Iterable<Integer> reversePostorder() {
        return reversePostorder;
    }
}

topological sort in a DAG: correctness proof

Proposition: Reverse DFS postorder of a DAG is a topological order.

Pf: Consider any edge \(v \rightarrow w\). When dfs(v) is called:

Case 1: dfs(w) has already been called and returned. Thus, \(w\) was done before \(v\).
Case 2: dfs(w) has not yet been called. dfs(w) will get called directly or indirectly by dfs(v) and will finish before dfs(v). Thus, \(w\) will be done before \(v\).
Case 3: dfs(w) has already been called, but has not yet returned. Can't happen in a DAG: function call stack contains path from \(w\) to \(v\), so \(v \rightarrow w\) would complete a cycle.

dfs(0)
    dfs(1)
        dfs(4)
            // 4 done
        // 1 done
    dfs(2)     // 1 -> 2
        // 2 done
    dfs(5)
        // check 2
        // 5 done
    // 0 done
// check 1
// check 2
dfs(3)
    // check 2       (case 1)
    // check 4       (case 1)
    // check 5       (case 1)
    dfs(6)  //       (case 2)
        // check 0
        // check 4
        // 6 done
    // 3 done        <----
// check 4
// check 5
// check 6
// done

When 3 is done, every vertex pointing from \(3\) is already done, so each of them appears after \(3\) in the topological order

directed cycle detection

Proposition: A digraph has a topological order iff no directed cycle.

Pf:

If directed cycle, topological order impossible
If no directed cycle, DFS-based algorithm finds a topological order

Goal: Given a digraph, find a directed cycle.

Solution: DFS. What else? See textbook.
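
One common way to do this with DFS is to track which vertices are on the current call stack: reaching a vertex that is still on the stack means an edge closes a cycle (case 3 of the correctness proof). The sketch below follows that idea; it is not necessarily the textbook's exact code, and `DirectedCycleSketch` with its members are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class DirectedCycleSketch {
    private final boolean[] marked;
    private final boolean[] onStack;   // is v on the current dfs() call stack?
    private final int[] edgeTo;        // previous vertex on the DFS path to v
    private List<Integer> cycle;       // vertices of one directed cycle, or null

    DirectedCycleSketch(List<List<Integer>> adj) {
        int V = adj.size();
        marked = new boolean[V];
        onStack = new boolean[V];
        edgeTo = new int[V];
        for (int v = 0; v < V && cycle == null; v++)
            if (!marked[v]) dfs(adj, v);
    }

    private void dfs(List<List<Integer>> adj, int v) {
        marked[v] = true;
        onStack[v] = true;
        for (int w : adj.get(v)) {
            if (cycle != null) return;           // a cycle was already found: stop searching
            if (!marked[w]) {
                edgeTo[w] = v;
                dfs(adj, w);
            } else if (onStack[w]) {
                // edge v->w closes the cycle w -> ... -> v -> w
                Deque<Integer> stack = new ArrayDeque<>();
                for (int x = v; x != w; x = edgeTo[x]) stack.push(x);
                stack.push(w);
                cycle = new ArrayList<>(stack);
                cycle.add(w);
            }
        }
        onStack[v] = false;                      // v is done and leaves the call stack
    }

    boolean hasCycle() { return cycle != null; }

    List<Integer> cycle() { return cycle; }
}
```

A marked vertex that is no longer on the stack (case 1 of the proof) is harmless, which is why `marked[]` alone is not enough.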

directed cycle detection application: precedence scheduling

Scheduling: Given a set of tasks to be completed with precedence constraints, in what order should we schedule the tasks?

Remark: A directed cycle implies scheduling problem is infeasible.

[ XKCD ]

directed cycle detection application: cyclic inheritance

The Java compiler does cycle detection.

public class A extends B { }

public class B extends C { }

public class C extends A { }

$ javac A.java
A.java:1: cyclic inheritance involving A
public class A extends B { }
       ^
1 error

directed cycle detection application: spreadsheet recalculation

Microsoft Excel does cycle detection (and has a circular reference toolbar!)

depth-first search orders

Observation: DFS visits each vertex exactly once. The order in which it does so can be important.

Orderings:

Preorder: order in which dfs() is called
Postorder: order in which dfs() returns
Reverse postorder: reverse order in which dfs() returns

private void dfs(Digraph G, int v) {
    marked[v] = true;
    preorder.enqueue(v);          // preorder (queue)
    for(int w : G.adj(v))
        if(!marked[w]) dfs(G, w);
    postorder.enqueue(v);         // postorder (queue)
    reversePostorder.push(v);     // reverse postorder (stack)
}

directed graphs

strong components

strongly-connected components

Def: Vertices \(v\) and \(w\) are strongly connected if there is both a directed path from \(v\) to \(w\) and a directed path from \(w\) to \(v\).

Key property: Strong connectivity is an equivalence relation:

\(v\) is strongly connected to \(v\)
If \(v\) is strongly connected to \(w\), then \(w\) is strongly connected to \(v\)
If \(v\) is strongly connected to \(w\) and \(w\) to \(x\), then \(v\) is strongly connected to \(x\)

strongly-connected components

Def: A strong component is a maximal subset of strongly-connected vertices

connected components vs. strongly-connected components

\(v\) and \(w\) are connected if there is a path between \(v\) and \(w\). Use the connected component id, which is easy to compute with DFS.

//        0 1 2 3 4 5 6 7 8 9 10 11 12
// id[] = 0 0 0 0 0 0 1 1 1 2  2  2  2
public boolean connected(int v, int w) {
    return id[v] == id[w]; // const-time
}

\(v\) and \(w\) are strongly connected if there is both a directed path from \(v\) to \(w\) and a directed path from \(w\) to \(v\). Use the strongly-connected component id (how to compute?)

//        0 1 2 3 4 5 6 7 8 9 10 11 12
// id[] = 1 0 1 1 1 1 3 4 3 2  2  2  2
public boolean stronglyConnected(int v, int w) {
    return id[v] == id[w]; // const-time
}

strong component application: ecological food webs

Food web graph

vertex = species; edge = from producer to consumer

Strong component: subset of species with common energy flow

[ http://www.twingroves.district96.k12.il.us/Wetlands/Salamander/SalGraphics/salfoodweb.gif ]

strong component application: software modules

Software module dependency graph:

vertex = software module
edge = from module to dependency

Strong component: Subset of mutually interacting modules

Approach 1: Package strong components together
Approach 2: Use to improve design!

strong components algorithms: brief history

1960s: Core OR problem

Widely studied; some practical algorithms
Complexity not understood

1972: linear-time DFS algorithm (Tarjan)

Classic algorithm
Level of difficulty: Algs4++
Demonstrated broad applicability and importance of DFS

1980s: easy two-pass linear-time algorithm (Kosaraju-Sharir)

Forgot notes for lecture; developed alg in order to teach it!
Later found in Russian scientific literature (1972)

1990s: more easy linear-time algorithms

Gabow: fixed old OR algorithm
Cheriyan-Mehlhorn: needed one-pass algorithm for LEDA

Kosaraju-Sharir algorithm: intuition

Reverse graph: Strong components in \(G\) are same as in \(G^R\)

Kernel DAG: Contract each strong component into a single vertex

Idea:

Compute topological order (reverse postorder) in kernel DAG
Run DFS, considering vertices in reverse topological order
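
To make the kernel DAG concrete, the sketch below contracts a digraph given an already computed component array. The names `KernelDagSketch` and `contract` are hypothetical, and `id[]` and `count` are assumed to come from some strong-components computation such as the one developed in the following slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class KernelDagSketch {
    // id[v] = strong-component index of v, count = number of strong components.
    // Returns, for each component, the set of other components it has an edge into.
    static List<Set<Integer>> contract(List<List<Integer>> adj, int[] id, int count) {
        List<Set<Integer>> kernel = new ArrayList<>();
        for (int c = 0; c < count; c++) kernel.add(new TreeSet<>());
        for (int v = 0; v < adj.size(); v++)
            for (int w : adj.get(v))
                if (id[v] != id[w])                  // edges inside a component disappear
                    kernel.get(id[v]).add(id[w]);    // sets drop duplicate kernel edges
        return kernel;
    }
}
```

The result has no directed cycle: a cycle through two different components would make them mutually reachable, so they would have been a single component.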

kosaraju-sharir algorithm demo

Phase 1: Compute reverse postorder in \(G^R\)
Phase 2: Run DFS in \(G\), visiting unmarked vertices in reverse postorder of \(G^R\)

kosaraju-sharir algorithm demo

Phase 1: Compute reverse postorder in \(G^R\)

0->6->8, 6->7, 0->2->3->4->5, 4->11->9->12->10, 1
postorder: 8 7 6 5 10 12 9 11 4 3 2 0 1
reverse postorder: 1 0 2 3 4 11 9 12 10 5 6 7 8

kosaraju-sharir algorithm demo

Phase 2: Run DFS in \(G\), visiting unmarked vertices in reverse postorder of \(G^R\)

1 0 2 3 4 11 9 12 10 5 6 7 8
1 0 - - - 11 - -- -- - 6 7 -

`v`	`id[]`
0	1
1	0
2	1
3	1
4	1
5	1
6	3
7	4
8	3
9	2
10	2
11	2
12	2

kosaraju-sharir algorithm

Proposition: Kosaraju-Sharir algorithm computes the strong components of a digraph in time proportional to \(E+V\)

Pf:

Running time: bottleneck is running 2×DFS and computing \(G^R\)
Correctness: tricky, see textbook
Implementation: easy!

connected components in an undirected graph (with DFS)

public class CC {
    private boolean[] marked;
    private int[] id;
    private int count;

    public CC(Graph G) {
        marked = new boolean[G.V()];
        id = new int[G.V()];

        for(int v = 0; v < G.V(); v++) {
            if(!marked[v]) {
                dfs(G, v);
                count++;
            }
        }
    }

    private void dfs(Graph G, int v) {
        marked[v] = true;
        id[v] = count;
        for(int w : G.adj(v)) {
            if(!marked[w]) dfs(G, w);
        }
    }

    public boolean connected(int v, int w) {
        return id[v] == id[w];
    }
}

strong components in a digraph (with two DFSs)

public class KosarajuSharirSCC {
    private boolean[] marked;
    private int[] id;
    private int count;

    public KosarajuSharirSCC(Digraph G) {
        marked = new boolean[G.V()];
        id = new int[G.V()];
        DepthFirstOrder dfs = new DepthFirstOrder(G.reverse());
        for(int v : dfs.reversePostorder()) {
            if(!marked[v]) {
                dfs(G, v);
                count++;
            }
        }
    }

    private void dfs(Digraph G, int v) {
        marked[v] = true;
        id[v] = count;
        for(int w : G.adj(v)) {
            if(!marked[w]) dfs(G, w);
        }
    }

    public boolean stronglyConnected(int v, int w) {
        return id[v] == id[w];
    }
}

digraph-processing summary


problem	solution
single-source reachability in a digraph	DFS
topological sort in a DAG	DFS
strong components in a digraph	Kosaraju-Sharir DFS (twice)