Data Structures and Algorithms

Graphs and Graph Algorithms

A graph is a set of vertices together with a set of edges. Each edge can be thought of as a pair of vertices.

Parallel edges are allowed
Self-loops are allowed

Graphs without parallel edges and without self-loops are called simple graphs. In general, a graph is simple unless specified otherwise.

Complexity of graph algorithms is typically defined in terms of:

$|E|$, the number of edges
$|V|$, the number of vertices
both

Sparse and Dense Graphs

Sparse Graph

Few edges compared to the number of possible vertex pairs. Usually represented as an adjacency list.

$|E| \ll |V|^2$

Dense Graph

Many edges. Usually represented as an adjacency matrix.

$|E| \approx |V|^2$
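
To make the sparse/dense distinction concrete, here is a minimal sketch that computes what fraction of the possible edges are present. It assumes a simple undirected graph, where the maximum edge count is $\frac{|V|(|V|-1)}{2}$; the function name `edge_density` is made up for illustration, and where you draw the line between "sparse" and "dense" is a judgment call.

```python
def edge_density(num_vertices, num_edges):
    """Fraction of possible edges present in a simple undirected graph."""
    # A simple undirected graph has at most |V| * (|V| - 1) / 2 edges.
    max_edges = num_vertices * (num_vertices - 1) // 2
    if max_edges == 0:
        return 0.0  # zero or one vertex: no edges are possible
    return num_edges / max_edges


# A value near 1 suggests a dense graph; a value near 0 suggests a sparse one.
complete_graph = edge_density(4, 6)
half_full_graph = edge_density(4, 3)
```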

Graph Representation

Adjacency Matrix

An adjacency matrix is a $|V| \times |V|$ matrix $M$, where $M_{ij} = 1$ iff there is an edge connecting node $i$ and node $j$.

Distance Matrix

Same as an adjacency matrix, except that instead of 1, the entry for $i$ and $j$ holds the distance between them. This is well-defined in a graph with weighted edges.

Adjacency List

In an adjacency list, you keep $|V|$ linked lists, one per vertex. The first element of each list is the source vertex, and the rest of the nodes in the list are the vertices it has an edge to.
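
As a rough illustration of the two representations, the sketch below builds both from the same edge list. It assumes vertices are numbered $0$ to $|V|-1$ and uses plain Python lists in place of linked lists (the source vertex is the list index rather than a stored first element); the function names are made up for this example.

```python
def build_adjacency_matrix(num_vertices, edges, directed=False):
    """Return a |V| x |V| matrix with 1 where an edge exists, else 0."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        matrix[u][v] = 1
        if not directed:
            matrix[v][u] = 1  # an undirected edge is visible from both ends
    return matrix


def build_adjacency_list(num_vertices, edges, directed=False):
    """Return one neighbor list per vertex; index i holds the neighbors of i."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # store the edge in the other endpoint's list too
    return adj


edges = [(0, 1), (0, 2), (2, 3)]
matrix = build_adjacency_matrix(4, edges)
adj = build_adjacency_list(4, edges)
```

The matrix always takes $|V|^2$ cells, while the lists only store entries for edges that exist.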

Graph Definitions

Graphs: Undirected vs Directed

Graphs can either be undirected or directed. In a directed graph, each edge has a direction, and nodes on edges form ordered pairs.

$e_n = (u, v)$ means there is an edge from $u$ to $v$.

Weighted Graphs

Edges can be weighted. In weighted graphs, each edge has a weight (cost) associated with it, which you can think of as the distance along that edge. Weights may be different for edges in a set of parallel edges.

A common task on weighted graphs is finding the least-cost path between two vertices.

Simple Path

A simple path is a sequence of edges that leads from one vertex to another with no vertex appearing twice.

Connected Graph

In a connected graph, a simple path exists between any pair of vertices.
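
One way to check this definition in code is to start at any vertex and see whether a traversal reaches every other vertex. The sketch below assumes an undirected graph stored as a list of neighbor lists, with vertices numbered from 0; `is_connected` is a name chosen for illustration.

```python
def is_connected(adj):
    """Return True if every vertex is reachable from vertex 0 (undirected graph)."""
    if not adj:
        return True  # treat the empty graph as connected
    visited = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                stack.append(v)
    # Connected exactly when the traversal reached all |V| vertices.
    return len(visited) == len(adj)


path_graph = [[1], [0, 2], [1]]      # 0 - 1 - 2
with_isolated_vertex = [[1], [0], []]  # 0 - 1, and vertex 2 on its own
```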

Cycle

A cycle is like a simple path, except the first and final nodes are the same.

Data Structures (Implementation)

Adjacency Matrix

One way to implement a graph is with an adjacency matrix. This is a $|V| \times |V|$ matrix representing the graph. Directed and undirected graphs differ: in a directed graph, the adjmat records from/to relationships, so the entry for $(i, j)$ can differ from the entry for $(j, i)$. In an undirected graph, the matrix is symmetric.

Because of that symmetry, an undirected adjmat only needs $\approx \frac{|V|^2}{2}$ space. In unweighted graphs, 0 means no edge, and 1 means edge. In a weighted graph, $\infty$ means no edge, and any other value is the weight of the edge.
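
A minimal sketch of the weighted convention, using `math.inf` for "no edge". It assumes vertices are numbered from 0 and stores the full square matrix rather than only half of it, to keep the example short; `build_weighted_matrix` is a hypothetical helper name.

```python
import math


def build_weighted_matrix(num_vertices, weighted_edges, directed=False):
    """Return a |V| x |V| matrix of edge weights, with math.inf for no edge."""
    matrix = [[math.inf] * num_vertices for _ in range(num_vertices)]
    for u, v, weight in weighted_edges:
        matrix[u][v] = weight
        if not directed:
            matrix[v][u] = weight  # keep the undirected matrix symmetric
    return matrix


weights = build_weighted_matrix(3, [(0, 1, 2.5), (1, 2, 4.0)])
```

Here `weights[0][1]` and `weights[1][0]` both hold 2.5, while `weights[0][2]` stays at infinity because there is no direct edge.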

Adjacency List

If you use an adjacency list, you assume that the edges are evenly distributed across the vertices.

Under that assumption, each vertex has about $\frac{|E|}{|V|}$ edges in the adjacency list representation. It costs $O(1)$ to access a vertex's list, so the average cost for an individual vertex is $O(1 + \frac{|E|}{|V|})$ to get the list and traverse it.

The cost for all vertices is $O(|V|) \cdot O(1 + \frac{|E|}{|V|}) = O(|V| + |E|)$ time.

Directed vs. undirected

A directed adjlist contains each edge once in the edge set. On the other hand, in an undirected adjlist, each edge is represented twice.
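
The difference shows up directly in the number of stored entries. In this small sketch (lists of neighbor lists, vertices numbered from 0, names chosen for illustration), the same three edges take three entries when directed and six when undirected.

```python
def total_entries(adj):
    """Sum the lengths of all adjacency lists (one pass over every list)."""
    return sum(len(neighbors) for neighbors in adj)


# Edges 0->1, 0->2, 1->2 stored as a directed graph: each edge appears once.
directed_adj = [[1, 2], [2], []]
# The same three edges stored as an undirected graph: each edge appears twice,
# once in the list of each endpoint.
undirected_adj = [[1, 2], [0, 2], [0, 1]]
```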

Unweighted vs. weighted

Unweighted: NULL means no edge, and a list item means an edge. Weighted: NULL means no edge, and a list item that also stores a value means an edge with that weight.

Examples!

Let's say you want to find all vertices that are directly connected to some vertex. What is the algorithm to determine this with an adjmat and an adjlist?

Adjmat: Go to the row of the source vertex. Find all nonzero entries in that row. Return those nodes. Complexity: $O(|V|)$.
Adjlist: Look up the source vertex's linked list, and return all of the nodes in that list. Complexity: $O(1 + \frac{|E|}{|V|})$ on average.
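
The two lookups can be sketched as follows, assuming an unweighted 0/1 matrix and a list of neighbor lists with vertices numbered from 0. The function names are illustrative only.

```python
def neighbors_matrix(matrix, source):
    """Scan the whole row for nonzero entries: O(|V|)."""
    return [v for v, cell in enumerate(matrix[source]) if cell != 0]


def neighbors_list(adj, source):
    """Copy the source's list: proportional to the number of neighbors."""
    return list(adj[source])


matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
adj = [[1, 2], [0], [0, 3], [2]]
```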

Another example. Let's say you want to see if there are any edges for a vertex?

Adjmat: Go through the vertex's row. If you find an edge you can stop immediately; otherwise you stop at the end of the row without finding any.
- Worst case $O(|V|)$
- Best case $O(1)$
- Average case $O(\frac{|V|}{2}) = O(|V|)$
Adjlist: See if a first entry exists! Always $O(1)$. BOOM!
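
A short sketch of both checks, under the same assumptions as before (0/1 matrix, list of neighbor lists, made-up function names). Python's built-in `any` stops at the first truthy value, which gives the early exit described for the matrix case.

```python
def has_edges_matrix(matrix, vertex):
    """Scan the row, stopping at the first nonzero entry: O(1) best, O(|V|) worst."""
    return any(cell != 0 for cell in matrix[vertex])


def has_edges_list(adj, vertex):
    """Check whether the vertex's list has a first entry: always O(1)."""
    return len(adj[vertex]) > 0


matrix = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
adj = [[1], [0], []]
```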

What about if you want to determine the greatest-weight edge from a vertex?

Adjmat: Go through the vertex's entire row, keeping track of the greatest edge seen so far. You cannot stop early, because a greater edge might still appear later in the row. Always $O(|V|)$.
Adjlist: Go through all entries in that list. Always $O(1 + \frac{|E|}{|V|})$.
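
Here is one way this could look in code. The sketch assumes a weighted matrix that uses `math.inf` for "no edge", and a weighted adjacency list that stores `(neighbor, weight)` pairs; both layouts and the function names are assumptions made for this example.

```python
import math


def max_edge_matrix(matrix, vertex):
    """Scan the full row; return (neighbor, weight) of the heaviest edge, or None."""
    best = None
    for v, weight in enumerate(matrix[vertex]):
        if weight != math.inf and (best is None or weight > best[1]):
            best = (v, weight)
    return best


def max_edge_list(adj, vertex):
    """Scan the vertex's list of (neighbor, weight) pairs; None if it is empty."""
    return max(adj[vertex], key=lambda edge: edge[1], default=None)


INF = math.inf
matrix = [
    [INF, 3, 7],
    [3, INF, INF],
    [7, INF, INF],
]
adj = [[(1, 3), (2, 7)], [(0, 3)], [(0, 7)]]
```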

Let's say you have two values for each edge, cost and distance. Describe an algorithm to determine the greatest distance for the least cost if you wanted to get directly from one node to another.

Adjmat: Still going to be $O(|V|)$.
Adjlist: Go through all entries in that list. Always $O(1 + \frac{|E|}{|V|})$.

What if you wanted to find the highest distance-to-cost ratio, including connecting flights in between? Finding the best route from one source through intermediate nodes is called the single source shortest path problem. It is usually solved with a greedy algorithm, and we'll go over this later.

Depth-first search

Search can really be categorized as depth-first search or breadth-first search. Depth-first search utilizes a stack, and the algorithm works on graphs and digraphs. It searches the deepest paths first, and discovers a path from the source $s$ to the goal $g$, if one exists.

Algorithm GraphDFS
    Mark source as visited
    Push source to Stack
    While Stack is not empty
        Pop candidate from Stack
        If candidate is goal
            Return success
        Else
            For child of candidate
                If child is unvisited
                    Mark child visited
                    Push child to Stack
    Return failure
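
The pseudocode above translates fairly directly into Python. This is a sketch that assumes the graph is a list of neighbor lists with vertices numbered from 0; a Python list serves as the stack and a set tracks visited vertices. As in the pseudocode, children are marked visited when they are pushed.

```python
def graph_dfs(adj, source, goal):
    """Return True if goal is reachable from source, searching depth-first."""
    visited = {source}
    stack = [source]
    while stack:
        candidate = stack.pop()  # most recently pushed vertex first
        if candidate == goal:
            return True
        for child in adj[candidate]:
            if child not in visited:
                visited.add(child)
                stack.append(child)
    return False


# Directed example: 0->1, 0->2, 1->3, 2->3, 4->0
adj = [[1, 2], [3], [3], [], [0]]
```

With this graph, `graph_dfs(adj, 0, 3)` succeeds, while `graph_dfs(adj, 3, 0)` fails because vertex 3 has no outgoing edges.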

Complexity with Adjacency List

How long does this take? Each vertex is pushed at most once, which is $O(|V|)$. The adjlist for each vertex is traversed at most once, and if the set of edges is evenly distributed over the set of vertices, each traversal costs $O(1 + \frac{|E|}{|V|})$. The total is $O(|V| + |E|)$.

If the graph is dense, and you made a mistake by representing it as an adjacency list, then $|E| \approx |V|^2$ and this becomes $O(|V|^2)$.

Complexity of Adjacency Matrix

Each vertex is processed at most once, which is $O(|V|)$. The adjmat row for each vertex is scanned at most once, at $O(|V|)$ per row.

The total is $O(|V|^2)$ no matter how many edges there are, so if the graph is sparse, you would have wanted an adjacency list.

Breadth-first search

You systematically explore the edges of $G$ to discover a path from the source $s$ to the goal $g$ using a queue. This works on graphs and digraphs, and if all costs are equal (unweighted graph), then BFS finds a shortest path from the source to the goal.

Algorithm GraphBFS
    Mark source as visited
    Push source to back of Queue
    While Queue is not empty
        Pop candidate from front of Queue
        If candidate is goal
            Return success
        Else
            For child of candidate
                If child is unvisited
                    Mark child visited
                    Push child to back of Queue
    Return failure
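
A possible Python version of the pseudocode above, extended slightly to return the path itself rather than just success or failure. It assumes an unweighted graph stored as a list of neighbor lists with vertices numbered from 0. The `parent` dictionary is an addition for this sketch: it doubles as the visited set and lets the path be rebuilt once the goal is found.

```python
from collections import deque


def bfs_shortest_path(adj, source, goal):
    """Return a shortest path (fewest edges) from source to goal, or None."""
    parent = {source: None}  # also serves as the set of visited vertices
    queue = deque([source])
    while queue:
        candidate = queue.popleft()  # oldest vertex in the queue first
        if candidate == goal:
            # Walk the parent links back to the source, then reverse.
            path = []
            while candidate is not None:
                path.append(candidate)
                candidate = parent[candidate]
            return path[::-1]
        for child in adj[candidate]:
            if child not in parent:
                parent[child] = candidate
                queue.append(child)
    return None


# Directed example: 0->1, 0->2, 1->3, 2->4, 3->4
adj = [[1, 2], [3], [4], [4], []]
```

`deque.popleft()` runs in constant time, which is why a deque is used here instead of popping from the front of a plain list.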

Complexity of Adjacency List

Same as depth-first search. You will visit each vertex at most once, and for a given vertex, you will visit every node in its adjacency list at most once.

Complexity of Adjacency Matrix

Again, this is the same complexity as depth-first search: $O(|V|^2)$.

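However, in the real world, a queue-based approach will often find the solution sooner when the goal is close to the source, because it visits vertices in order of their distance from the source.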

Search Examples

Let's say you want to find if there is a connection between two nodes, either going through connecting nodes or not. This is really just finding a path between the source and destination.

If you're trying to find the shortest path, you still have to solve the single source shortest path problem.

Let's say you are planning a family reunion. Your family is spread out all over the US. You don't know where to have the reunion, but you want to minimize the total travel cost. This is an NP-complete problem.

Let's say you want to build a high-speed rail across the US. Describe an algorithm to determine the least cost of construction, such that any city can be reached from any other city. This is known as finding a minimum spanning tree, and with an adjacency matrix the complexity is $O(|V|^2)$.
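
The notes do not say which algorithm is meant here. One well-known algorithm that matches the quadratic bound on an adjacency matrix is Prim's algorithm, sketched below as an assumption rather than as the intended answer. The matrix uses `math.inf` for "no edge", and `prim_mst_cost` is a name chosen for illustration.

```python
import math


def prim_mst_cost(matrix):
    """Return the total weight of a minimum spanning tree, or None if disconnected."""
    n = len(matrix)
    if n == 0:
        return 0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    best[0] = 0            # start growing the tree from vertex 0
    total = 0
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree: O(|V|) per round.
        u = -1
        for v in range(n):
            if not in_tree[v] and (u == -1 or best[v] < best[u]):
                u = v
        if best[u] == math.inf:
            return None  # no edge reaches the remaining vertices
        in_tree[u] = True
        total += best[u]
        # Scan u's row to update the cheapest connection for each outside vertex.
        for v in range(n):
            if not in_tree[v] and matrix[u][v] < best[v]:
                best[v] = matrix[u][v]
    return total


INF = math.inf
# Undirected edges: 0-1 (1), 0-2 (4), 1-2 (2), 1-3 (5), 2-3 (3)
rail_costs = [
    [INF, 1, 4, INF],
    [1, INF, 2, 5],
    [4, 2, INF, 3],
    [INF, 5, 3, INF],
]
```

The outer loop runs $|V|$ times and each round does two $O(|V|)$ scans, which gives $O(|V|^2)$ overall. For `rail_costs`, the tree uses edges 0-1, 1-2, and 2-3 for a total cost of 6.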