A graph $G = (V, E)$ is just a set of vertices $V$ and a set of edges $E$. Each edge can be thought of as a tuple (pair) of vertices.
Graphs without parallel edges and without self-loops are called simple graphs. In general, a graph is simple unless specified otherwise.
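Complexity of graph algorithms is typically defined in terms of the number of vertices, $|V|$, and the number of edges, $|E|$.
Sparse graphs: few edges compared to vertices. Usually represented as an adjacency list.
Dense graphs: many edges (up to about $|V|^2$). Usually represented as an adjacency matrix.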
An adjacency matrix is a $|V| \times |V|$ matrix $A$, where $A_{ij} = 1$ iff there is an edge connecting node $i$ and node $j$ (and $A_{ij} = 0$ otherwise).
A weighted adjacency matrix is the same as an adjacency matrix, except that instead of 1, entry $A_{ij}$ holds the distance between $i$ and $j$. This is well-defined in a graph with weighted edges.
In an adjacency list, you keep $|V|$ linked lists, one per vertex: the first element of each list is the source vertex, and the rest of the list holds the vertices it is connected to.
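As a minimal sketch of the two representations, the code below builds both from the same edge list for an unweighted, undirected graph. It assumes vertices are numbered `0..n-1`, and it uses Python lists in place of linked lists; the function names are hypothetical.

```python
# Minimal sketch: an unweighted, undirected graph stored both ways.
# Assumption for illustration: vertices are numbered 0..n-1.

def to_adjacency_matrix(n, edges):
    """Return an n x n matrix with matrix[i][j] == 1 iff edge (i, j) exists."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: the matrix is symmetric
    return matrix


def to_adjacency_list(n, edges):
    """Return n neighbor lists; adj[u] holds every vertex adjacent to u."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)  # undirected: store the edge in both lists
    return adj


edges = [(0, 1), (0, 2), (2, 3)]
matrix = to_adjacency_matrix(4, edges)
adj = to_adjacency_list(4, edges)
print(matrix)  # [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
print(adj)     # [[1, 2], [0], [0, 3], [2]]
```

The matrix always takes $|V|^2$ cells, while the list only grows with the number of edges, which is why sparse graphs favor lists.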
Graphs can be either undirected or directed. In a directed graph (digraph), each edge has a direction, so the two endpoints of an edge form an ordered pair.
Edges can also be weighted. In a weighted graph, each edge has a weight, such as a distance or cost, associated with it. Parallel edges between the same pair of vertices may have different weights.
A common problem on graphs is finding the least-cost path between two vertices.
A simple path is a sequence of edges that lead from one vertex to another with no vertex appearing twice.
In a connected graph, a simple path exists between any pair of vertices.
A cycle is a simple path, except that the first and final vertices are the same.
One way to implement a graph is with an adjacency matrix. Directed and undirected graphs differ here: in a directed graph, entry $A_{ij}$ records an edge from $i$ to $j$, so the matrix need not be symmetric.
An undirected adjacency matrix is symmetric, so it only needs about $|V|^2/2$ space (one triangle of the matrix). In an unweighted graph, 0 means no edge and 1 means an edge. In a weighted graph, $\infty$ means no edge, and any other value is the edge's weight.
To analyze an adjacency list, assume that the $|E|$ edges are distributed evenly across the $|V|$ vertices.
Each vertex then has approximately $|E|/|V|$ edges in its list. It costs 1 to access a vertex's list, so the average cost for an individual vertex to get its list and traverse it is $1 + |E|/|V|$.
The cost for all vertices is $O(|V| + |E|)$ time.
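Written out, summing the average per-vertex cost $1 + |E|/|V|$ over all $|V|$ vertices gives:

```latex
\sum_{v \in V} \left( 1 + \frac{|E|}{|V|} \right)
  = |V| + |V| \cdot \frac{|E|}{|V|}
  = |V| + |E|
  = O(|V| + |E|)
```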
A directed adjacency list contains each edge once. In an undirected adjacency list, each edge $\{u, v\}$ is represented twice: once in $u$'s list and once in $v$'s list.
Unweighted: no list item (NULL) means no edge, and a list item means an edge. Weighted: the same, except each list item also stores the edge's weight.
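A sketch of the weighted, undirected case in both representations, assuming vertices `0..n-1` and edges given as hypothetical `(u, v, weight)` triples. `math.inf` plays the role of $\infty$ ("no edge") in the matrix, and an empty Python list plays the role of NULL in the adjacency list.

```python
import math

# Sketch of a weighted, undirected graph in both representations.
# Hypothetical input convention: vertices are 0..n-1, edges are (u, v, weight).

def weighted_matrix(n, edges):
    """math.inf marks 'no edge'; any other entry is the edge's weight."""
    m = [[math.inf] * n for _ in range(n)]
    for u, v, w in edges:
        m[u][v] = w
        m[v][u] = w  # undirected: mirror the entry
    return m


def weighted_list(n, edges):
    """adj[u] holds (neighbor, weight) pairs; an empty list means no edges."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))  # undirected: each edge is stored twice
    return adj


edges = [(0, 1, 4.0), (1, 2, 2.5)]
m = weighted_matrix(3, edges)
adj = weighted_list(3, edges)
print(m[0][1], m[0][2])               # 4.0 inf
print(sum(len(lst) for lst in adj))   # 4  (2 edges, each stored twice)
```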
Let's say you want to find all vertices that are connected to some vertex. What is the algorithm to determine this with an adjacency matrix, and with an adjacency list?
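One possible answer, reading "connected to" as "directly adjacent to" and assuming an unweighted 0/1 matrix (both interpretations are assumptions). The matrix version must scan a whole row, $O(|V|)$, while the list version only touches the vertex's own list, $O(1 + \deg(u))$. The function names are hypothetical.

```python
def neighbors_from_matrix(matrix, u):
    """Scan row u of a 0/1 matrix: O(|V|) no matter how many edges u has."""
    return [v for v, present in enumerate(matrix[u]) if present]


def neighbors_from_list(adj, u):
    """Copy u's adjacency list: O(1 + deg(u))."""
    return list(adj[u])


matrix = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
adj = [[1, 2], [0], [0]]
print(neighbors_from_matrix(matrix, 0))  # [1, 2]
print(neighbors_from_list(adj, 0))       # [1, 2]
```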
Another example: how would you check whether a vertex has any edges at all?
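A minimal sketch under the same 0/1-matrix and Python-list assumptions as before. With a matrix, you may have to scan the entire row before finding a 1 (worst case $O(|V|)$); with a list, you only check whether the list is empty, which is $O(1)$. For a directed graph, both functions as written only detect outgoing edges.

```python
def has_edges_matrix(matrix, u):
    """Worst case O(|V|): may scan the whole row before finding a 1."""
    return any(matrix[u])


def has_edges_list(adj, u):
    """O(1): just check whether u's list is empty (the NULL case)."""
    return len(adj[u]) > 0


matrix = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
adj = [[1], [0], []]
print(has_edges_matrix(matrix, 2), has_edges_list(adj, 2))  # False False
print(has_edges_matrix(matrix, 0), has_edges_list(adj, 0))  # True True
```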
What if you want to find the greatest-weight edge from a vertex?
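A sketch for a weighted graph, assuming the matrix uses `math.inf` for "no edge" and the list stores hypothetical `(neighbor, weight)` pairs. Both return the `(neighbor, weight)` pair with the largest weight, or `None` when the vertex has no edges. The matrix version still costs $O(|V|)$; the list version costs $O(1 + \deg(u))$.

```python
import math


def max_edge_matrix(m, u):
    """O(|V|): scan row u, skipping math.inf ('no edge') entries."""
    best = None
    for v, w in enumerate(m[u]):
        if w != math.inf and (best is None or w > best[1]):
            best = (v, w)
    return best  # (neighbor, weight), or None if u has no edges


def max_edge_list(adj, u):
    """O(1 + deg(u)): adj[u] holds (neighbor, weight) pairs."""
    return max(adj[u], key=lambda pair: pair[1], default=None)


m = [[math.inf, 4.0, 7.0], [4.0, math.inf, math.inf], [7.0, math.inf, math.inf]]
adj = [[(1, 4.0), (2, 7.0)], [(0, 4.0)], [(0, 7.0)]]
print(max_edge_matrix(m, 0))  # (2, 7.0)
print(max_edge_list(adj, 0))  # (2, 7.0)
```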
Let's say each edge has two values, cost and distance. Describe an algorithm to determine the greatest distance for the least cost if you wanted to get from one node to another.
What if you wanted to find the highest distance-to-cost ratio, including connecting flights in between? This is a form of the single source shortest path problem, which is solved with a greedy algorithm that we'll go over later.
Graph search can be categorized as depth-first search (DFS) or breadth-first search (BFS). Depth-first search uses a stack and works on both graphs and digraphs. It explores the deepest paths first and determines whether a path from source s to goal g exists.
Algorithm GraphDFS
    Mark source as visited
    Push source to Stack
    While Stack is not empty
        Pop candidate from Stack
        If candidate is goal
            Return success
        Else
            For child of candidate
                If child is unvisited
                    Mark child visited
                    Push child to Stack
    Return failure
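A direct Python rendering of GraphDFS, offered as a sketch. It assumes the graph is an adjacency list given as a dict mapping every vertex to a list of its neighbors; a Python list serves as the stack, and a set records visited vertices. As in the pseudocode, a child is marked visited when it is pushed, so no vertex enters the stack twice.

```python
def graph_dfs(adj, source, goal):
    """Return True if goal is reachable from source, following GraphDFS.

    adj maps each vertex to a list of its neighbors (an adjacency list).
    """
    visited = {source}        # Mark source as visited
    stack = [source]          # Push source to Stack (a Python list used as a stack)
    while stack:              # While Stack is not empty
        candidate = stack.pop()
        if candidate == goal:
            return True       # Return success
        for child in adj[candidate]:
            if child not in visited:
                visited.add(child)    # Mark child visited when it is pushed
                stack.append(child)
    return False              # Return failure


adj = {"s": ["a", "b"], "a": ["c"], "b": [], "c": ["g"], "g": [], "x": []}
print(graph_dfs(adj, "s", "g"))  # True
print(graph_dfs(adj, "s", "x"))  # False
```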
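How long does this take with an adjacency list? Each vertex is pushed and popped at most once, which costs $O(|V|)$. The adjacency list for each vertex is traversed at most once, and the set of edges is distributed over the set of vertices, which costs $O(|E|)$ in total. Overall: $O(|V| + |E|)$.
If the graph is dense ($|E| \approx |V|^2$) and you made the mistake of representing it as an adjacency list, it would be $O(|V| + |V|^2) = O(|V|^2)$.
With an adjacency matrix, each vertex is still processed at most once, which costs $O(|V|)$. The adjacency matrix row for each vertex is scanned at most once, and each row has $|V|$ entries, which costs $O(|V|^2)$. Overall: $O(|V|^2)$.
If the graph is sparse, so that you really wanted an adjacency list, the matrix still costs $O(|V|^2)$, even though a list would need only $O(|V| + |E|)$.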
Breadth-first search systematically explores the edges of $G$ using a queue to discover a path from the source to the goal. It works on graphs and digraphs, and if all edge costs are equal (an unweighted graph), BFS finds a shortest path from the source to the goal, i.e. one with the fewest edges.
Algorithm GraphBFS
    Mark source as visited
    Push source to back of Queue
    While Queue is not empty
        Pop candidate from front of Queue
        If candidate is goal
            Return success
        Else
            For child of candidate
                If child is unvisited
                    Mark child visited
                    Push child to back of Queue
    Return failure
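A sketch of GraphBFS in Python, using `collections.deque` as the queue. It extends the pseudocode slightly: instead of returning only success, it records each vertex's parent so that the fewest-edges path can be rebuilt. The adjacency list is assumed to be a dict mapping every vertex to a list of neighbors, with hashable vertices other than `None`.

```python
from collections import deque


def graph_bfs(adj, source, goal):
    """Follow GraphBFS, recording parents so the path can be rebuilt.

    Returns the list of vertices from source to goal with the fewest edges,
    or None if goal is unreachable. Vertices must be hashable and not None.
    """
    parent = {source: None}       # doubles as the visited set
    queue = deque([source])       # Push source to back of Queue
    while queue:
        candidate = queue.popleft()       # Pop candidate from front of Queue
        if candidate == goal:
            path = []
            while candidate is not None:  # walk parent pointers back to source
                path.append(candidate)
                candidate = parent[candidate]
            return path[::-1]
        for child in adj[candidate]:
            if child not in parent:
                parent[child] = candidate  # mark visited, remember where we came from
                queue.append(child)
    return None


adj = {"s": ["a", "b"], "a": ["c"], "b": ["g"], "c": ["g"], "g": []}
print(graph_bfs(adj, "s", "g"))  # ['s', 'b', 'g']
```

Because the queue is first-in first-out, all vertices one edge away are dequeued before any vertex two edges away, which is why the first time the goal is reached it is by a fewest-edges path.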
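With an adjacency list, the complexity is the same as depth-first search, $O(|V| + |E|)$: you visit each vertex at most once, and for a given vertex, you visit every node in its adjacency list at most once.
With an adjacency matrix, this is again the same complexity as depth-first search: $O(|V|^2)$.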
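However, in practice the two can differ in how quickly they reach the goal: because a queue explores vertices in order of their distance (in edges) from the source, a queue-based approach tends to find a goal that is close to the source sooner, while a stack-based approach may first go deep down an unrelated path.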
Let's say you want to find whether there is a connection between two nodes, either directly or through intermediate nodes. This is really just finding a path between the source and the destination, which DFS or BFS can do.
If you're trying to find the shortest path in a weighted graph, you still have to solve the single source shortest path problem.
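Let's say you are planning a family reunion. Your family is spread out all over the US. You don't know where to have the reunion, but you want to minimize the total travel cost. If the reunion has to be held in one of the cities on the map, this can be solved in polynomial time: compute the least travel cost between every pair of cities, then pick the city whose total travel cost for everyone is smallest.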
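A minimal sketch of one restricted version of this problem. It assumes the reunion must be held in one of the family members' home cities, that each city `0..n-1` holds one family member, and that travel costs are undirected `(u, v, cost)` edges; all of these are hypothetical modeling choices. Under those assumptions, Floyd-Warshall computes every least travel cost in $O(|V|^3)$, and the best city is the one with the smallest total.

```python
import math


def best_reunion_city(n, edges):
    """Hypothetical setup: cities 0..n-1, one family member per city,
    undirected (u, v, cost) travel edges. Returns (city, total_cost)
    minimizing everyone's combined least travel cost to that city."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, c in edges:
        dist[u][v] = min(dist[u][v], c)  # keep the cheapest parallel edge
        dist[v][u] = min(dist[v][u], c)
    # Floyd-Warshall: allow cities 0..k as intermediate stops, one k at a time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    totals = [sum(dist[i][city] for i in range(n)) for city in range(n)]
    best = min(range(n), key=lambda city: totals[city])
    return best, totals[best]


edges = [(0, 1, 3), (1, 2, 3), (2, 3, 3)]  # a line of four cities
print(best_reunion_city(4, edges))  # (1, 12)
```

Cities 1 and 2 tie at a total of 12 here; `min` returns the first one. If some city cannot reach another, its total becomes `inf` and it is never chosen over a reachable option.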
Let's say you want to build a high-speed rail network across the US. Describe an algorithm to determine the least cost of construction such that any city can be reached from any other city. This is known as finding a minimum spanning tree, and standard algorithms such as Kruskal's or Prim's (with a binary heap) solve it in $O(|E| \log |V|)$ time.
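One standard way to solve this is Kruskal's algorithm, sketched below with a simple union-find structure. It assumes cities are numbered `0..n-1` and candidate rail segments are given as hypothetical `(cost, u, v)` tuples. It greedily takes the cheapest segment that does not create a cycle, and returns `None` if the cities cannot all be connected.

```python
def kruskal_mst_cost(n, edges):
    """Kruskal's algorithm with union-find.

    Cities are 0..n-1; edges are hypothetical (cost, u, v) rail segments.
    Returns (total_cost, chosen_edges), or None if not all cities connect.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the set's representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for cost, u, v in sorted(edges):   # cheapest segments first (greedy)
        ru, rv = find(u), find(v)
        if ru != rv:                   # skip segments that would form a cycle
            parent[ru] = rv
            total += cost
            chosen.append((u, v))
    if len(chosen) != n - 1:
        return None
    return total, chosen


edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]
print(kruskal_mst_cost(4, edges))  # (6, [(1, 2), (2, 3), (0, 2)])
```

Sorting the edges dominates the running time at $O(|E| \log |E|)$, which is $O(|E| \log |V|)$ because $|E| \le |V|^2$.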