Algorithm 15 – Minimum Spanning Tree

Algorithm 15 – Minimum Spanning Trees

Here introduce 2 types of algorithm based on different strategies to generate the MST given a graph. The best-known MST algorithm — Prim’s algorithm and Kruskal’s algorithm.

15.1 Problem Definition

The MST problem is about connecting a bunch of nodes as cheaply as possible. And this chapter assumes that the input graph is represented using adjacency lists.

15.1.2 Spanning Trees

A tree shouldn’t contain a cycle.
for every pair $v, w \in V$ of vertices, $T$ should include a path between $v$ and $w$.

Thus we can define the MST problem as:

Input: A connected undirected graph $G =(V, E)$ and a real-valued cost $c_e$ for each edge $e \in E$.
Output: A spanning tree $T \subset E$ of $G$ with the minimum-possible sum $\sum_{e\in T}c_e$ of edge costs.

15.2 Prim’s Algorithm

Input: connected undirected graph G = (V, E) in adjacency-list representation and a cost c_e for each edge e in E.
Output: the edges of a minimum spanning tree of G.
  
# Initialization
X = [s] # s is an arbitrarily chosen vertex.
T = [] # invariant: the edges in T span X
# Main Loop
while there is an edge (v, w) with v in X and w not in X:
  (v*, w*) = a minimum-cost such edge
  X.append(w*)
  T.append((v*, w*))
 	return T

Tips: it can be implemented with an extra list containing frontier edges, and the worst time complexity of calculating minimum cost of frontier edges is $O(m)$. And the while actually halts when all edges have been explored, which means $O(n)$ (Percisely it has $n-1$ iterations). Thus, it can be concluded as $O(mn)$.

Proposition 15.2 (Prim Running Time (Straightforward))

For every graph $G = (V, E)$ and real-valued edge costs, the straightforward implementation of Prim runs in $O(mn)$ time, where $m = |E|$ and $n = |V|$.

15.3 Speeding Up Prim’s Algorithm via Heaps.

Here we introduce an algorithm with near-linear running time.

Theorem 15.3 (Running Time of Three Heap Operations)

In a heap with $n$ objects, the INSERT, EXTRACTMIN, DELETE operations run in $O(\log n)$ time. More importantly, the hidden constant behind the big-O notation is competitively small.

Theorem 15.4 (Prim Running Time (Heap-based))

For every graph $G = (V, E)$ and real-valued edge costs, the heap-based implementation of prim runs in $O((m+ n)\log n)$ time.

class GroupVertice(Graph.Vertice):

    def __init__(self, key=None, winner=None):
        self.key = key
        self.winner = winner


class Prim:

    def __init__(self, graph):
        self._graph = graph

        self._min_span_tree = []
        self._explored_vertex = []

        # arbitrarily choose the initial vertice
        start = random.choice(graph.vertex)

        # update the unexplored vertex
        self._unexplored_vertex = [
            vertice for vertice in graph.vertex if vertice != start]

        self._explored_vertex.append(start)

        # initilization the keys and winners in terms of vertex
        vertex = self._graph.vertex
        for vertice in vertex:
            edge = graph.get_edge_by_vertex(start, vertice)
            if vertice != start and edge:
                vertice.key = edge.weight
                vertice.winner = edge
            else:
                vertice.key = math.inf
                vertice.winner = None

        self._heap = Heap.MinHeap(self._unexplored_vertex, lambda x: x.key)

        assert self._heap.heap is not []

    def run(self):

        # Main Loop

        # if Heap is non-empty
        while self._heap.heap:

            # the default mode is extract max
            opt_vertice = self._heap.extract_root()

            # add w_opt into the explored node set
            self._explored_vertex.append(opt_vertice)

            # add the local winner edge leading to w_opt to tree
            self._min_span_tree.append(opt_vertice.winner)

            # update keys to maintain invariant
            # likewise in the dijkstra
            for index in range(len(self._unexplored_vertex)):

                # get target vertice from index
                unexplored_vertice = self._unexplored_vertex[index]

                # get edge (opt_vertice, unexplored_vertice)
                edge_probe = self._graph.get_edge_by_vertex(
                    opt_vertice, unexplored_vertice)

                if edge_probe is None:
                    continue

                # check if the edge satisfies the update condition
                if edge_probe.weight < unexplored_vertice.key:
                    self._heap.delete(index)
                    unexplored_vertice.key = edge_probe.weight
                    unexplored_vertice.winner = edge_probe
                    self._heap.insert(unexplored_vertice)

        return self._min_span_tree

the whole code could be found on Github.

15.5 Kruskal’s Algorithm

while Prim’s algorithm was constrained to choose the cheapest edge crossing the current frontier, Kruskal’s algorithm is free to choose the cheapest remaining edge in the entire graph.

15.5.2 Pseudocode

Input: an undirected graph G = (V, E)
Output: the edges of a minimum spanning tree of G

# preprocessing
T := []
sort edges of E by cost # e.g. using QuickSort
# Main Loop
for e in E:
	if T + {e} is acyclic:
		T = T + {e}
return T

15.5.3 Straightforward Implementation

In the preprocessing step, the sorting algorithm contributes $O(m\log n)$ work to the overall running time. (Why $O(m\log n)$ instead of $O(m\log m)$? Because there’s no difference between the two.)

In the main loop, we want to check if the graph has a cycle after adding a new edge into it. We can use breadth- or depth-first searching to check if there are any cycle, and these algorithms are $O(m+n)$, and we have $n$ iterations. Thus, the overall running time is $O((m+n)n)=O(mn)$(Because $n-1\le m\le n(n-1)/2$)

15.12 Kruskal Run Time (Straightforward)

For every graph $G = (V, E)$ and real-valued edge costs, the straightforward implementation of Kruskal runs in $O(mn)$ time, where $m=|E|$ and $n=|V|$.

15.6 Union-Find

15.13 Kruskal Run Time (Union-Find-Based)

For every graph $G = (V, E)$ and real-valued edge costs, the straightforward implementation of Kruskal runs in $O((m+n)\log n)$ time, where $m=|E|$ and $n=|V|$.

15.6.1 The Union Find Data Structure

The raison detre of the union-find data structure is to maintain a partition of a static set of objects. In the initial partition, each object is in its own set.

class Wrapper:

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        # to track the size of the union tree
        self.size = 1

    def __str__(self):
        return "item: {0}, parent: {1}, size: {2}".format(self.item, self.parent, self.size)

class UnionFind:
    """
    This data structure is implemented with list.
    key: the position in the list.
    value: a user-defined object containing the parent position it directs to.
    """

    def __init__(self, array) -> None:
        """
        Args:
            array (List<Wrapper(item, parent)>): A list of object desired
        """

        self.array = array
        self.wrapper_dict = {item: Wrapper(item, item) for item in array}

    def find(self, target):
        return self._find_wrapper(target).item
    
    def _find_wrapper(self, target):
        target_wrapper = self.wrapper_dict[target]

        # recursive-based find
        if target_wrapper.item != target_wrapper.parent:
            return self._find_wrapper(target_wrapper.parent)

        # base case
        return target_wrapper

    def union(self, cand1, cand2):
        root1 = self._find_wrapper(cand1)
        root2 = self._find_wrapper(cand2)

        if root1 == root2:
            return

        if root1.size > root2.size:
            # merge the less deep one into the deep one
            root2.parent = root1.item
            root1.size = root1.size + root2.size
        elif root1.size <= root2.size:
            root1.parent = root2.item
            root2.size = root1.size + root2.size
        return

Theorem 15.14 (Running Time of Union-Find Operations)

Operation	Running time
INITIALIZE	$O(n)$
FIND	$O(\log n)$
UNION	$O(\log n)$

15.6.2 Pseudocode for Kruskal

Input: an undirected graph G = (V, E)
Output: the edges of a minimum spanning tree of G

# preprocessing
T := []
U := INITIALIZE(V)
sort edges of E by cost # e.g. using QuickSort

# Main Loop
for (v, w) in E:
	if FIND(U, v) ≠ FIND(U, w):
		 T := T + {(v, w)}
     UNION(U, v, w) 
return T

15.6.3 Running Time Analysis

preprocessing $O(n) + O(m\log n)$, $2m$ FIND operations $O(m\log n)$, $n-1$ UNION operations $O(n\log n)$, remaining bookkeeping $O(m)$.

Then total $O((m+n)\log n)$

Nobilow's Blog

Algorithm 15 – Minimum Spanning Tree