Union by Rank
Considering the union-find dat structure we’ve implemented in the Kruskal algorithm, it’s actually based on the size of a given set. However, here we’d like to introduce a rank-based solution and then a path-compression-based solution.
- Invariant (for now): $\text{rank}[x] = \max$ # for hops from a leaf to $x$( which is the root of a tree). Also note that $\max_x\text{rank}[x] \approx $ Worst-case running time of
FIND
. - Union by Rank: Make old root with smaller rank child of the root with the larger rank.
Corollary:
- For all objects $x$, $\text{rank}[x]$ only goes up over time.
- Only ranks of roots can go up [once $x$ not a root, then $\text{rank}[x]$ frozen forevermore].
- Ranks strictly increase along a path to the root.
Rank Lemma
Considering an arbitrary sequence of UNION
(+ FIND
) operations. For every $r \in Z^+$, there are at most $n/2^r$ objects with rank $r$.
Corollary: Max rank always $\le \log n$
Corollary: Worst-case running time of FIND
, UNION
is $O(\log n)$.
Path Compression
Idea: Why bother traversing a leaf-root path multiple times?
path compression: After FIND(x)
, install shortcuts (i.e. revise parent pointers) to $x$’s root all along the $x \rightarrow$ root path.
Con: Constant-factor overhead to FIND
.
Pro: Speeds up subsequent FINDs
.
Hopcroft-Ullman Theorem
Theorem: With union by rank and path compression, $m$ Union+FInd
operations take $O(m \log^n)$ time, where $\log ^n=$ the number of times you need to apply $\log$ to $n$ before the result is $\le 1$.
Tarjan’s Bound
Theorem: With Union by Rank and path compression, $m$ UNION + FIND operations take $O(m\alpha)$ time, where $\alpha(n)$ is the inverse Ackerman function.
The Ackerman Function
**Definition ** $A_k(r)$ for all integers $k$ and $r \ge 1$.
Base case: $A_0(r)=r+1$ for all $r\ge 1$
In general: For $k, r \ge 1$: $$ A_k(r) = (A_{k-1}\circ A_{k-1}\cdots \circ A_{k-1})(r) $$ here represents r-fold composition
- $A_1(r)=2r$
- $A_2(r)=r2^r$
- $A_3=$ a tower of $r$ $2$’s $=2^{2^{\cdots^2}}$
The Inverse Ackermann Function
Definition: For every $n\ge 4$, $\alpha(n)=$Minimum value of $k$ such that $A_k(2)\ge n$.
Comparison to $\log ^*$ function.