DSU, also known as Disjoint-set/Union-find (sometimes called disjoint sets), is a practical algorithm for graph connectivity problems and for computing the MST.
This algorithm has two main operations: union and find. In this algorithm, each set has a representative.
Before beginning, I should mention there are two well-known methods for implementing DSU: the list method and the forest method.
In the list method, each member of a set is placed in a list, and an array called Rep
is maintained to indicate the representative of member \(x\). For example, \(Rep[x] = X\).
The second method uses a forest (collection of trees). Here, each member is considered a node. Each set forms a tree where all nodes belong to that set. The tree is "hung" from a designated node (the representative of the set). Every node except the representative has a parent, and we maintain an array Par
to store the parent of each node. For the root node (representative), its parent can be set to itself to identify it as the tree’s representative. For example, \(Par[x] = X\).
In this section, we will explore fundamental commands for working with graphs.
def add_vertex(self, label):
# درج یک رأس جدید
if label not in self.vertices:
self.vertices.append(label)
self.edges[label] = []
The above code adds a new vertex to the graph. Each vertex must have a unique label.
def add_edge(self, from_label, to_label):
# افزودن یال بین دو رأس
if from_label in self.vertices and to_label in self.vertices:
self.edges[from_label].append(to_label)
self.edges[to_label].append(from_label)
This method creates an edge between two vertices. Note that this implementation assumes an undirected graph.
Important Considerations: - Use unique labels for vertices. - Check vertex existence before adding edges. - The edges are stored as an adjacency list.
The find operation is used to determine the representative of an element. It is employed when you need to identify the representative of the set containing the element \(x\).
The first method, used for lists, works by simply returning the value of the array \(Rep[x]\).
int Find(int x) {
if (Rep[x] == x)
return x;
return Find(Rep[x]);
}
int Find(int x){
return Rep[x];
}
1int find(int x) {
2 while (par[x] != x) { // This line moves from x to the root
3 x = par[x];
4 }
5 return x;
6}
int Find(int x){
if (Par[x] == x)
return x;
return Find(Par[x]);
}
Merges two sets and forms a new set. Suppose we merge two sets containing two members \(x\) and \(y\).
First, we find the representatives of these two members. Let's assume they are \(X\) and \(Y\), respectively. If \(X\) and \(Y\) are the same, it means the two members are in the same set, and we do not need to perform a merge. If they are not equal, we set the representative of the members of one set to be the representative of the members of the other set. The key point is that we change the representative of the set with fewer members. This is because the order of this operation is \(O(n \lg(n))\) (the representative of each member changes at most \(\lg(n)\) times, since in each step, the number of members in the group whose representative changes doubles). This technique of merging the smaller set into the larger one is called Union by Rank.
void Union(int x, int y){
x = Find(x);
y = Find(y);
if (x == y)
return;
if (sz[x] < sz[y])
swap(x, y);
sz[x] += sz[y];
for (int z : lst[y]){
Rep[z] = x;
lst[x].push_back(z);
}
lst[y].clear();
}
Another method available for merging is when we consider the sets as a forest. In this case, similar to the previous section, we can compare the number of vertices in each component (set) and assign the representative of the component with fewer vertices to the representative of the other component. In this way, to find the root of the component of an arbitrary vertex, the Find function is called at most \(lg(n)\) times (in other words, the height of each tree in the forest is at most \(lg(n)\)).
void Union(int x, int y){
x = Find(x);
y = Find(y);
if (x == y)
return;
if (sz[x] < sz[y])
swap(x, y);
sz[x] += sz[y];
Par[y] = x;
}
Now, if we use the Path Compression technique in the Find function to find the root, we can improve our amortized complexity. In this method, while searching for the root of \(x\), at the end we set the parent of \(x\) to be the root. This technique, called Path Compression, causes all vertices along the path from \(x\) to the root to update their parent to the root. As a result, the number of children of the root increases. This shortens the path from \(x\) to the root (refer to the Find function for a better understanding) and reduces the amortized complexity of each operation to \(O(lg^*n)\). This means for \(n = 10^6\), only five operations are needed (\(lg^*n\) represents the number of times we need to take the logarithm of \(n\) until we reach 1. For example, \(lg^*4 = 2\) because taking the logarithm of 4 once gives 2, and taking it again gives 1, resulting in two total operations). In general, \(lg^*n\) is at most 5 for \(n\) smaller than \(2^{65536}\), demonstrating the efficiency of Path Compression.
An important note is that even if Path Compression is used without Union by Rank, the amortized complexity of each operation will be \(O(lg(lg(n)))\), which in practice is nearly indistinguishable from using Union by Rank!
int Find(int x){
if(Par[x] != x)
Par[x] = Find(Par[x]);
return Par[x];
}
Undoes the last merge operation and separates the two merged sets. Assume we do not use Path Compression. In this case, each call to the merge function only changes the values of \(sz_x\) and \(Par[y]\). Therefore, we can save the changes we make so that when an undo is needed, we can revert to the previous values of these two variables. This allows us to implement each undo operation in \(O(1)\).
Note that having an undo function makes it impossible to use Path Compression, as our time complexity will no longer remain efficient (recall that Path Compression's complexity is amortized efficiently, and a single call to the Find function alone could even take \(O(n)\) time).
int Find(int x){
if(Par[x] == x)
return x;
return Find(Par[x]);
}
void Union(int x, int y){
x = Find(x);
y = Find(y);
if (x == y)
return;
if (sz[x] < sz[y])
swap(x, y);
operations.push_back(make_pair(y, sz[y]));
sz[x] += sz[y];
Par[y] = x;
}
void Undo(){
int y = operations.back().first;
sz[y] = operations.back().second;
operations.pop_back();
int x = Find(y);
sz[x] -= sz[y];
Par[y] = y;
}
You may wonder, which approach should we take to implement the algorithm now? Represent the set as a list or as a forest?
In some problems, we may need to maintain the set (component) of each vertex or have the ability to undo previous merge operations. In such cases, the list method is required.
In other situations, it's better to use the forest method. Because when using Path Compression, the order of the two operations Find and Union drops significantly, reaching \(O(lg^*n)\).
1class UnionFind:
2 def __init__(self, size):
3 self.parent = list(range(size)) # creating parent list
4
5 def find(self, x):
6 while self.parent[x] != x:
7 self.parent[x] = self.parent[self.parent[x]] # Path Compression
8 x = self.parent[x]
9 return x
10
11 def union(self, x, y):
12 x_root = self.find(x)
13 y_root = self.find(y)
14 if x_root != y_root:
15 self.parent[y_root] = x_root # union with one parent