Disjoint Set Union | HackerEarth

Disjoint Set Union (DSU) or Union-Find is a graph algorithm that is very useful in situations when you have to determine the connected components in a graph.

Suppose that we have N nodes numbered from 1 to N and M edges. The graph can be disconnected and may have multiple connected components. Our job is to find out how many connected components are there in the graph and the number of nodes in each of them.

The basic idea behind DSU is the following:
Initially, all nodes are isolated i.e. there are no edges in the graph. We add edges to the graph one by one.
While adding an edge, check if the two nodes that the edge connects are already present in the same connected component.
- if they are, then do nothing.
- if they aren't, then make the smaller of the two connected components a part of the larger one.
(So, we are making union of disjoint connected components; therefore the name Disjoint Set Union)

To keep track of what connected component a node belongs to, we use an array named parent[ ].
parent[ i ] tells the ID of the connected component that the ith node belongs to. The ID of a connected component is one of the nodes in that connected component. This node is kind of the leader or parent of all other nodes in that connected component.
Initially, as all nodes are isolated, we have N connected components; each node being the leader of their connected component. Therefore, parent[ i ]=i for all 1<=i<=N.

To keep track of the size of a connected component, we use the size[ ] array.
size[ i ] = the number of nodes in the ith connected component.
Initially, size[ i ] =1, for all 1<=i<=N. This is because, initially, all connected components contain only one node.

When we encounter an edge that connects two nodes a and b that belong to different connected components, we first check which of the two connected components is bigger: the one that a belongs to or the one that b belongs to. The smaller connected component becomes part of the larger one. This is done to reduce the number of nodes whose parent has to be changed.

Notice that we used the function call findParent( i ) to find the parent of the ith node, instead of directly looking at parent[ i ]. The reason for this is:
The parent of a node is not changed as soon as its affiliation to a connected component is changed. We postpone this to when we actually need to find the parent of the node. Doing this avoids many useless operations. So, parent[ i ] may not contain the updated value of the connected component that i belongs to. That's why it's important that we use findParent( i ) instead of being lazy and taking the value directly from parent[ i ].

In the end, we need to consider those nodes i that have findParent( i )== i. This is because, these are the nodes that still belong to their initial connected component and were not assigned to a different one during execution. These represent the disjoint connected components we are looking for.

So the complete code for DSU is as follows:

#include<stdio.h>
#define MOD 1000000007
int findParent(int i,int parent[])
//function to find the connected component that the ith node belongs to
{
    if(parent[parent[i]]!=parent[i])
        parent[i]=findParent(parent[i],parent);
    return parent[i];
}
void unionNodes(int a,int b,int parent[],int size[])
//to merge the connected components of nodes a and b
{
    int parent_a=findParent(a,parent),parent_b=findParent(b,parent);
    if(parent_a==parent_b)
        return;
    if(size[parent_a]>=size[parent_b])//the larger connected component eats up the smaller one
    {
         size[parent_a]+=size[parent_b];
         parent[parent_b]=parent_a;
    }
    else
    {
         size[parent_b]+=size[parent_a];
         parent[parent_a]=parent_b;
    }
    return;
}

int main()
{

    int N,M,i,a,b;
    scanf(" %d %d",&amp;N,&amp;M);
    int parent[100001]={0},size[100001]={0};
    for(i=1;i<=N;i++)
    {
        parent[i]=i;
        size[i]=1;
    }

    for(i=1;i<=M;i++)
    {
        //scan each edge and merge the connected components of the two nodes
        scanf(" %d %d",&amp;a,&amp;b);
        unionNodes(a,b,parent,size);
    }

    for(i=1;i<=N;i++)
        printf("Node %d belongs to connected component %d\n",i,findParent(i,parent));
    long long ways=1;
    int nos=0;
    for(i=1;i<=N;i++)
    {
        if(findParent(i,parent)==i)//this condition is true only for disjoint connected components
        {
            printf("%d leader %d size\n",i,size[i]);
            nos++;
        }

    }
    printf("Total connected components : %d",nos);

    return 0;
}

Comparison between DFS and DSU:
The task that DSU achieves in this code can be done using DFS as well. You should to code the same using DFS too.

Though, keep in mind that DFS is not a replacement for DSU. DFS works well in cases when all edges are present in the graph from the beginning. But, in problems where edges are added during the execution and we need to run connectivity queries in between such additions, DSU is the better option. An example of this type of situation is the Kruskal's algorithm to find the Minimum Spanning Tree (MST).
In Kruskal's algorithm, before adding an edge to the MST we need to check if the addition of the edge creates a cycle or not. We can use DSU here. If the parents of the two nodes that the edge connects are same, then we know that addition of the edge will create a cycle.
Try implementing Kruskal's algorithm for MST using DSU by yourself. It's quite simple once you know DSU.