In this article we are going to have an introduction to the union-find problems, a set of algorithms used for solving so-called Dynamic-Connectivity problem.
First let’s go through the basic steps to follow while developing a useful algorithm.
The first step, is to model the problem. Try to understand, basically, what are the main elements of the problem that need to be solved.
Then we’ll find some algorithm to solve the problem. In many cases, the first algorithm we come up with would be fast enough and maybe it fits in memory and, we’ll go ahead and use it, and be off and running. But in many other cases maybe it’s not fast enough, or there’s not enough memory.
So, we try to figure out why, find a way to address that problem then find a new algorithm and iterate. This is the scientific approach to designing and analyzing algorithms, where we build mathematical models to try and understand what’s going on, and then we do experiments to validate those models and help us improve solutions.
So, here’s the idea. These problems are going to have a set of N objects, doesn’t really matter what they are. We’re going to use the numbers, zero through N to model our objects. so, we have the idea of a connection between two objects.
Further we postulate that there’s going to be a command that says, connect two objects. Given two objects, provide a connection between them. Then key part of the problem is find query or the connected query, which just asks, is there a path connecting the two objects?
So in this example, we have 10 objects. The command to connect objects 1 and 2, 5 and 6, 3 and 4, 8 and 3 and 4 and 9 has already been given.
Now lets say a query ” is 0 connected to 7?” arises. In this case no.
Now if there is a query on connection between 8 and 9, then yes, there is a connection. Not a direct path but via object 8 to 3, 3 to 4 and 4 to 9.
Hence on forming connections by the following commands:
Now, “are 0 and 7 connected?” yes is the answer. So that’s the problem.
Intermix union, commands and connected queries.
We need to be able to officially support those commands for a large number of objects.
Modelling the connections
We require a few abstract properties the these connections have to satisfy which are natural and intuitive criterion to meet.
So we assume that “is connected to is” an equivalence relation .i.e. every object is connected to itself, it’s symmetric.
If P’s connected to Q, then Q’s connected to P, it’s transitive. If P’s connected to Q, and Q’s connected to R, then P’s connected to R.
These properties are very intuitive, though useful to state them explicitly and make sure that the chosen algorithms maintain them.
When we have an equivalence relation, a set of objects and connections divide into subsets called connected components,
This is a maximal set of objects that is connected, mutually.
Implementation of algorithms
Okay now to implement the operations, we have to find the right query and the union command, so we are going to maintain the connected components.
The find will have to check if two objects are in the same component and the union command will have to replace components containing two objects with their union.
The example of both connection modelling and implementation has been done in dynamic-connectivity section.
The programming part
The above implementation steps, in programming world to specification of a data type, which is simply specifying the methods that we are want to going to implement in order to solve a problem.
Similar to a typical Java model, what we will do is create a class called UF that contains two methods:
- To implement union
- To implement connected, which returns a Boolean.
The constructor, takes SR unit, the number of objects, so that it can build adata structure based on the number of objects.
As we are building our logarithms, one must remember that both the number of objects can be huge, but also, the number of operations, union and connected, operations hence, our algorithms must be efficient, under such conditions.
One of the practices commonly followed, is to check API design before getting into dealing with the problem, by building a client that is going to use the data type that we develop.