Index      Centroid  Problems  Change Language   Source

Centroid

In graph theory, the centroid of a tree refers to a vertex (or two adjacent vertices) with the smallest possible maximum branch size. In other words, removing the centroid divides the tree into subtrees where each subtree has no more than half the vertices of the original tree. For example, in a star-shaped tree, the central vertex is the centroid. In a path graph with an even number of vertices, the two middle vertices form the centroid.

Algorithm for Finding the Centroid of a Tree:

  1. For each vertex, calculate the size of its largest branch (the maximum number of vertices in the subtrees formed by removing that vertex).

  2. Select the vertex (or vertices) with the smallest maximum branch size.

The following code implements this algorithm:

def find_centroid(graph, n):
    # Function to find the centroid of a tree
    def dfs(u, parent):
        size[u] = 1
        max_branch = 0
        for v in graph[u]:
            if v != parent:
                dfs(v, u)
                size[u] += size[v]
                max_branch = max(max_branch, size[v])
        max_branch = max(max_branch, n - size[u])
        if max_branch < min_max_branch[0]:
            min_max_branch[0] = max_branch
            centroids.clear()
            centroids.append(u)
        elif max_branch == min_max_branch[0]:
            centroids.append(u)

    size = [0] * (n + 1)
    centroids = []
    min_max_branch = [float('inf')]
    dfs(1, -1)
    return centroids

# Create a sample tree
tree = {
    1: [2, 3],
    2: [1, 4, 5],
    3: [1, 6],
    4: [2],
    5: [2],
    6: [3]
}
print(find_centroid(tree, 6))  # Output: [1]
images/centroid.png
Note:

This algorithm only works for trees. For general graphs, finding a "centroid" requires different approaches. The centroid of a tree is closely related to its balance—a tree with one centroid is balanced around that vertex, while a tree with two centroids is balanced around the edge connecting them. Mathematically, the centroid minimizes the maximum branch size:

[ text{Centroid} = argmin_{v in V} left( max_{T_i} |T_i| right) ]

where ( T_i ) are the subtrees formed by removing vertex ( v ).

Definition

In a tree, a vertex is called a centroid if removing it results in the largest connected component having at most \(n/2\) vertices.

First, we root the tree from a vertex. Starting from the root, at each step, if the current vertex has a child whose size exceeds \(n/2\), we move to that vertex. Eventually, we will reach the centroid.

The vertex we found is the centroid because its subtree size exceeds \(n/2\) (meaning its parent component is less than \(n/2\)), and none of its children have a size greater than \(n/2\).

See the algorithm implementation below.

const int N = 100000 + 77;
int n , sz[N];
vector < int > adj[N];

void dfs(int v ,int prev = -1) {
   sz[v] = 1;
   for(int u : adj[v])
      if(u != prev){
         dfs(u , v);
         sz[v] += sz[u];
       }
}
int FindCentroid(int v , int prev = -1) {
   for(int u : adj[v])
      if(u != prev && sz[u] * 2 > n)
         return FindCentroid(u, v);
   return v;
}
int main() {
   scanf("%d" , & n);
   for(int v, u, i = 1; i < n; ++i){
      scanf("%d %d" ,& v ,& u);
      adj[v].push_back(u);
      adj[u].push_back(v);
    }
   dfs(1);
   printf("%d\n" , FindCentroid(1));
   return 0;
}

Centroid Decomposition

Suppose we want to implement a divide-and-conquer algorithm on a tree. The centroid of the tree will greatly assist us, as by removing it and solving the remaining subtrees, we can achieve a complexity of \(O(n \log n)\).

If you observe carefully, with this technique, we traverse each vertex at most \(\lg n\) times. This is because each traversal reduces the size of its connected component by more than half, resulting in an overall time complexity of \(O(n \log n)\).

The implementation of this method is as follows: First, we find the correct centroid. Then, we solve each remaining component separately, similar to the divide-and-conquer approach. See the algorithm implementation below:

const int N = 100000 + 77;
int n, sz[N];
bool M[N]; // che rass haee centroid shodeand ta be hal
vector < int > adj[N];

void dfs(int v, int prev = -1) {
   sz[v] = 1;
   for(int u : adj[v])
      if(u != prev && ! M[u]){
         dfs(u, v);
         sz[v] += sz[u];
       }
}

int FindCentroid(int v, int prev = -1) {
   for(int u : adj[v])
      if(u != prev && sz[u] * 2 > n)
         return FindCentroid(u, v);
   return v;
}
void Decompose(int v) {
   dfs(v);
   int c = FindCentroid(v);
   M[c] = 1;
   for(int u : adj[c])
      if(! M[u])
         Decompose(u);
}

int main() {
   scanf("%d", & n);
   for(int v, u, i = 1; i < n; ++i){
      scanf("%d %d" , & v , & u);
      adj[v].push_back(u);
      adj[u].push_back(v);
    }
   Decompose(1);
   return 0;
}
Centroid Tree
-----------
Suppose we build a new tree from our existing tree using Centroid Decomposition. In each step where we find the centroid of a subtree, we set its parent in this new tree as the centroid of the previous component that contained this vertex. We call this new tree the Centroid Tree.

.. figure:: /_static/dot/Centroid_Clusters.svg
 :figwidth: 50%
 :align: center
 :alt: Initial tree

 Initial tree

.. figure:: /_static/dot/Centroid_Tree.svg
 :figwidth: 50%
 :align: center
 :alt: Centroid tree

 Centroid tree

In many problems, the Centroid Tree greatly assists in computations.