.. _centroid:

Centroid
============

In graph theory, the **centroid** of a tree refers to a vertex (or two adjacent vertices) with the smallest possible maximum branch size. In other words, removing the centroid divides the tree into subtrees where each subtree has no more than half the vertices of the original tree. For example, in a star-shaped tree, the central vertex is the centroid. In a path graph with an even number of vertices, the two middle vertices form the centroid.

**Algorithm for Finding the Centroid of a Tree:**

1. For each vertex, calculate the size of its largest branch (the maximum number of vertices in the subtrees formed by removing that vertex).
2. Select the vertex (or vertices) with the smallest maximum branch size.

The following code implements this algorithm:

.. code-block:: python

    def find_centroid(graph, n):
        # Function to find the centroid of a tree
        def dfs(u, parent):
            size[u] = 1
            max_branch = 0
            for v in graph[u]:
                if v != parent:
                    dfs(v, u)
                    size[u] += size[v]
                    max_branch = max(max_branch, size[v])
            max_branch = max(max_branch, n - size[u])
            if max_branch < min_max_branch[0]:
                min_max_branch[0] = max_branch
                centroids.clear()
                centroids.append(u)
            elif max_branch == min_max_branch[0]:
                centroids.append(u)

        size = [0] * (n + 1)
        centroids = []
        min_max_branch = [float('inf')]
        dfs(1, -1)
        return centroids

    # Create a sample tree
    tree = {
        1: [2, 3],
        2: [1, 4, 5],
        3: [1, 6],
        4: [2],
        5: [2],
        6: [3]
    }
    print(find_centroid(tree, 6))  # Output: [1]

.. image:: /images/centroid.png

Note:
    This algorithm only works for trees. For general graphs, finding a "centroid" requires different approaches. The centroid of a tree is closely related to its *balance*—a tree with one centroid is balanced around that vertex, while a tree with two centroids is balanced around the edge connecting them. Mathematically, the centroid minimizes the maximum branch size:

    \[
    \text{Centroid} = \arg\min_{v \in V} \left( \max_{T_i} |T_i| \right)
    \]

    where \( T_i \) are the subtrees formed by removing vertex \( v \).

Definition
-----------
In a tree, a vertex is called a **centroid** if removing it results in the largest connected component having at most :math:`n/2` vertices.

.. Proof of Existence and Finding the Centroid
   -------------------------------------------

First, we root the tree from a vertex. Starting from the root, at each step, if the current vertex has a child whose size exceeds :math:`n/2`, we move to that vertex. Eventually, we will reach the centroid.

The vertex we found is the centroid because its subtree size exceeds :math:`n/2` (meaning its parent component is less than :math:`n/2`), and none of its children have a size greater than :math:`n/2`.

See the algorithm implementation below.

.. code-block:: cpp

  const int N = 100000 + 77;
  int n , sz[N];
  vector < int > adj[N];

  void dfs(int v ,int prev = -1) {
     sz[v] = 1;  
     for(int u : adj[v])
        if(u != prev){
           dfs(u , v);
           sz[v] += sz[u];
         }
  }
  int FindCentroid(int v , int prev = -1) {
     for(int u : adj[v])
        if(u != prev && sz[u] * 2 > n)
           return FindCentroid(u, v);
     return v;
  }
  int main() {
     scanf("%d" , & n);
     for(int v, u, i = 1; i < n; ++i){
        scanf("%d %d" ,& v ,& u);
        adj[v].push_back(u);
        adj[u].push_back(v);
      }
     dfs(1);
     printf("%d\n" , FindCentroid(1));
     return 0;
  }

Centroid Decomposition
-----------
Suppose we want to implement a divide-and-conquer algorithm on a tree. The centroid of the tree will greatly assist us, as by removing it and solving the remaining subtrees, we can achieve a complexity of :math:`O(n \log n)`.

If you observe carefully, with this technique, we traverse each vertex at most :math:`\lg n` times. This is because each traversal reduces the size of its connected component by more than half, resulting in an overall time complexity of :math:`O(n \log n)`.

The implementation of this method is as follows: First, we find the correct centroid. Then, we solve each remaining component separately, similar to the divide-and-conquer approach. See the algorithm implementation below:

.. code-block:: cpp

  const int N = 100000 + 77;
  int n, sz[N];
  bool M[N]; // che rass haee centroid shodeand ta be hal
  vector < int > adj[N];

  void dfs(int v, int prev = -1) {
     sz[v] = 1;
     for(int u : adj[v])
        if(u != prev && ! M[u]){
           dfs(u, v);
           sz[v] += sz[u];
         }
  }

  int FindCentroid(int v, int prev = -1) {
     for(int u : adj[v])
        if(u != prev && sz[u] * 2 > n)
           return FindCentroid(u, v);
     return v;
  }
  void Decompose(int v) {
     dfs(v);
     int c = FindCentroid(v);
     M[c] = 1;
     for(int u : adj[c])
        if(! M[u])
           Decompose(u);
  }

  int main() {
     scanf("%d", & n);
     for(int v, u, i = 1; i < n; ++i){
        scanf("%d %d" , & v , & u);
        adj[v].push_back(u);
        adj[u].push_back(v);
      }
     Decompose(1);
     return 0;
  }

.. code-block:: rst

 Centroid Tree
 -----------
 Suppose we build a new tree from our existing tree using Centroid Decomposition. In each step where we find the centroid of a subtree, we set its parent in this new tree as the centroid of the previous component that contained this vertex. We call this new tree the Centroid Tree.

 .. figure:: /_static/dot/Centroid_Clusters.svg
  :figwidth: 50%
  :align: center
  :alt: Initial tree
  
  Initial tree

 .. figure:: /_static/dot/Centroid_Tree.svg
  :figwidth: 50%
  :align: center
  :alt: Centroid tree
  
  Centroid tree

 In many problems, the Centroid Tree greatly assists in computations.