Index Tree Hashing Problems Change Language (Farsi) Source

Tree Hashing¶

In this section, we examine checking the equality of two trees. Assume we are given two trees \(T_1\) and \(T_2\), and we want to provide a linear algorithm to check if \(T_1, T_2\) are equal. Equality here means that if we disregard vertex labels, the two trees can be drawn such that their shapes are identical.

Conversion to Equality of Two Rooted Trees¶

Assume we know that vertex \(u_1\) from tree \(T_1\) is to be equal to vertex \(u_2\) from tree \(T_2\). Then we can check the equality of two rooted trees, which seems simpler than the current problem.

So now, if we can solve the problem of equality of two \(n\)-vertex rooted trees in \(O(f(n))\), we have a solution that solves the current problem in \(O(n \times f(n))\). It is sufficient to consider \(u_1\) as a fixed vertex and iterate over all possible choices for \(u_2\).

The key idea here is to choose \(u_1, u_2\) with a specific property such that the number of vertices in the tree having this property is small. For example, if we consider \(u_1\) as a leaf, then \(u_2\) must also be a leaf. So instead of iterating \(n\) times for \(u_2\), it is sufficient to iterate a number of times equal to the number of leaves in \(T_2\).

In previous sections, we learned that every tree has at most 2 centroids. Consequently, if we consider the desired property to be 'being a centroid', it is sufficient to pick \(u_1\) as one of the centroids of tree \(T_1\) and iterate over all centroids of tree \(T_2\) for \(u_2\) (of which there are at most 2). Thus, we can solve the problem in \(O(f(n))\).

Solving the Equality Problem for Two Rooted Trees¶

Assume \(r_1\) is the root of tree \(T_1\) and \(r_2\) is the root of tree \(T_2\), and now we want to check the equality of these two trees. Initially, if the number of children of \(r_1\) and \(r_2\) are not equal, then the two trees are clearly unequal. If the number of children is equal, we need to understand which child of \(r_1\) corresponds to which child of \(r_2\). If we understand this, the problem can be solved recursively: for all children of \(r_1\), we must check the equality of their subtrees with the subtrees of their corresponding vertices in tree \(T_2\).

In other words, we need to permute the sequences of children of \(r_1\) and \(r_2\) in some way, then for each \(i\), check the equality of the subtrees of the \(i\)-th children of \(r_1\) and \(r_2\).

Now we use the idea of sorting the children of each vertex based on a specific property. In this way, we no longer need to check all permutations of children of \(r_1\) and \(r_2\); it is sufficient to check the current order.

To each rooted tree \(T\) with root \(r\), we assign a sequence of parentheses. This is done by first recursively computing the subtrees corresponding to the children of \(r\), then sorting the children of \(r\) based on the lexicographical order of their strings. This is the fixed order we wanted to assign to the children of each vertex. Finally, the parenthesis sequence of \(T\) is \(S = (S_1S_2...S_k)\), where \(S_i\)'s are the parenthesis sequences corresponding to the subtree of the \(i\)-th child of \(r\).

You can verify that two rooted trees are equal if and only if their assigned parenthesis sequences are identical.

If the user's internet is slow, this appears

Computing the Hash of a Rooted Tree¶

Since working with a string (concatenating two strings or checking string equality) requires \(O(n)\) operations, it leads us to think: instead of assigning a string to each vertex, assign a number to each vertex, where each number represents a string!

So we use the logic above, and the number corresponding to each vertex will be obtained as follows. First, we obtain the numbers of the children, then sort them, and assuming they are \(H_1, ..., H_k\), our number will be \(H = 1 + \sum H_i \times P^i\) modulo \(M\), where \(M, P\) are two random prime numbers. This technique is called hashing. Since \(M, P\) are random numbers, it can be assumed that the resulting numbers are random, and the probability of assigning identical numbers to two different trees will be very negligible. (For assurance, this can be done with more pairs of \(M, P\) to further reduce the chance of collision).

The implementation of the algorithm we described is as follows:

const int maxn = 1e5 + 10, P = 101, M = 1e9 + 9;

vector <int> v[maxn];

int calc(int u, int par = -1){
    vector<int> vec;
    for(int y : v[u]){
            if(y != par)
            vec.push_back(calc(y, u));
    }
    sort(vec.begin(), vec.end());
    int H = 0;
    for(int x : vec)
          H = (1ll * H * P + x) % M;
    H = (1 + H) % M;
    return H;
}

int main(){
    int n;
    cin >> n;
    for(int i = 0; i < n - 1; i++){
        int a, b;
        cin >> a >> b;
        v[a].push_back(b);
        v[b].push_back(a);
    }
    cout << calc(1) << "\n"; // hash of rooted tree from 1
    return 0;
}

Previous Next

This book is made by The Shaazzz contributors.

It is publicly available with cc-by-sa license.

Last updated at اکتبر 13, 2025.

Rendered by Sphinx (4.3.2) and Minoo theme (Beta 0.9.7).