Suffix tree is a compressed trie of all the suffixes of a given string. Suffix trees help in solving a lot of string related problems like pattern matching, finding distinct substrings in a given string, finding longest palindrome etc. In this tutorial following points will be covered:
- Compressed Trie
- Suffix Tree Construction (Brute Force)
- Brief description of Ukkonen's Algorithm
Before going to suffix tree, let's first try to understand what a compressed trie is. Consider the following set of strings:
A standard trie for the above set of strings will look like:
And a compressed trie for the given set of strings will look like:
As it might be clear from the images show above, in a compressed trie, edges that direct to a node having single child are combined together to form a single edge and their edge labels are concatenated.
So this means that each internal node in a compressed trie has atleast two children. Also it has at most N leaves, where N is the number of strings inserted in the compressed trie. Now both the facts:
Each internal node having atleast two children, and that there are leaves, implies that there are atmost nodes in the trie. So the space complexity of a compressed trie is O(N) as compared to the O(N2) of a normal trie.
So that is one reason why to use compressed tries over normal tries.
Before going to construction of suffix trees, there is one more thing that should be understood, Implicit Suffix Tree. In Implicit suffix trees, there are atmost N leaves, while in normal one there should be exactly N leaves. The reason for atmost N leaves is one suffix being prefix of another suffix.
Following example will make it clear. Consider the string “banana”
Implicit Suffix Tree for the above string is shown in image below:
To avoid getting an Implicit Suffix Tree we append a special character that is not equal to any other character of the string. Suppose we append $ to the given string then, so the new string is “bababa$”. Now its suffix tree will be
now let's go to the construction of the suffix trees.
Suffix tree as mentioned previously is a compressed trie of all the suffixes of a given string, so the brute force approach will be to consider all the suffixes of the given string as separate strings and insert them in the trie one by one. But time complexity of the brute force approach is O(N2), and that is of no use for large values of N.