Friday, June 17, 2011

What is a Trie? How to implement Tries in C++?

This article explains the Trie data structure and provides a sample implementation in C++.
  • Tries are used to index and search strings in a text.
  • Some examples of tries usage include, finding the longest prefix match in a routing table, auto complete of web addresses in browser etc.
  • Trie is a tree where each vertex represents a word or prefix.
  • The root represents an empty string.
  • Markers are used to indicate end of words.
  • A typical node in a trie includes the content (a char), marker for end of word and a collection of children.
  • An example of trie. (Using words Hello, Balloon, Ball)

Implementation for a Trie in C++

#include <iostream>
#include <vector>
using namespace std;

class Node {
public:
    Node() { mContent = ' '; mMarker = false; }
    ~Node() {}
    char content() { return mContent; }
    void setContent(char c) { mContent = c; }
    bool wordMarker() { return mMarker; }
    void setWordMarker() { mMarker = true; }
    Node* findChild(char c);
    void appendChild(Node* child) { mChildren.push_back(child); }
    vector<Node*> children() { return mChildren; }

private:
    char mContent;
    bool mMarker;
    vector<Node*> mChildren;
};

class Trie {
public:
    Trie();
    ~Trie();
    void addWord(string s);
    bool searchWord(string s);
    void deleteWord(string s);
private:
    Node* root;
};

Node* Node::findChild(char c)
{
    for ( int i = 0; i < mChildren.size(); i++ )
    {
        Node* tmp = mChildren.at(i);
        if ( tmp->content() == c )
        {
            return tmp;
        }
    }

    return NULL;
}

Trie::Trie()
{
    root = new Node();
}

Trie::~Trie()
{
    // Free memory
}

void Trie::addWord(string s)
{
    Node* current = root;

    if ( s.length() == 0 )
    {
        current->setWordMarker(); // an empty word
        return;
    }

    for ( int i = 0; i < s.length(); i++ )
    {        
        Node* child = current->findChild(s[i]);
        if ( child != NULL )
        {
            current = child;
        }
        else
        {
            Node* tmp = new Node();
            tmp->setContent(s[i]);
            current->appendChild(tmp);
            current = tmp;
        }
        if ( i == s.length() - 1 )
            current->setWordMarker();
    }
}


bool Trie::searchWord(string s)
{
    Node* current = root;

    while ( current != NULL )
    {
        for ( int i = 0; i < s.length(); i++ )
        {
            Node* tmp = current->findChild(s[i]);
            if ( tmp == NULL )
                return false;
            current = tmp;
        }

        if ( current->wordMarker() )
            return true;
        else
            return false;
    }

    return false;
}


// Test program
int main()
{
    Trie* trie = new Trie();
    trie->addWord("Hello");
    trie->addWord("Balloon");
    trie->addWord("Ball");

    if ( trie->searchWord("Hell") )
        cout << "Found Hell" << endl;

    if ( trie->searchWord("Hello") )
        cout << "Found Hello" << endl;

    if ( trie->searchWord("Helloo") )
        cout << "Found Helloo" << endl;

    if ( trie->searchWord("Ball") )
        cout << "Found Ball" << endl;

    if ( trie->searchWord("Balloon") )
        cout << "Found Balloon" << endl;

    delete trie;
}

OUTPUT:-
Found Hello
Found Ball
Found Balloon

19 comments :

  1. by far the clearest implementation of trie I saw online. helped so much for my class as my text book's implementation was very confusing

    ReplyDelete
  2. Hi. Very nice implementation. I need a heavily modified version of Trie for Aho-Corasick, and then also modify Aho-Corasick but this will help me a lot since I've just began learning C++ last month at school.
    I suppose that the "delete trie;" line here does not really work since the destructor of Trie is not implemented there. I think this only deletes the root of the Trie and leaves all the rest of the structure in the memory, and in the destructor it would be necessary to use DFS to clear the memory properly. Am I right?

    ReplyDelete
  3. superb implementation, clean code with nice implementation

    ReplyDelete
  4. Very greatest also !

    ReplyDelete
  5. thanks man, clean code, very readable :) and very useful ^_^

    ReplyDelete
  6. Do we need the while in searchWord()? I think just a if ( current != NULL ) is sufficient.

    ReplyDelete
  7. Good work. It helped me.

    ReplyDelete
  8. How could you implement a shortest path from one word to another?

    ReplyDelete
  9. Nice implementation. I would like to suggest an additional function which I needed and I share here since it may be useful.

    It is a function to search the longest word contained in the trie, if any. Very simple indeed, but I am sharing anyway as a thanksgiving for your post.

    Node *longestSubmatch(const string &s) const {
    Node *current = _root;
    Node *last = 0;
    while (current != 0) {
    for (int i = 0; i < s.length(); ++i) {
    Node *tmp = current->findChild(s[i]);
    if (!tmp) return last;
    current = tmp;
    if (current->wordMarker())
    last = current;
    }
    }
    return last;
    }

    ReplyDelete
  10. Nice essay!
    But I find something that may be treated as bug.
    Since `addChild()` don't check NULL nodes, there exists the following situation:
    first, add some NULL nodes into `children`
    second, add a not NULL node with a character, say 'b' into `children`
    third, use `findChild()` to find the node that contains 'b', then comes the exception, since the loop inside the `findChild()` may use NULL nodes and call NULL nodes' `getChar()` method.

    ReplyDelete
  11. Hi,
    Loved the clarity in the program.
    But it suffers in efficiency due to findChild in highly populated tries.
    In highly populated tries, findChild has to loop 26 times (considering only lower case letters, and worst case scenario)
    The running time can go up exponentially .

    I modified the program a lil bit to increase the time efficiency at the expense of space efficiency, in particular the findChild function. Have a look ( border case, NULL case checking not done)

    #include
    #include
    #include
    #include
    using namespace std;

    class Node{
    public :
    Node();
    ~Node() {}
    char content() {return mContent;}
    void setContent(char c) { mContent = c ; }
    bool wordMarker() {return mMarker;}
    void setWordMarker() {mMarker = true;}
    Node* findChild(char c);
    void setChild(Node* avi, char c);

    private :
    char mContent;
    bool mMarker;
    vector mChildren;

    };

    Node* arr[26] = {NULL};

    Node::Node()
    {
    mContent = ' ';
    mMarker = false;
    mChildren.assign(arr, arr+26);
    }

    class Trie{
    public :
    Trie();
    ~Trie();
    void AddWord(string s);
    bool SearchWord(string s);
    void deleteWord(string s);
    private :
    Node* root;
    };

    Node* Node::findChild(char c)
    {
    // if (mChildren[c - 97] != NULL)
    return mChildren[c-97];
    // else
    // return NULL;
    }

    void Node::setChild(Node* avi,char c)
    {
    mChildren[c-97] = avi;
    }

    Trie::Trie()
    {
    root = new Node();
    }


    Trie::~Trie()
    {
    delete root;
    }


    void Trie::AddWord(string s)
    {
    Node* current = root;
    for(int i = 0; i < s.length(); i++)
    {
    Node* child = current->findChild(s[i]);

    if(child != NULL)
    current = child;
    else
    {
    Node* newNode = new Node();
    newNode->setContent(s[i]);
    current->setChild(newNode, s[i]);
    current = newNode;
    }
    }
    current->setWordMarker();
    }


    bool Trie::SearchWord(string s)
    {
    Node* current = root;
    for(int i = 0 ; i < s.length(); i++)
    {
    if(current->findChild(s[i]) != NULL)
    current = current->findChild(s[i]);
    else
    return false;
    }
    if(current->wordMarker())
    return true;
    else
    return false;

    }
    int main() {
    Trie *trie = new Trie();
    trie->AddWord("avinash");
    trie->AddWord("linux");


    if(trie->SearchWord("linux"))
    cout << "Word found" << endl;
    else
    cout << "Word not found" << endl;
    }



    ReplyDelete
  12. You could use maps/hash_map instead of arrays/vector. That will make the findChild faster without hurting memory :)

    ReplyDelete
  13. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. Please correct me if I'm wrong, but it looks to me that using linear search instead of direct access to find children of each node defeats the whole concept of a Trie.
      The (main) purpose of a Trie is, AFAIK, to access in O(m) worst case time a the value of a key, where m is the maximum length of a key (i.e. word).

      This requires moving from node i to node i+1 accessing always the "correct" one, where "correct" means the node having as character the (i+1)-th character of the key we're looking into the Trie.

      Again: please correct me if I'm wrong - I'm by no means an algorithm expert - but this implementation simply doesn't look correct.

      Delete
  14. I tend to agree with G. I thought the idea is to NOT have any content. The node position is the content. So maybe "toupper()" all characters and always have 26 child nodes. Then to find a child, well it is just mChildren[c] and the test is whether it exists or is NULL.

    But again, thanks to Mahesh. An excellent example for sure!

    ReplyDelete
  15. Hi,
    What is the code of Auto complete word function ?

    ReplyDelete
  16. This is a good explanation, but I don't find the deleteWord() function, that is actually what would really have interested me.

    Also, I find that the searchWord() has a problem. If your trie has the ward ball in it and you want to find "balloon", your algorithm probably returns true (I didn't actually try it) even if "balloon" is not in the trie. In my opinion, the check for the marker should be out of the cycle.

    ReplyDelete

Contact Form

Name

Email *

Message *

Back to Top