### What is a Trie? How to implement Tries in C++?

This article explains the Trie data structure and provides a sample implementation in C++.- Tries are used to index and search strings in a text.
- Some examples of tries usage include, finding the longest prefix match in a routing table, auto complete of web addresses in browser etc.
- Trie is a tree where each vertex represents a word or prefix.
- The root represents an empty string.
- Markers are used to indicate end of words.
- A typical node in a trie includes the content (a char), marker for end of word and a collection of children.
- An example of trie. (Using words Hello, Balloon, Ball)

### Implementation for a Trie in C++

```
#include <iostream>
#include <vector>
using namespace std;
class Node {
public:
Node() { mContent = ' '; mMarker = false; }
~Node() {}
char content() { return mContent; }
void setContent(char c) { mContent = c; }
bool wordMarker() { return mMarker; }
void setWordMarker() { mMarker = true; }
Node* findChild(char c);
void appendChild(Node* child) { mChildren.push_back(child); }
vector<Node*> children() { return mChildren; }
private:
char mContent;
bool mMarker;
vector<Node*> mChildren;
};
class Trie {
public:
Trie();
~Trie();
void addWord(string s);
bool searchWord(string s);
void deleteWord(string s);
private:
Node* root;
};
Node* Node::findChild(char c)
{
for ( int i = 0; i < mChildren.size(); i++ )
{
Node* tmp = mChildren.at(i);
if ( tmp->content() == c )
{
return tmp;
}
}
return NULL;
}
Trie::Trie()
{
root = new Node();
}
Trie::~Trie()
{
// Free memory
}
void Trie::addWord(string s)
{
Node* current = root;
if ( s.length() == 0 )
{
current->setWordMarker(); // an empty word
return;
}
for ( int i = 0; i < s.length(); i++ )
{
Node* child = current->findChild(s[i]);
if ( child != NULL )
{
current = child;
}
else
{
Node* tmp = new Node();
tmp->setContent(s[i]);
current->appendChild(tmp);
current = tmp;
}
if ( i == s.length() - 1 )
current->setWordMarker();
}
}
bool Trie::searchWord(string s)
{
Node* current = root;
while ( current != NULL )
{
for ( int i = 0; i < s.length(); i++ )
{
Node* tmp = current->findChild(s[i]);
if ( tmp == NULL )
return false;
current = tmp;
}
if ( current->wordMarker() )
return true;
else
return false;
}
return false;
}
// Test program
int main()
{
Trie* trie = new Trie();
trie->addWord("Hello");
trie->addWord("Balloon");
trie->addWord("Ball");
if ( trie->searchWord("Hell") )
cout << "Found Hell" << endl;
if ( trie->searchWord("Hello") )
cout << "Found Hello" << endl;
if ( trie->searchWord("Helloo") )
cout << "Found Helloo" << endl;
if ( trie->searchWord("Ball") )
cout << "Found Ball" << endl;
if ( trie->searchWord("Balloon") )
cout << "Found Balloon" << endl;
delete trie;
}
```

OUTPUT:- Found Hello Found Ball Found Balloon

by far the clearest implementation of trie I saw online. helped so much for my class as my text book's implementation was very confusing

ReplyDeleteHi. Very nice implementation. I need a heavily modified version of Trie for Aho-Corasick, and then also modify Aho-Corasick but this will help me a lot since I've just began learning C++ last month at school.

ReplyDeleteI suppose that the "delete trie;" line here does not really work since the destructor of Trie is not implemented there. I think this only deletes the root of the Trie and leaves all the rest of the structure in the memory, and in the destructor it would be necessary to use DFS to clear the memory properly. Am I right?

superb implementation, clean code with nice implementation

ReplyDeleteNice work.. thanks :)

ReplyDeleteVery greatest also !

ReplyDeletethanks man, clean code, very readable :) and very useful ^_^

ReplyDeleteNice code!

ReplyDeleteDo we need the while in searchWord()? I think just a if ( current != NULL ) is sufficient.

ReplyDeleteGood work. It helped me.

ReplyDeleteHow could you implement a shortest path from one word to another?

ReplyDeleteNice implementation. I would like to suggest an additional function which I needed and I share here since it may be useful.

ReplyDeleteIt is a function to search the longest word contained in the trie, if any. Very simple indeed, but I am sharing anyway as a thanksgiving for your post.

Node *longestSubmatch(const string &s) const {

Node *current = _root;

Node *last = 0;

while (current != 0) {

for (int i = 0; i < s.length(); ++i) {

Node *tmp = current->findChild(s[i]);

if (!tmp) return last;

current = tmp;

if (current->wordMarker())

last = current;

}

}

return last;

}

Nice essay!

ReplyDeleteBut I find something that may be treated as bug.

Since `addChild()` don't check NULL nodes, there exists the following situation:

first, add some NULL nodes into `children`

second, add a not NULL node with a character, say 'b' into `children`

third, use `findChild()` to find the node that contains 'b', then comes the exception, since the loop inside the `findChild()` may use NULL nodes and call NULL nodes' `getChar()` method.

Hi,

ReplyDeleteLoved the clarity in the program.

But it suffers in efficiency due to findChild in highly populated tries.

In highly populated tries, findChild has to loop 26 times (considering only lower case letters, and worst case scenario)

The running time can go up exponentially .

I modified the program a lil bit to increase the time efficiency at the expense of space efficiency, in particular the findChild function. Have a look ( border case, NULL case checking not done)

#include

#include

#include

#include

using namespace std;

class Node{

public :

Node();

~Node() {}

char content() {return mContent;}

void setContent(char c) { mContent = c ; }

bool wordMarker() {return mMarker;}

void setWordMarker() {mMarker = true;}

Node* findChild(char c);

void setChild(Node* avi, char c);

private :

char mContent;

bool mMarker;

vector mChildren;

};

Node* arr[26] = {NULL};

Node::Node()

{

mContent = ' ';

mMarker = false;

mChildren.assign(arr, arr+26);

}

class Trie{

public :

Trie();

~Trie();

void AddWord(string s);

bool SearchWord(string s);

void deleteWord(string s);

private :

Node* root;

};

Node* Node::findChild(char c)

{

// if (mChildren[c - 97] != NULL)

return mChildren[c-97];

// else

// return NULL;

}

void Node::setChild(Node* avi,char c)

{

mChildren[c-97] = avi;

}

Trie::Trie()

{

root = new Node();

}

Trie::~Trie()

{

delete root;

}

void Trie::AddWord(string s)

{

Node* current = root;

for(int i = 0; i < s.length(); i++)

{

Node* child = current->findChild(s[i]);

if(child != NULL)

current = child;

else

{

Node* newNode = new Node();

newNode->setContent(s[i]);

current->setChild(newNode, s[i]);

current = newNode;

}

}

current->setWordMarker();

}

bool Trie::SearchWord(string s)

{

Node* current = root;

for(int i = 0 ; i < s.length(); i++)

{

if(current->findChild(s[i]) != NULL)

current = current->findChild(s[i]);

else

return false;

}

if(current->wordMarker())

return true;

else

return false;

}

int main() {

Trie *trie = new Trie();

trie->AddWord("avinash");

trie->AddWord("linux");

if(trie->SearchWord("linux"))

cout << "Word found" << endl;

else

cout << "Word not found" << endl;

}

You could use maps/hash_map instead of arrays/vector. That will make the findChild faster without hurting memory :)

ReplyDeleteThis comment has been removed by the author.

ReplyDeletePlease correct me if I'm wrong, but it looks to me that using linear search instead of direct access to find children of each node defeats the whole concept of a Trie.

DeleteThe (main) purpose of a Trie is, AFAIK, to access in O(m) worst case time a the value of a key, where m is the maximum length of a key (i.e. word).

This requires moving from node i to node i+1 accessing always the "correct" one, where "correct" means the node having as character the (i+1)-th character of the key we're looking into the Trie.

Again: please correct me if I'm wrong - I'm by no means an algorithm expert - but this implementation simply doesn't look correct.

I tend to agree with G. I thought the idea is to NOT have any content. The node position is the content. So maybe "toupper()" all characters and always have 26 child nodes. Then to find a child, well it is just mChildren[c] and the test is whether it exists or is NULL.

ReplyDeleteBut again, thanks to Mahesh. An excellent example for sure!

Hi,

ReplyDeleteWhat is the code of Auto complete word function ?

This is a good explanation, but I don't find the deleteWord() function, that is actually what would really have interested me.

ReplyDeleteAlso, I find that the searchWord() has a problem. If your trie has the ward ball in it and you want to find "balloon", your algorithm probably returns true (I didn't actually try it) even if "balloon" is not in the trie. In my opinion, the check for the marker should be out of the cycle.

YOu are wrong btw .just tested it

Deletenice

ReplyDelete