Why do we need data structure and an Example (Word Dictionary)
Cheetah , Rose , Pineapple , Orange , Lily , Cat
Does words above make any sense ?
It probably does those are words. But it’s just too random. But what if I wrote the same words like this?
Animals: Cheetah, Cat
Flowers: Rose , Lily
Fruit : Pineapple, Orange
The same words makes more sense now. Also easier to read and look for. What I did there was nothing but structuring the data.
Well data structure is , a way to keep data in a structured way. That’s how I prefer to put it. Programming is basically a way to easily manipulate and represent DATA.
If we were to manipulate data and represent it we would like to give it some structure. Just a set of random data won’t make much sense. But a structured data is easy to understand . Easy to read . Easy to search. That’s why we need DATA STRUCTURE.
Now for the example I’m not going to show Linked list or Stack or Queue. Let’s make something that’s not available already.
Example
A, AN , AND, ANT
Suppose we want to store these 4 words somewhere, What would be the easiest way to put it? Yes a list of strings would do just fine. It would look something like this on the memory
Now here’s something Interesting you can notice
Repetition!
Well for a very small set of data it would hardly matter because 1 character = 1 byte. For this case scenario we wasted 5 bytes. Ha we have GB’s of space what’s a byte or two. I do agree. But what if I want to store English dictionary? Here’s a quick google search of word count of English language.
Lets do some quick math . Lets assume on average those words have 5 letters. Now multiply it. Okay now guess how many of those words has repeating characters like the one I showed above? We might potentially be wasting 1000000 bytes there Probably more. So if we are working with a big data set we would have to think about the memory we have.
So what we need to do is make sure one character is stored only once. How do I explain.Actually lets draw it,
We can get all 4 words from here. We don’t have any repetition here.
It removed those repetitions.And we can find our word very easily. We may understand this visual design, A Computer will not. We will have to come to a common ground where machine can understand what we wish to do.
Implementation
Now I’d try to write this down as generalized as possible without favoring a single programming language.But here’s a warning for newbies , I’m not going to explain my code syntax . My target audience isn’t beginners.
I’d like to call those circles as “NODE” and each circle has only one parent but it may have zero to many(In this case 26) children.
To represent each node in code we would write them down as CLASS. And each node has a few properties.
- Parent
- List of Children
- A Character
- If it is Word[Will explain later]
I think that’s all about the information we need to represent the node.
Now we need another Class to hold these nodes. The WordDictionary class that will be the container for these nodes. We will only see the container class to add words and it will find us words later. We want the data structure to organize itself. We would like the WordDictionary class to have some simple functions such as
+AddWordToDictionary(string word):void+IsValidWord(String word):boolean+SearchByPrefix(string prefix):List<string>
Whenever we add words using that function we would like the underlying data structure to organize itself like the tree we talked about. And when we try to check the IsValidWord(String word) it will give us a boolean depending on it’s findings.
This is my simple class for WordDictionary.
And here’s a snap for Node class.
Yep that’s all there is to it for this Data Structure.
Now the magic will happen in the Initialize of the node class. Once we implement it it will automatically line up the nodes necessary. So what does this function need to do? Lets go back to the graphical images . How did we draw nodes? We added a new node as a child when we needed it. Each node will hold only one character. So the function should take the very first character of the string and push the rest of the string down to new child nodes right ? If no child node exist we need to create it , Otherwise we can just ignore it. We do it until we have no characters left to add as a child. I’ve used recursion for this example because it made more sense to me to go in a recursive manner. So you might have to do a few simulations in your head or a sheet of paper to get it in your head.
In this case I’m assuming the data pushed in is valid so the whenever I insert the last data I’m marking the node as a Word. So If I ever traverse to that node the path will always mark ONE word. This is necessary to search valid words.
The search is rather simple because since it’s a sorted data structure any mistake in the input will return false.If any character doesn’t match the node in search. It’s not a valid word so it’s pointless to search any further.
The function will only return true if every node matches the characters.
Now that’s all for the Node class. And I marked it as Internal since I don’t want it to be accessible from outside of the namespace. Whoever want’s to use the WordDictionary he will only see 3 simple functions. Just like Stack you push and pop and forget the underlying code. The WordDictionary class will handle the space issue for you.
There’s much room for improvement in the data structure. That I’ll leave for another time. This data structure has one major drawback I’ll mention it here. Although this structure will save a lot of space and make searches easier. It can’t be used to find words of “N” length in an optimized way.Why? I’l leave the readers to think about it. If you think it’s possible to do otherwise please let me know in the comments.
In case anyone’s interested in testing out the code, the code is available here.