So, I recently met with the dev team over at http://www.lot18.com/ and they’ve inspired me to maintain this blog a great deal more. It’s been a while since I was challenged in such a way that inspires me to push forward. As a web developer, including the lot18 developers, often we do not get to utilize that fancy CS degree as our day to day really isn’t enthralled in algorithms and such.
They posed two questions at different points that can be answered with one response.
Question 1: How do you remove duplicate strings from an array of N length?
Question 2: How do you think the classic analogue T9 auto complete might have worked?
My answer?
- A tree map that utilizes a hash lookup. Theoretically O(log n )
I don’t believe they were too happy with my 30 second off the cuff answer, but it wouldn’t exactly be wrong. I had to explain what I meant and they gave me a puzzled answer. They believed that the hash generation would be too expensive.
My new answer?
- http://en.wikipedia.org/wiki/Trie A trie data structure.
A trie data structure is similar to my concept but offers a few additional benefits. When you construct the trie you automatically ensure that duplicates are removed. If you value order, while doing your insertion you can mark which index ( or just remove it ) from your array. The benefit here is the data structure itself. Once constructed, you now have a fully traversable tree in N time with a low multiplier.
This means that the tree can be constructed around your numeric pad 1 – 0 # *. Each button representing a node that can be searched at most 4 times ( 3 numbers per key, some 4 ). Each node can be assigned a weight for how many times you have favored a certain path.
giving it further thought, this was a rather brilliant way to make the most out of those old phone processors!
I’ll be writing a Trie data structure in PHP shortly and sharing it. Although it would be far from the most optimized solution ( C lib would be best ) … it would provide a nice test for creating that library later on!
( if your goal is to only remove duplicates, a sort and forward comparison works a lot better. If your goal is to remove duplicates and store strings for manipulation, trie is the best )