The implementation of a map as a binary search tree should be fairly obvious: simply store the value and key in the node, and sort on the key:
    struct TreeMapNode {
        KeyType key;
        ValueType value;
        TreeMapNode *left, *right;
    };
However, other implementations are possible...
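Before turning to those alternatives, here is a minimal sketch of lookup and insertion on the tree-based map above. The concrete choices for KeyType and ValueType and the helper names tree_find and tree_insert are assumptions made for illustration; the notes do not specify them. Balancing and node cleanup are left out.

```cpp
#include <string>

// Assumed concrete types; the notes leave KeyType and ValueType open.
using KeyType = int;
using ValueType = std::string;

struct TreeMapNode {
    KeyType key;
    ValueType value;
    TreeMapNode *left, *right;
};

// Returns a pointer to the value stored under key, or nullptr if absent.
// (Hypothetical helper name.)
ValueType* tree_find(TreeMapNode* root, const KeyType& key) {
    while (root != nullptr) {
        if (key < root->key)      root = root->left;
        else if (root->key < key) root = root->right;
        else                      return &root->value;
    }
    return nullptr;
}

// Inserts (key, value); overwrites the value if key is already present.
// (Hypothetical helper name.)
void tree_insert(TreeMapNode*& root, const KeyType& key, const ValueType& value) {
    if (root == nullptr) {
        root = new TreeMapNode{key, value, nullptr, nullptr};
    } else if (key < root->key) {
        tree_insert(root->left, key, value);
    } else if (root->key < key) {
        tree_insert(root->right, key, value);
    } else {
        root->value = value;
    }
}
```

Only operator< on the key is used, so any key type with an ordering would work in this sketch.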
We were able to insert, remove, and find elements with our various list data structures. Any container that provides the above operations can serve as a map. What is the disadvantage? (What is the cost of the find() operation?)
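As a concrete reference point for these questions, here is a minimal sketch of a map kept in an unsorted linked list of key/value pairs. The type choices and the names ListMap, list_find, list_insert, and list_remove are assumptions for illustration only.

```cpp
#include <list>
#include <string>
#include <utility>

// Assumed concrete types for illustration.
using KeyType = int;
using ValueType = std::string;
using ListMap = std::list<std::pair<KeyType, ValueType>>;

// Walks the list entry by entry until the key is found.
ValueType* list_find(ListMap& m, const KeyType& key) {
    for (auto& entry : m) {
        if (entry.first == key) return &entry.second;
    }
    return nullptr;
}

// Overwrites an existing entry, otherwise appends a new one.
void list_insert(ListMap& m, const KeyType& key, const ValueType& value) {
    if (ValueType* existing = list_find(m, key)) {
        *existing = value;
    } else {
        m.emplace_back(key, value);
    }
}

// Returns true if an entry with this key was found and erased.
bool list_remove(ListMap& m, const KeyType& key) {
    for (auto it = m.begin(); it != m.end(); ++it) {
        if (it->first == key) {
            m.erase(it);
            return true;
        }
    }
    return false;
}
```

Note how every operation begins with the same scan through the entries; that scan is what the question about find() is pointing at.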
On the other hand, if our key type is int, we have a very natural implementation of a map as a vector... what is it? What is the space overhead for this scheme?
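One way to picture such a scheme is the sketch below, in which the key itself is used as the vector index. The class name DirectMap, the choice of ValueType, and the assumption that all keys lie in the range [0, capacity) are illustrative only and not part of the notes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using ValueType = std::string;  // assumed for illustration

// Direct-address map: the int key is used directly as the index.
// Assumes every key satisfies 0 <= key < capacity; no bounds checks.
class DirectMap {
public:
    explicit DirectMap(std::size_t capacity)
        : values_(capacity), present_(capacity, false) {}

    void insert(int key, const ValueType& value) {
        values_[key] = value;
        present_[key] = true;
    }

    void remove(int key) { present_[key] = false; }

    // Returns nullptr when no value is stored under key.
    ValueType* find(int key) {
        return present_[key] ? &values_[key] : nullptr;
    }

private:
    std::vector<ValueType> values_;  // one slot per possible key
    std::vector<bool> present_;      // marks which slots are in use
};
```

The constructor allocates a slot for every possible key up front, whether or not that key is ever inserted, which is worth keeping in mind when answering the space question.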
Hashing
We cannot use arrays in general to implement a map for two reasons. First, most of the time the key domain will not consist of integers only. Second, even if the key domain were the integers, we do not want to spend the memory necessary to handle all possible keys directly.
Hash tables can be viewed as a way of overcoming these limitations. The idea:
- Use the array locations as buckets, each holding a list of (pointers to) the key/value entries that fall into it.
- Define a "hash function" that maps each element of the key domain to an index into the array.
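The two ideas above can be sketched as a small hash table with list buckets. This is an illustration under stated assumptions, not a production design: the class name HashMap, the string keys, the use of std::hash as the hash function, and the fixed bucket count are all choices made here for the example. Entries are stored directly in the bucket lists rather than through pointers, and the table never grows.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Assumed concrete types for illustration.
using KeyType = std::string;
using ValueType = int;

class HashMap {
private:
    using Bucket = std::list<std::pair<KeyType, ValueType>>;

    // The hash function maps a key to an index into the bucket array.
    Bucket& bucket_for(const KeyType& key) {
        return buckets_[std::hash<KeyType>{}(key) % buckets_.size()];
    }

    std::vector<Bucket> buckets_;

public:
    // bucket_count is assumed to be greater than zero.
    explicit HashMap(std::size_t bucket_count = 16) : buckets_(bucket_count) {}

    // Overwrites the value if key is already present in its bucket.
    void insert(const KeyType& key, const ValueType& value) {
        Bucket& b = bucket_for(key);
        for (auto& entry : b) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        b.emplace_back(key, value);
    }

    // Searches only the one bucket that key hashes to.
    ValueType* find(const KeyType& key) {
        for (auto& entry : bucket_for(key)) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // Returns true if an entry with this key was found and erased.
    bool remove(const KeyType& key) {
        Bucket& b = bucket_for(key);
        for (auto it = b.begin(); it != b.end(); ++it) {
            if (it->first == key) {
                b.erase(it);
                return true;
            }
        }
        return false;
    }
};
```

Each operation hashes the key once and then scans a single bucket, so its cost depends on how many entries share that bucket rather than on the total number of entries.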
Why is this good?
Given a good hash function, each bucket will contain fewer than some constant number of elements, giving O(1) (that is, constant-time) lookup for a key.
The size of the hash table will be only some constant factor larger than the number of items that we expect to store in the table.
With a good hash function, in fact, even for moderately sized data sets (say, a few thousand elements), a hash table can provide a much faster implementation of a map than a tree can.
Most modern languages provide some form of hash table in the standard library. C++ has done so since C++11, which added std::unordered_map; before that, the language defined no standard hash_map, although most vendors provided one as an extension.
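In C++11 and later, the standard library's std::unordered_map plays this role. The following is a minimal usage sketch; the function name count_words and the word-counting task are made up for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Counts how often each word occurs in the input.
// (Hypothetical example function.)
std::unordered_map<std::string, int> count_words(
        const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const auto& w : words) {
        ++counts[w];  // operator[] inserts a zero count for a new key
    }
    return counts;
}
```

Lookups go through find(), which returns end() for a missing key, or through at(), which throws std::out_of_range instead.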