Technical Articles

Hashing Methods

Perfect Hashing:
A hash function which maps each different key to a distinct integer. Usually all possible keys must be known beforehand. A hash table which uses a perfect hash has no collisions. Perfect hashing functions can be found only under certain conditons. One application of the perfect hash function is static dictionary. Used in compilers.

Coalesced Hashing:
An array which is indexed by a hash function. An item whose hash value is occupied (a collision) is put in the next empty place in the array, and added to the end of a list embedded in the array items.

Open Addressing:
A general collision resolution scheme for a hash table in which all items are stored within the table. In case of collision, other positions are computed and checked (using a probe sequence) until an empty position is found. Some ways of computing possible new positions may lead to less efficient operation because of clustering.

    Linear Probing:
    A hash table in which a collision is resolved by putting the item in the next empty place in the array following the occupied place. With an even moderate load factor, primary clustering tends to slow retrieval.

    Double Hashing:
    A method of open addressing for a hash table in which a collision is resolved by searching the table for an empty place at intervals given by a different hash function, thus minimizing clustering. Since a different hashing function is used to find a location in case of collision, colliding values should be spread out. In linear probing, primary clustering occurs when collisions fill up every space for long stretches. Even in quadratic probing, secondary clustering may develop since colliding values follow the same probe sequence.

    Quadratic Probing:
    A method of open addressing for a hash table in which a collision is resolved by putting the item in the next empty place given by a probe sequence. The space between places in the sequence increases quadratically. Deletion may be hard because finding collisions again relies on not creating empty spots. One solution is to mark an entry as deleted so it can be reused for insertion, but the search list is intact.

    Uniform Hashing:
    A conceptual method of open addressing for a hash table. A collision is resolved by putting the item in the next empty place given by a probe sequence which is independent of sequences for all other key. Since the probe sequences are independent, this is clustering free.

Hashing With Chaining:
Hashing with chaining is an application of linked lists and gives an approach to collision resolution. In hashing with chaining, the hash table contains linked lists of elements or pointers to elements. The lists are referred to as chains, and the technique is called chaining. This is a common technique where a straightforward implementation is desired and maximum efficiency isn't required. Each linked list contains all the elements whose keys hash to the same index. Using chains minimizes search by dividing the set to be searched into lots of smaller pieces. There's nothing inherently wrong with linear search with small sequences; it's just that it gets slower as the sequences to be searched get longer. In this approach to resolving collisions, each sequence of elements whose keys hash to the same value will stay relatively short, so linear search is adequate.

    Separate Chaining Hashing:
    An array which is indexed by a hash function. All subsequent items with the same hash value as an existing one (a collision) are put in a single external list. The items in the list may be searched and maintained with any of the list search algorithms.

    Direct Chaining Hashing:
    An array of lists which is indexed by a hash function. Each list hold all items with the same hash value (collisions). The items in the list may be searched and maintained with any of the list search algorithms.