| Basic data for all hash-based classes.
The classes in fastutil are built around open-addressing hashing
implemented via double hashing. Following Knuth's suggestions in the third volume of The Art of Computer
Programming, we use for the table size a prime p such that
p-2 is also prime. In this way hashing is implemented with modulo p,
and secondary hashing with modulo p-2.
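
The choice of table size and the two hash functions can be illustrated with a minimal sketch. The class and method names below (`TwinPrimeSizing`, `nextTableSize`, `primaryHash`, `secondaryHash`) are hypothetical and are not part of fastutil, which may well select sizes differently (for example, from a precomputed list); masking off the sign bit is likewise an assumption made here to keep the remainders non-negative.

```java
final class TwinPrimeSizing {
    /** Trial-division primality test; adequate for a small illustration. */
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    /** Smallest p >= min such that both p and p - 2 are prime. */
    static int nextTableSize(int min) {
        int p = Math.max(min, 5); // 5 is the smallest prime p with p - 2 prime
        while (!(isPrime(p) && isPrime(p - 2))) p++;
        return p;
    }

    /** Primary hash: a table index in [0, p). */
    static int primaryHash(int hashCode, int p) {
        return (hashCode & 0x7FFFFFFF) % p;
    }

    /** Secondary hash: a value in [0, p - 2); the probe step is this value plus one. */
    static int secondaryHash(int hashCode, int p) {
        return (hashCode & 0x7FFFFFFF) % (p - 2);
    }
}
```

For instance, asking for a size of at least 10 yields 13 (since 11 is also prime), and a key whose hash code is 20 then has primary hash 7 and secondary hash 9.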
Entries in a table can be in three states:
Hash.FREE,
Hash.OCCUPIED or
Hash.REMOVED.
The naive handling of removed entries requires that, when searching for a free entry, you treat them as if they were occupied. However,
fastutil implements two useful optimizations, based on the following invariant:
Let i0, i1, …, ip-1 be
the permutation of the table indices induced by the key k, that is, i0 is the hash
of k and each following index is obtained from the previous one by adding (modulo p) the secondary hash plus one.
If there is a
Hash.OCCUPIED entry with key k, its index in the sequence above comes before
the indices of any
Hash.REMOVED entries with key k.
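
To make the probe sequence concrete, here is a small sketch that lists the indices visited for a given hash code. The class `ProbeSequence` and its method are hypothetical names chosen for this illustration, and the sign-bit mask is an assumption. Because the step lies between 1 and p-2 and p is prime, the step is coprime with p, so the p indices produced are all distinct.

```java
final class ProbeSequence {
    /** Returns the p table indices visited for the given hash code, in probe order. */
    static int[] probeSequence(int hashCode, int p) {
        int h = hashCode & 0x7FFFFFFF;   // assumption: clear the sign bit
        int index = h % p;               // primary hash
        int step = h % (p - 2) + 1;      // secondary hash plus one, in [1, p - 2]
        int[] sequence = new int[p];
        for (int j = 0; j < p; j++) {
            sequence[j] = index;
            index = (index + step) % p;
        }
        return sequence;
    }
}
```

With p = 13 and hash code 20, the sequence starts at 7 and advances by 10 modulo 13, giving 7, 4, 1, 11, 8, 5, 2, 12, 9, 6, 3, 0, 10.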
When we search for the key k we scan the entries in the
sequence i0, i1, …,
ip-1 and stop when k is found,
when the sequence is exhausted, or when we find a
Hash.FREE entry. Note
that the correctness of this procedure is not completely trivial. Indeed,
when we stop at a
Hash.REMOVED entry with key k we must rely
on the invariant to be sure that no
Hash.OCCUPIED entry with the same
key can appear later. If we frequently insert and remove the same entries,
this optimization can be very effective (note, however, that when using
objects as keys or values, deleted entries are set to a special fixed value to
optimize garbage collection).
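
The search procedure can be sketched as follows for primitive int keys. This is an illustration under stated assumptions, not fastutil's actual code: the class `ProbeSearch`, the parallel `key` and `state` arrays, and the local byte constants standing in for Hash.FREE, Hash.OCCUPIED and Hash.REMOVED are all hypothetical. The sketch also assumes that a removed entry keeps its old key, which is what allows the early stop.

```java
final class ProbeSearch {
    // Local stand-ins for Hash.FREE, Hash.OCCUPIED and Hash.REMOVED.
    static final byte FREE = 0, OCCUPIED = 1, REMOVED = 2;

    /** Returns the index of the OCCUPIED entry holding k, or -1 if k is absent. */
    static int find(int[] key, byte[] state, int k) {
        int p = key.length;              // assumed prime, with p - 2 prime
        int h = k & 0x7FFFFFFF;
        int i = h % p;
        int step = h % (p - 2) + 1;
        for (int probes = 0; probes < p; probes++) {
            if (state[i] == FREE) return -1;          // k was never placed further on
            if (key[i] == k) {
                // OCCUPIED: found. REMOVED with key k: by the invariant,
                // no OCCUPIED entry with key k can appear later, so stop.
                return state[i] == OCCUPIED ? i : -1;
            }
            i = (i + step) % p;
        }
        return -1;                                    // sequence exhausted
    }
}
```

The early return on a removed entry carrying the key k is the first optimization: without the invariant, the scan would have to continue until a free entry or the end of the sequence.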
Moreover, during the probe we keep the index of the first
Hash.REMOVED entry we meet.
If we actually have to insert a new element, we use that
entry if we can, so that we do not pollute another
Hash.FREE entry. Since this position comes
a fortiori before any
Hash.REMOVED entries with the same key, we are also keeping the invariant true.
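
Insertion with reuse of the first removed entry might look like the sketch below. As before, `ProbeInsert`, the parallel arrays, and the byte constants are hypothetical stand-ins rather than fastutil's implementation, and the sketch does no rehashing: it simply throws if no free or removed entry is available.

```java
final class ProbeInsert {
    // Local stand-ins for Hash.FREE, Hash.OCCUPIED and Hash.REMOVED.
    static final byte FREE = 0, OCCUPIED = 1, REMOVED = 2;

    /** Adds k to the table; returns false if k is already present. */
    static boolean add(int[] key, byte[] state, int k) {
        int p = key.length;              // assumed prime, with p - 2 prime
        int h = k & 0x7FFFFFFF;
        int i = h % p;
        int step = h % (p - 2) + 1;
        int firstRemoved = -1;           // first REMOVED entry met during the probe
        for (int probes = 0; probes < p && state[i] != FREE; probes++) {
            if (state[i] == OCCUPIED && key[i] == k) return false;
            if (state[i] == REMOVED) {
                if (firstRemoved == -1) firstRemoved = i;
                if (key[i] == k) break;  // invariant: no OCCUPIED k further on
            }
            i = (i + step) % p;
        }
        if (firstRemoved == -1 && state[i] != FREE) {
            throw new IllegalStateException("table full; a real table would rehash");
        }
        // Prefer the first removed entry so that no additional FREE entry is consumed.
        int slot = firstRemoved != -1 ? firstRemoved : i;
        key[slot] = k;
        state[slot] = OCCUPIED;
        return true;
    }
}
```

Because the chosen slot is the earliest removed entry on the probe path of k, the new occupied entry precedes every removed entry that still carries k, which is exactly what the invariant requires.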
|