# Hash Tables

• Intuitively, a map M supports the abstraction of using keys as indices such as M[k]
• A map with n keys that are known to be integers in a fixed range is just an array
• A hash function can map general keys (ie not integers) to corresponding indices in a table/array

## Hash Functions

A hash function maps keys of a given type to integers in a fixed interval .

• A very simple hash function is the mod function:

• Works for integer keys
• The integer is the hash value of the key
• The goal of a hash function is to store an entry at index

• Function usually has two components:

• Hash code
• keys -> integers
• Compression function
• integers -> integers in range
• Hash code applied first, then compression - Some example hash functions:

• Use the memory address of the object as it's hash code
• Integer cast

• Interpret the bits of the key as an integer
• Only suitable with 64 bits
• Component sum

• Partition they key into bitwise components of fixed length and sum the components
• Polynomial accumulation

• Partition the bits of the key into a sequence of components of fixed length , , ... ,
• Evaluate the polynomial for some fixed value
• Especially suitable for strings
• Polynomial can be evaluated in time as

Some example compression functions:

• Division
• The size is usually chosen to be a prime to increase performance
• and are nonnegative integers such that

## Collision Handling

Collisions occur when different keys hash to the same cell. There are several strategies for resolving collisions.

### Separate Chaining

With separate chaining, each cell in the map points to another map containing all the entries for that cell.

### Linear Probing

• The colliding item is placed in a different cell of the table
• Linear probing handles collisions by placing the colliding item at the next available table cell
• Each table cell inspected is referred to as a "probe"
• Colliding items can lump together, causing future collisions to cause a longer sequence of probes

Consider a hash table that uses linear probing.

• get(k)
• Start at cell
• Prove consecutive locations until either
• Key is found
• Empty cell is found
• All cells have been unsuccessfully probed
• To handle insertions and deletions, need to introduce a special marker object defunct which replaces deleted elements
• remove(k)
• Search for an entry with key k
• If an entry (k, v) is found, replace it with defunct and return v
• Else, return null

### Double Hashing

• Double hashing uses two hash functions h() and f()
• If cell h(k) already occupied, tries sequentially the cell for
• f(k) cannot return zero
• Table size must be a prime to allow probing of all cells
• Common choice of second hash func is where q is a prime
• if then we have linear probing

## Performance

• In the worst case, operations on hash tables take time when the table is full and all keys collide into a single cell
• The load factor affects the performance of a hash table
• = number of entries
• = number of cells
• When is large, collision is likely
• Assuming hash values are true random numbers, the "expected number" of probes for an insertion with open addressing is
• However, in practice, hashing is very fast and operations have performance, provided is not close to 1