10- Hash Tables

Overview

Hash tables provide a mechanism for efficient data retrieval by mapping keys to array indices using hash functions. They allow near-constant-time lookup, insertion, and deletion, at the cost of potential collisions.

Key Concepts

  • Goal: Retrieve data quickly using a key.

  • Idea: Use a mathematical function to map a key → index.

  • Tradeoff: Fast average performance but possible collisions.

Storage and Search with Keys

  • Arrays give constant-time lookup if index is known.

  • Searching without an index requires linear or binary search.

  • Keys are used to identify data records (e.g., coffee name or barcode).

  • Numeric keys can directly index arrays; non-numeric keys require conversion.

Idealized Indexing

  • One array bin per possible key → perfect retrieval.

  • Impractical due to massive memory requirements (e.g., credit card numbers).

  • Leads to idea of hashing: compressing large key spaces into small, fixed-size tables.

Hash Tables

  • Use a hash function f(k) to map a key k into a limited range [0, b−1].

  • Example: Division Method

    f(k) = k % b
    
  • Enables direct mapping of keys to bins.

  • Problem: Multiple keys can map to the same bin → collision.

Collisions

  • Occur when two keys map to the same bin.

  • Inevitable when key space > number of bins.

  • Must be handled through collision resolution strategies: - Chaining - Linear Probing

Chaining

  • Each bin stores a linked list of key-value pairs.

    HashTable {
        Integer: size
        Array of ListNodes: bins
    }
    ListNode {
        Type: key
        Type: value
        ListNode: next
    }
    
  • Lookup, insert, and delete traverse the linked list within a bin.

  • Keeps bin lists small with good hash distribution.

Advantages: - Simple, handles unlimited collisions. - Easy insertion/deletion.

Disadvantages: - Uses extra memory (linked list overhead). - Poor hash functions lead to long chains.

Linear Probing

  • Uses adjacent bins to resolve collisions.

  • If a bin is full, check the next bin until an empty one is found.

    HashTable {
        Integer: size
        Integer: num_keys
        Array of HashTableEntry: bins
    }
    HashTableEntry {
        Type: key
        Type: value
    }
    
  • Search, insert, and delete probe sequentially from the hash value.

Advantages: - No linked lists; better cache performance. - Compact memory use.

Disadvantages: - Clustering as table fills. - Deletion is complex.

Hash Functions

A good hash function should: - Be deterministic: same key → same bin. - Produce values within a defined range [0, b−1]. - Distribute keys uniformly to minimize collisions.

Common Methods: - Division method: f(k) = k % b - Horner’s method (for strings):

StringHash(String: key, Integer: size):
    total = 0
    FOR EACH character in key:
        total = CONST * total + CharacterToNumber(character)
    return total % size

Handling Non-Numeric Keys

  • Convert strings to numbers (e.g., ASCII values).

  • Horner’s method accounts for character order.

  • Prime multipliers help spread values evenly.

Example Use Case

  • Used in search algorithms (e.g., BFS, DFS) to track visited nodes.

  • Insert each visited node’s key into a hash table.

  • Before adding a new node, check if it exists in the hash table.

Why It Matters

  • Hash tables balance speed and memory.

  • Offer O(1) average-time lookup and insertion.

  • Form the foundation for data structures like Python dictionaries and sets.

  • Used widely in: - Caches - Load balancers - Database indexing - Symbol tables in compilers

Key Takeaways

  • Hash tables combine arrays + mathematical mapping.

  • Collisions are managed via chaining or probing.

  • Choice of hash function critically impacts performance.

  • Provide an elegant solution for fast, direct retrieval of data.