10- Hash Tables¶
Overview¶
Hash tables provide a mechanism for efficient data retrieval by mapping keys to array indices using hash functions. They allow near-constant-time lookup, insertion, and deletion, at the cost of potential collisions.
Key Concepts¶
Goal: Retrieve data quickly using a key.
Idea: Use a mathematical function to map a key → index.
Tradeoff: Fast average performance but possible collisions.
Storage and Search with Keys¶
Arrays give constant-time lookup if index is known.
Searching without an index requires linear or binary search.
Keys are used to identify data records (e.g., coffee name or barcode).
Numeric keys can directly index arrays; non-numeric keys require conversion.
Idealized Indexing¶
One array bin per possible key → perfect retrieval.
Impractical due to massive memory requirements (e.g., credit card numbers).
Leads to idea of hashing: compressing large key spaces into small, fixed-size tables.
Hash Tables¶
Use a hash function f(k) to map a key k into a limited range [0, b−1].
Example: Division Method
f(k) = k % b
Enables direct mapping of keys to bins.
Problem: Multiple keys can map to the same bin → collision.
Collisions¶
Occur when two keys map to the same bin.
Inevitable when key space > number of bins.
Must be handled through collision resolution strategies: - Chaining - Linear Probing
Chaining¶
Each bin stores a linked list of key-value pairs.
HashTable { Integer: size Array of ListNodes: bins } ListNode { Type: key Type: value ListNode: next }
Lookup, insert, and delete traverse the linked list within a bin.
Keeps bin lists small with good hash distribution.
Advantages: - Simple, handles unlimited collisions. - Easy insertion/deletion.
Disadvantages: - Uses extra memory (linked list overhead). - Poor hash functions lead to long chains.
Linear Probing¶
Uses adjacent bins to resolve collisions.
If a bin is full, check the next bin until an empty one is found.
HashTable { Integer: size Integer: num_keys Array of HashTableEntry: bins } HashTableEntry { Type: key Type: value }
Search, insert, and delete probe sequentially from the hash value.
Advantages: - No linked lists; better cache performance. - Compact memory use.
Disadvantages: - Clustering as table fills. - Deletion is complex.
Hash Functions¶
A good hash function should: - Be deterministic: same key → same bin. - Produce values within a defined range [0, b−1]. - Distribute keys uniformly to minimize collisions.
Common Methods: - Division method: f(k) = k % b - Horner’s method (for strings):
StringHash(String: key, Integer: size): total = 0 FOR EACH character in key: total = CONST * total + CharacterToNumber(character) return total % size
Handling Non-Numeric Keys¶
Convert strings to numbers (e.g., ASCII values).
Horner’s method accounts for character order.
Prime multipliers help spread values evenly.
Example Use Case¶
Used in search algorithms (e.g., BFS, DFS) to track visited nodes.
Insert each visited node’s key into a hash table.
Before adding a new node, check if it exists in the hash table.
Why It Matters¶
Hash tables balance speed and memory.
Offer O(1) average-time lookup and insertion.
Form the foundation for data structures like Python dictionaries and sets.
Used widely in: - Caches - Load balancers - Database indexing - Symbol tables in compilers
Key Takeaways¶
Hash tables combine arrays + mathematical mapping.
Collisions are managed via chaining or probing.
Choice of hash function critically impacts performance.
Provide an elegant solution for fast, direct retrieval of data.