=========================== 10- Hash Tables =========================== Overview ======== Hash tables provide a mechanism for **efficient data retrieval** by mapping keys to array indices using **hash functions**. They allow near-constant-time lookup, insertion, and deletion, at the cost of potential **collisions**. Key Concepts ============ - **Goal:** Retrieve data quickly using a key. - **Idea:** Use a mathematical function to map a key → index. - **Tradeoff:** Fast average performance but possible collisions. Storage and Search with Keys ============================ - Arrays give constant-time lookup if index is known. - Searching without an index requires linear or binary search. - **Keys** are used to identify data records (e.g., coffee name or barcode). - Numeric keys can directly index arrays; non-numeric keys require conversion. Idealized Indexing ------------------ - One array bin per possible key → perfect retrieval. - Impractical due to **massive memory requirements** (e.g., credit card numbers). - Leads to idea of **hashing**: compressing large key spaces into small, fixed-size tables. Hash Tables =========== - Use a **hash function** `f(k)` to map a key `k` into a limited range `[0, b−1]`. - Example: **Division Method** :: f(k) = k % b - Enables direct mapping of keys to bins. - **Problem:** Multiple keys can map to the same bin → *collision*. Collisions =========== - Occur when two keys map to the same bin. - Inevitable when key space > number of bins. - Must be handled through **collision resolution** strategies: - **Chaining** - **Linear Probing** Chaining ========= - Each bin stores a **linked list** of key-value pairs. :: HashTable { Integer: size Array of ListNodes: bins } ListNode { Type: key Type: value ListNode: next } - Lookup, insert, and delete traverse the linked list within a bin. - Keeps bin lists small with good hash distribution. Advantages: - Simple, handles unlimited collisions. - Easy insertion/deletion. Disadvantages: - Uses extra memory (linked list overhead). - Poor hash functions lead to long chains. Linear Probing =============== - Uses **adjacent bins** to resolve collisions. - If a bin is full, check the next bin until an empty one is found. :: HashTable { Integer: size Integer: num_keys Array of HashTableEntry: bins } HashTableEntry { Type: key Type: value } - Search, insert, and delete probe sequentially from the hash value. Advantages: - No linked lists; better cache performance. - Compact memory use. Disadvantages: - Clustering as table fills. - Deletion is complex. Hash Functions =============== A good hash function should: - Be **deterministic**: same key → same bin. - Produce values within a defined **range** `[0, b−1]`. - Distribute keys **uniformly** to minimize collisions. Common Methods: - **Division method**: `f(k) = k % b` - **Horner’s method (for strings)**: :: StringHash(String: key, Integer: size): total = 0 FOR EACH character in key: total = CONST * total + CharacterToNumber(character) return total % size Handling Non-Numeric Keys ------------------------- - Convert strings to numbers (e.g., ASCII values). - Horner’s method accounts for character order. - Prime multipliers help spread values evenly. Example Use Case ================ - Used in **search algorithms** (e.g., BFS, DFS) to track visited nodes. - Insert each visited node’s key into a hash table. - Before adding a new node, check if it exists in the hash table. Why It Matters =============== - Hash tables balance **speed** and **memory**. - Offer **O(1)** average-time lookup and insertion. - Form the foundation for data structures like **Python dictionaries** and **sets**. - Used widely in: - Caches - Load balancers - Database indexing - Symbol tables in compilers Key Takeaways ============== - Hash tables combine **arrays + mathematical mapping**. - Collisions are managed via **chaining** or **probing**. - Choice of **hash function** critically impacts performance. - Provide an elegant solution for **fast, direct retrieval** of data.