===========================
10- Hash Tables
===========================

Overview
========
Hash tables provide a mechanism for **efficient data retrieval** by mapping keys to array indices using **hash functions**.  
They allow near-constant-time lookup, insertion, and deletion, at the cost of potential **collisions**.

Key Concepts
============

- **Goal:** Retrieve data quickly using a key.
- **Idea:** Use a mathematical function to map a key → index.
- **Tradeoff:** Fast average performance but possible collisions.

Storage and Search with Keys
============================

- Arrays give constant-time lookup if index is known.
- Searching without an index requires linear or binary search.
- **Keys** are used to identify data records (e.g., coffee name or barcode).
- Numeric keys can directly index arrays; non-numeric keys require conversion.

Idealized Indexing
------------------
- One array bin per possible key → perfect retrieval.
- Impractical due to **massive memory requirements** (e.g., credit card numbers).
- Leads to idea of **hashing**: compressing large key spaces into small, fixed-size tables.

Hash Tables
===========

- Use a **hash function** `f(k)` to map a key `k` into a limited range `[0, b−1]`.
- Example: **Division Method**

  ::
  
      f(k) = k % b

- Enables direct mapping of keys to bins.
- **Problem:** Multiple keys can map to the same bin → *collision*.

Collisions
===========

- Occur when two keys map to the same bin.
- Inevitable when key space > number of bins.
- Must be handled through **collision resolution** strategies:
  - **Chaining**
  - **Linear Probing**

Chaining
=========

- Each bin stores a **linked list** of key-value pairs.

  ::
  
      HashTable {
          Integer: size
          Array of ListNodes: bins
      }
      ListNode {
          Type: key
          Type: value
          ListNode: next
      }

- Lookup, insert, and delete traverse the linked list within a bin.
- Keeps bin lists small with good hash distribution.

Advantages:
- Simple, handles unlimited collisions.
- Easy insertion/deletion.

Disadvantages:
- Uses extra memory (linked list overhead).
- Poor hash functions lead to long chains.

Linear Probing
===============

- Uses **adjacent bins** to resolve collisions.
- If a bin is full, check the next bin until an empty one is found.

  ::
  
      HashTable {
          Integer: size
          Integer: num_keys
          Array of HashTableEntry: bins
      }
      HashTableEntry {
          Type: key
          Type: value
      }

- Search, insert, and delete probe sequentially from the hash value.

Advantages:
- No linked lists; better cache performance.
- Compact memory use.

Disadvantages:
- Clustering as table fills.
- Deletion is complex.

Hash Functions
===============

A good hash function should:
- Be **deterministic**: same key → same bin.
- Produce values within a defined **range** `[0, b−1]`.
- Distribute keys **uniformly** to minimize collisions.

Common Methods:
- **Division method**: `f(k) = k % b`
- **Horner’s method (for strings)**:

  ::
  
      StringHash(String: key, Integer: size):
          total = 0
          FOR EACH character in key:
              total = CONST * total + CharacterToNumber(character)
          return total % size

Handling Non-Numeric Keys
-------------------------
- Convert strings to numbers (e.g., ASCII values).
- Horner’s method accounts for character order.
- Prime multipliers help spread values evenly.

Example Use Case
================

- Used in **search algorithms** (e.g., BFS, DFS) to track visited nodes.
- Insert each visited node’s key into a hash table.
- Before adding a new node, check if it exists in the hash table.

Why It Matters
===============

- Hash tables balance **speed** and **memory**.
- Offer **O(1)** average-time lookup and insertion.
- Form the foundation for data structures like **Python dictionaries** and **sets**.
- Used widely in:
  - Caches
  - Load balancers
  - Database indexing
  - Symbol tables in compilers

Key Takeaways
==============

- Hash tables combine **arrays + mathematical mapping**.
- Collisions are managed via **chaining** or **probing**.
- Choice of **hash function** critically impacts performance.
- Provide an elegant solution for **fast, direct retrieval** of data.