====================
8- GRIDS
====================

-------------------------
Nearest-Neighbor Search
-------------------------
**Goal:** Find the data point closest to a target.  

**Formal Definition:**
Given data points :math:`X = {x_1, x_2, ..., x_N}`, target :math:`x'`, and distance function :math:`dist(x, y)`,  
find :math:`x_i` that minimizes :math:`dist(x', x_i)`.

**Difference from Binary Search:**  
- Binary search looks for an exact match.  
- NNS finds the closest match.

**Uses:** Spatial search (nearest location), similarity search (closest value or string).

--------------------
Linear Scan for NNS
--------------------
**Idea:**  
Check every point, compute its distance to the target, and keep the closest.

**Algorithm (Simplified):**
.. code-block:: none

    best = A[0]
    best_dist = dist(target, best)
    for each point in A:
      if dist(target, point) < best_dist:
        best = point
    return best


- Works for 1D or multidimensional data.  
- **Complexity:** O(N) time, O(1) space.  
- Simple but inefficient for large datasets.

--------------------
Spatial Data
--------------------
- 2D data represented as points: ``(x, y)``.  
- Use **Euclidean distance**:  
  :math:`dist = sqrt((x_1 - x_2)^2 + (y_1 - y_2)^2)`  
- Sorting by a single dimension (x or y) doesn’t work for 2D nearest-neighbor search.  
- Must consider all dimensions.

--------------------
Grids
--------------------
**Purpose:** Efficiently organize and search 2D data.  
- Divide space into **bins (cells)** indexed by (xbin, ybin).  
- Each bin holds multiple points.

**Mapping points to bins:**
.. math::

    x_{bin} = Floor((x - x_{start}) / x_{bin\_width})  
    y_{bin} = Floor((y - y_{start}) / y_{bin\_width})

**Grid Structure (simplified):**
.. code-block:: text

    Grid 
    {
        num_x_bins, num_y_bins
        x_start, x_end, y_start, y_end
        x_bin_width, y_bin_width
        bins (2D array of lists)
    }


**GridPoint:**
.. code-block:: text

    GridPoint { Float x, Float y, GridPoint next }

--------------------
Grid Operations
--------------------

**Insert Point:**
1. Compute bin indices (xbin, ybin).  
2. Check bounds.  
3. Add new point to front of bin’s linked list.  

**Delete Point:**
1. Locate bin.  
2. Traverse linked list.  
3. Remove first matching point (using approximate equality).  

**Approximate Equality:**
.. code-block:: text

    approx_equal(x1, y1, x2, y2):
        return abs(x1 - x2) <= threshold and abs(y1 - y2) <= threshold

--------------------
Grid-Based Search
--------------------
**Goal:** Speed up nearest-neighbor search by ignoring distant bins.  
- Once a **best distance** is found, **prune bins** that cannot contain closer points.

**Distance to a Bin:**
Compute the minimum distance from the target to the bin’s boundary:  
If this minimum > best distance, skip the bin.

**Helper Function:**
.. code-block:: text

    MinDistToBin(g, xbin, ybin, x, y):
        if out of range: return Inf
        compute x_min, x_max, y_min, y_max
        find shortest (x, y) distance to bin edges
        return sqrt(x_dist^2 + y_dist^2)

--------------------
Key Takeaways
--------------------
- **Nearest-Neighbor Search** finds closest matches, not exact ones.  
- **Linear Scan** is simple but slow.  
- **Grids** divide space into regions, improving search efficiency.  
- **Bins** hold multiple points (linked lists).  
- **Pruning** reduces unnecessary checks.  
- Foundation for advanced spatial structures (e.g., k-d trees, quadtrees).

--------------------------
Helper: MinDistToBin
--------------------------
**Purpose:** Compute the shortest distance from a target point to a grid bin.

**Logic:**
1. Check if bin indices are valid — return `Inf` if invalid.
2. Compute bin’s min/max bounds along x and y.
3. Compare target’s coordinates to bin bounds.
4. Return shortest Euclidean distance to bin edge.

**Note:**  
Invalid bins return `Inf` for pruning safety, though error-throwing may be preferable.  
Analogy: *Neighbor throws the ball just far enough to reach our yard.*

--------------------------
Linear Scan Over Bins
--------------------------
**Algorithm:** `GridLinearScanNN(g, x, y)`  
1. Initialize `best_dist = Inf`, `best_candidate = null`.  
2. Loop over all bins `(xbin, ybin)`.  
3. Skip bins if `MinDistToBin(...) >= best_dist`.  
4. For each valid bin, check all points (linked list).  
5. Update best candidate if a closer point is found.  
6. Return nearest neighbor.

**Pros:** Simple and clear.  
**Cons:** Inefficient on large or sparse grids.  
**Supports:** Targets inside or outside grid bounds.  
**Optimization:** Prunes bins with min distance > best found distance.

--------------------------
Expanding Search
--------------------------
**Idea:** Prioritize checking bins *closest* to the target.  
- Start with the bin containing the target point.  
- Expand outward to nearby bins (spiral/diamond pattern).  
- Stop once all unvisited bins are farther than current best point.

**Analogy:** Searching for lost keys—start nearby, expand outward.

--------------------------
Helper: GridCheckBin
--------------------------
**Purpose:** Return the closest point in a given bin within a distance threshold.

**Steps:**
1. Verify valid bin indices (return `null` if invalid).  
2. Initialize `best_dist = threshold`.  
3. Scan all points in bin; compute Euclidean distances.  
4. Return closest point if distance < threshold, else `null`.

--------------------------
GridSearchExpanding
--------------------------
**Algorithm:** Expanding nearest-neighbor search.

1. Compute starting bin `(xb, yb)` for target; clamp to valid grid range.  
2. Set `steps = 0`, `explore = True`.  
3. While `explore`:
   - For each horizontal offset `xoff`, compute `yoff`.
   - Use `MinDistToBin()` to prune distant/invalid bins.
   - Use `GridCheckBin()` to find closer points.
   - Update `best_pt` and `best_d` if improved.
   - Continue until no bins within threshold remain.

**Tradeoff:**  
+ Avoids checking distant bins → efficient.  
− More complex control logic (spiral order, bounds, stopping criteria).

--------------------------
Grid Size Tradeoffs
--------------------------
- **Large bins:** Fewer bins, more points per bin → slower within-bin scans.  
- **Small bins:** Many empty bins → wasted checks.  
- **Optimal bin size:** Depends on data density and distribution.  
- **Non-uniform grids:** Adapt bin size dynamically.

--------------------------
Beyond 2D & Spatial Data
--------------------------
**Scaling to Higher Dimensions:**
- For D dimensions with K bins each → `K^D` bins (memory cost grows fast).
- Inefficient for high-D spaces (many empty bins).


**Generalized Euclidean Distance:**
.. math::
   dist = sqrt(sum(w_d * (x_i[d] - x_j[d])^2))

**Applications:**
- Spatial: nearest store, location-based search.  
- Non-spatial: similarity search (e.g., coffee attributes: strength, acidity).  
- Use weighted distances to emphasize important dimensions.

--------------------------
Key Takeaways
--------------------------
- **MinDistToBin:** Enables pruning by measuring bin proximity.  
- **GridLinearScanNN:** Simple baseline using full scan.  
- **Expanding Search:** Prioritizes closer bins → efficient pruning.  
- **Grid Size:** Critical for performance balance.  
- **Beyond 2D:** High dimensions increase complexity and memory.  
- **Generalization:** Nearest-neighbor search applies to non-spatial data.  
- Grids introduce **multi-value bins** (linked lists), laying groundwork for trees and adaptive structures.