================
9. Spatial Trees
================

Key Concepts
------------

- **Nearest-neighbor search** extends one-dimensional lookup to multiple dimensions.
- **Grids** are limited due to memory overhead and fixed partitioning.
- **Tree-based spatial partitioning** allows dynamic and adaptive division of space.

Uniform Quadtrees
-----------------

- A **quadtree** recursively partitions two-dimensional space into four equal quadrants.
- Each **node** represents a spatial region and may have up to four children.
- **Uniform** means all splits divide the space evenly at each level.

**Node Structure:**

::

    QuadTreeNode {
        Boolean: is_leaf
        Integer: num_points
        Float: x_min, x_max, y_min, y_max
        Matrix of QuadTreeNodes: children
        Array of Points: points
    }

**Point Structure:**

::

    Point {
        Float: x
        Float: y
    }

Design Considerations
---------------------

- Internal nodes contain metadata and pointers to child nodes.
- Leaf nodes store the actual data points.
- Spatial bounds can be stored or computed from the root and the path of splits.
- Adaptive partitioning avoids excessive subdivision and reduces memory use.

Building Uniform Quadtrees
--------------------------

Subdivision continues until one of the following criteria is met:

- Too few points remain to justify a split.
- The spatial bounds are too small for further division.
- The maximum tree depth is reached.

This creates a hierarchical, adaptive structure that represents spatial density efficiently.

Adding Points
-------------

**Insertion Algorithm Overview:**

1. Check if the point lies within the quadtree’s root bounds.
2. Traverse the tree to locate the appropriate region.
3. If at a leaf:

   - Add the point to the node.
   - Check if the splitting criteria are met.

4. If splitting is required:

   - Convert the leaf to an internal node.
   - Redistribute the stored points among the child quadrants.

Advantages
----------

- Efficient for spatial searches and nearest-neighbor queries.
- Dynamically adapts to data density.
- Reduces unnecessary memory use compared to fine-grained grids.

Comparison to k-d Trees
-----------------------

- **Quadtrees**: partition space uniformly into four quadrants (2D focus).
- **k-d Trees**: binary partitioning adaptable to any dimension.
- k-d trees are more flexible for high-dimensional data.

Removing Points
---------------

Removing a point from a quadtree is similar to insertion but more complex.

- The point is deleted from the **leaf node’s list**.
- The algorithm then traverses **upward**, removing splits that are no longer necessary.
- Nodes may be **collapsed** when they no longer meet the splitting criteria.

Floating-point comparisons require tolerance checks using the ``approx_equal`` helper function.

**Node Collapse Algorithm:**

::

    QuadTreeNodeCollapse(node):
        IF node.is_leaf:
            return node.points
        FOR each child:
            IF child != null:
                sub_pts = QuadTreeNodeCollapse(child)
                append sub_pts to node.points
                child = null
        node.is_leaf = True
        return node.points

**Deletion Process:**

1. Verify the point lies within the tree’s spatial bounds.
2. Search recursively for the point.
3. When found, remove it from the leaf node.
4. Update ``num_points`` and collapse nodes if needed.

**Key Operations:**

- **Recursive Search:** finds the node containing the target point.
- **Deletion:** removes the first matching point using approximate equality.
- **Collapse:** combines children into a single leaf node if they no longer justify subdivision.

Searching Uniform Quadtrees
---------------------------

The quadtree search begins at the **root node** and recursively explores regions that may contain a **closer neighbor** than the current candidate.

**Process Overview:**

1. **Start at the root** with a dummy candidate at infinite distance.
2. **Check node compatibility:** determine whether the node could contain a point closer than the current best.
3. **Internal node:** recursively search the child nodes.
4. **Leaf node:** compare the target’s distance to all contained points.
5. **Prune branches:** skip subtrees that cannot contain a closer point.

**Distance Calculation:**

For a point ``(x, y)`` and node bounds ``[x_min, x_max] × [y_min, y_max]``:

::

    if x < x_min          → x_dist = x_min - x
    if x_min ≤ x ≤ x_max  → x_dist = 0
    if x > x_max          → x_dist = x - x_max

    if y < y_min          → y_dist = y_min - y
    if y_min ≤ y ≤ y_max  → y_dist = 0
    if y > y_max          → y_dist = y - y_max

The **minimum distance** to the node is ``sqrt(x_dist² + y_dist²)``.

**Search Optimization:**

- **Prioritize quadrants** closest to the query point.
- **Prune quadrants** whose minimum possible distance exceeds the best distance found so far.
- **Skip empty nodes** (null children) automatically.

**Result:** the nearest-neighbor search efficiently narrows down candidates, exploring only nodes that could contain closer points and pruning irrelevant regions.

Nearest-Neighbor Search in Quadtrees
====================================

* Uses a recursive search similar to other tree-based structures.
* Employs a *best_dist* parameter to prune branches.

Helper Function: ``MinDist``
----------------------------

Computes the minimum distance from a target point to a node’s bounding box.

.. code-block:: none

    MinDist(node, x, y):
        x_dist = 0.0
        IF x < node.x_min:
            x_dist = node.x_min - x
        IF x > node.x_max:
            x_dist = x - node.x_max
        y_dist = 0.0
        IF y < node.y_min:
            y_dist = node.y_min - y
        IF y > node.y_max:
            y_dist = y - node.y_max
        return sqrt(x_dist*x_dist + y_dist*y_dist)

Nearest-Neighbor Search Algorithm
---------------------------------

.. code-block:: none

    QuadTreeNearestNeighbor(tree, x, y):
        return QuadTreeNodeNearestNeighbor(tree.root, x, y, Inf)

Recursive Search
----------------

1. **Pruning:** skip the node if ``MinDist >= best_dist``.
2. **Leaf check:** if the node is a leaf, compare all points and update ``best_dist``.
3. **Internal node:**

   * Compute child bins based on the x/y coordinates.
   * Recursively explore the closest child first.
   * Update ``best_candidate`` and ``best_dist`` as closer points are found.

Summary
-------

* Pruning reduces the search space.
* Recursive traversal ensures all potentially closer nodes are checked.
* Allows searches for points outside the tree bounds.

k-d Trees
=========

Overview
--------

* Spatial data structure combining ideas from **binary trees** and **quadtrees**.
* Splits the data along one dimension at each node.
* Each node divides space into **two children**.

Structure
---------

.. code-block:: none

    KDTreeNode {
        is_leaf
        num_dimensions
        num_points
        x_min[], x_max[]
        split_dim, split_val
        left, right
        points[][]
    }

    KDTree {
        num_dimensions
        root
    }

Advantages
----------

* Handles higher dimensions efficiently.
* Adapts splits to the data distribution.
* Reduced branching factor (binary).

Tighter Spatial Bounds
----------------------

* Improves pruning by tracking **bounding boxes of the actual points**.
* Bounding boxes (L, H) are computed per node:

.. code-block:: none

    ComputeBoundingBox(pts):
        num_points = length(pts)
        num_dims = length(pts[0])
        L, H = new arrays
        FOR each dimension d:
            L[d] = H[d] = pts[0][d]
            FOR each point:
                update L[d], H[d] with min/max
        return (L, H)

Building k-d Trees
==================

Wrapper Function
----------------

.. code-block:: none

    BuildKDTree(tree, pts):
        Check dimensional consistency
        IF pts not empty:
            tree.root = KDTreeNode()
            RecursiveBuildKDTree(tree.root, tree.num_dimensions, pts)
        ELSE:
            tree.root = null

Recursive Construction
----------------------

.. code-block:: none

    RecursiveBuildKDTree(node, num_dims, pts):
        node.num_points = length(pts)
        (node.x_min, node.x_max) = ComputeBoundingBox(pts)
        Compute the widest dimension
        IF stopping condition met:
            node.points = pts
            return
        node.split_dim = widest dimension
        node.split_val = midpoint of widest dimension
        Partition pts into left_pts, right_pts
        Recursively build left and right children

Stopping Conditions
-------------------

* Minimum number of points.
* Minimum width.
* Maximum recursion depth.

k-d Tree Operations
===================

Insertion
---------

* Choose the branch using ``split_dim`` and ``split_val``.
* Update the bounding boxes if the point extends the node’s bounds.

Deletion
--------

* Locate the point using the split rules.
* After removal, tighten the node bounds and collapse internal nodes if needed.

Search
------

* Prune nodes based on the minimum distance to their bounding box.
* Distance computation (D-dimensional):

.. code-block:: none

    KDTreeNodeMinDist(node, pt):
        dist_sum = 0
        FOR each dimension d:
            diff = 0
            IF pt[d] < node.x_min[d]:
                diff = node.x_min[d] - pt[d]
            IF pt[d] > node.x_max[d]:
                diff = pt[d] - node.x_max[d]
            dist_sum += diff^2
        return sqrt(dist_sum)

Why This Matters
================

* **Quadtrees** dynamically adapt to spatial density but scale poorly with dimension.
* **k-d trees** balance flexibility and efficiency by splitting along one dimension at a time.
* Both support powerful **nearest-neighbor searches**; k-d trees extend this to higher-dimensional data.
* They represent a key tradeoff between memory, complexity, and performance.
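The build and search pseudocode above can be combined into a short runnable sketch. The Python below is illustrative rather than canonical: names such as ``KDNode``, ``min_dist``, ``nearest``, and the ``max_leaf`` stopping threshold are assumptions introduced here, not part of the pseudocode. It builds a k-d tree with tight per-node bounding boxes, splits on the widest dimension at its midpoint, and prunes the nearest-neighbor search using the minimum box distance, as ``KDTreeNodeMinDist`` describes.

.. code-block:: python

    import math

    def min_dist(lo, hi, pt):
        """Minimum Euclidean distance from pt to the axis-aligned box [lo, hi]."""
        total = 0.0
        for d in range(len(pt)):
            diff = 0.0
            if pt[d] < lo[d]:
                diff = lo[d] - pt[d]
            elif pt[d] > hi[d]:
                diff = pt[d] - hi[d]
            total += diff * diff
        return math.sqrt(total)

    class KDNode:
        """k-d tree node with a tight per-node bounding box (lo, hi)."""
        def __init__(self, pts, max_leaf=4):
            dims = range(len(pts[0]))
            self.lo = [min(p[d] for p in pts) for d in dims]
            self.hi = [max(p[d] for p in pts) for d in dims]
            widths = [self.hi[d] - self.lo[d] for d in dims]
            self.split_dim = widths.index(max(widths))       # widest dimension
            self.split_val = (self.lo[self.split_dim] + self.hi[self.split_dim]) / 2.0
            if len(pts) <= max_leaf or max(widths) == 0.0:   # stopping condition
                self.points, self.left, self.right = pts, None, None
                return
            left = [p for p in pts if p[self.split_dim] <= self.split_val]
            right = [p for p in pts if p[self.split_dim] > self.split_val]
            if not left or not right:                        # degenerate split: keep as a leaf
                self.points, self.left, self.right = pts, None, None
                return
            self.points = None
            self.left = KDNode(left, max_leaf)
            self.right = KDNode(right, max_leaf)

    def nearest(node, pt, best=None, best_dist=float("inf")):
        """Recursive nearest-neighbor search with bounding-box pruning."""
        if node is None or min_dist(node.lo, node.hi, pt) >= best_dist:
            return best, best_dist                 # prune: box cannot hold a closer point
        if node.points is not None:                # leaf: scan the stored points
            for p in node.points:
                d = math.dist(p, pt)
                if d < best_dist:
                    best, best_dist = p, d
            return best, best_dist
        near, far = ((node.left, node.right) if pt[node.split_dim] <= node.split_val
                     else (node.right, node.left))
        best, best_dist = nearest(near, pt, best, best_dist)   # closest child first
        best, best_dist = nearest(far, pt, best, best_dist)
        return best, best_dist

    pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 6.0), (9.0, 9.0)]
    root = KDNode(pts)
    print(nearest(root, (5.4, 5.2))[0])        # (5.0, 5.0)
    print(nearest(root, (10.0, 10.0))[0])      # (9.0, 9.0) -- query outside the tree bounds

Note that the second query lies outside the root’s bounding box, mirroring the point made earlier that these searches work for targets outside the tree bounds: ``min_dist`` simply returns a positive distance for the root, and traversal proceeds normally.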