======================= 12- B-Trees ======================= **Overview** B-trees are self-balancing, multi-way search trees designed to minimize costly data retrieval operations from slow storage (e.g., disks or remote servers). By storing multiple keys per node, B-trees reduce the number of block reads needed to locate or insert data. **Key Idea** - Accessing data **within a block** (node) is *cheap*. - Accessing **new blocks** (loading from disk/network) is *expensive*. - B-trees pack multiple keys into each node to maximize data retrieved per access. -------------------------- B-Tree Structure -------------------------- **Definition** - Each node contains multiple *keys* and *child pointers*. - Nodes maintain between *k* and *2k* keys (except the root). - Internal nodes have between *k + 1* and *2k + 1* children. - All leaves exist at the same depth (perfectly balanced). **Formal Structure** :: BTree { BTreeNode: root Integer: k } BTreeNode { Integer: k Integer: size Boolean: is_leaf Array of Type: keys Array of BTreeNodes: children } **Mapping** - `children[i]` → keys < `keys[i]` - `children[i+1]` → keys > `keys[i]` **Analogy** - Each *node* = a “binder” of index cards. - Each *key* = an index card (with data + pointer). - Pointers between binders define ordering of subtrees. -------------------------- Searching in a B-Tree -------------------------- **Goal:** Locate a key or return null if not found. **Algorithm** :: BTreeSearch(tree, target): return BTreeNodeSearch(tree.root, target) BTreeNodeSearch(node, target): i = 0 WHILE i < node.size AND target >= node.keys[i]: IF target == node.keys[i]: return node.keys[i] i = i + 1 IF node.is_leaf: return null return BTreeNodeSearch(node.children[i], target) **Runtime** - Search cost: *O(logₖ N)* node accesses. - Each node search: *O(k)* (linear or binary search). -------------------------- Adding Keys -------------------------- **Insertion Rules** 1. Nodes may store between *k* and *2k* keys. 2. If a node exceeds *2k* keys, it must be **split**. 3. Splitting ensures all leaves remain at equal depth. **Helper Functions** :: BTreeNodeIsOverFull(node): return node.size == (2 * node.k + 1) **Insert into Non-Full Node** :: BTreeNodeAddKey(node, key, next_child): i = node.size - 1 WHILE i >= 0 AND key < node.keys[i]: node.keys[i+1] = node.keys[i] IF NOT node.is_leaf: node.children[i+2] = node.children[i+1] i = i - 1 node.keys[i+1] = key IF NOT node.is_leaf: node.children[i+2] = next_child node.size += 1 **Splitting Overfull Nodes** :: BTreeNodeSplit(node, child_index): old_child = node.children[child_index] new_child = BTreeNode(node.k) split_index = Floor(old_child.size / 2) split_key = old_child.keys[split_index] -- Move upper half of keys/children to new_child -- BTreeNodeAddKey(node, split_key, new_child) **Recursive Insertion** :: BTreeNodeInsert(node, key): i = 0 WHILE i < node.size AND key >= node.keys[i]: IF key == node.keys[i]: Update data return i = i + 1 IF node.is_leaf: BTreeNodeAddKey(node, key, null) ELSE: BTreeNodeInsert(node.children[i], key) IF BTreeNodeIsOverFull(node.children[i]): BTreeNodeSplit(node, i) **Root Splitting** :: BTreeInsert(tree, key): BTreeNodeInsert(tree.root, key) IF BTreeNodeIsOverFull(tree.root): new_root = BTreeNode(tree.k) new_root.is_leaf = False new_root.children[0] = tree.root BTreeNodeSplit(new_root, 0) tree.root = new_root **Complexity** - Depth of tree: *O(logₖ N)* - Total cost: *O(k × logₖ N)* - B-tree always remains balanced. -------------------------- Key Insights -------------------------- - B-trees are optimized for *block access efficiency*. - Each node access retrieves multiple keys at once. - Splitting maintains balance dynamically during inserts. - Perfect for indexing large, disk-based or distributed data sets. Examples of Adding Keys ======================= **Basic Insertion:** - If the target leaf is **not full**, insert the key in sorted order. - No splitting required. Example: Adding key 30 to a non-full leaf. **Splitting a Full Leaf:** - When a leaf is **overfull** after insertion: - Identify **split point** (middle key). - **Promote** middle key to parent. - Use ``BTreeNodeSplit`` to divide the leaf into two siblings. - If the parent becomes overfull, split it recursively. **Splitting the Root:** - When the root overfills: - Split root into two children. - Promote middle key to a new root. - The new root always starts with **one key**. - Tree height increases by one. **Complexity:** - Only nodes along the insertion path are modified. - Total operations scale with ``O(log_k N)``. Removing Keys ============== **Overview:** - Search for the target key. - Remove it, then **repair under-full nodes** to maintain balance. - Root may shrink, decreasing overall depth by one. Fixing Under-Full Nodes ======================= **Under-Full Check:** :: BTreeNodeIsUnderFull(node): return node.size < node.k **Two Repair Strategies:** 1. **Merge** two small sibling nodes. 2. **Transfer** a key from a larger sibling. --- **1. Merge Operation (Listing 12-1):** - Merge left and right siblings when combined keys < ``2k``. - Steps: - Copy separating key from parent to left child. - Append right child's keys and children. - Remove separating key and right child from parent. - May cause parent to become under-full. **2. Transfer Operation:** - Used when siblings together have ≥ ``2k`` keys. **Transfer Left (Listing 12-2):** - Move one key from right sibling to left sibling. - Replace separating key in parent with sibling’s key. **Transfer Right (Listing 12-3):** - Move one key from left sibling to right sibling. - Replace parent’s separating key with sibling’s last key. **Repair Wrapper (BTreeNodeRepairUnderFull):** - Chooses between ``merge`` or ``transfer`` based on key totals. --- Finding the Minimum Key ======================= **Helper Function (Listing 12-4):** :: BTreeNodeFindMin(node): if node.size == 0: return null if node.is_leaf: return node.keys[0] else: return BTreeNodeFindMin(node.children[0]) - Used during deletion to find the smallest key in a subtree. Deletion Algorithm ================== **Top-Level Function:** :: BTreeDelete(tree, key): BTreeNodeDelete(tree.root, key) if root is empty and not leaf: root = root.children[0] **Recursive Function (BTreeNodeDelete):** 1. **Search** for key. 2. If node is a **leaf**, remove directly and shift keys. 3. If node is **internal**: - Replace key with smallest key from right subtree. - Recursively delete replacement key. 4. After deletion, **check for under-full children** and repair. **Complexity:** - One top-down pass, at most one sibling access per level. - Total operations: ``O(log_k N)``. Examples of Removing Keys ========================= **1. Removing from Leaf:** - If leaf remains ≥ ``k`` keys → no repair needed. **2. Removing from Internal Node:** - Replace with next key in sorted order (from child). **3. Merging Nodes:** - If a child becomes under-full → merge with sibling. **4. Key Transfer:** - Borrow a key from a sibling to restore balance. **5. Removing a Level:** - If root becomes empty → promote its single child to root. --- Why This Matters ================ **Key Concepts:** - B-trees minimize **slow memory accesses** by grouping keys in nodes. - Each non-root node: - Holds between ``k`` and ``2k`` keys. - Ensures branching factor ≥ ``k+1``. - Maintains **balance** and **efficient operations** across large data sets. **Advantages:** - Balanced structure ensures search, insert, and delete in ``O(log N)``. - Nodes remain at least half full (except root). - Efficient for **disk-based** or **external memory** operations. **Design Tradeoffs:** - General-purpose and adaptable. - Balances flexibility and efficiency versus specialized indexing. **Key Takeaway:** - B-trees dynamically rebalance through **splitting, merging, and transferring**, ensuring consistent performance even for large, uneven datasets.