12- B-Trees

Overview

B-trees are self-balancing, multi-way search trees designed to minimize costly data retrieval operations from slow storage (e.g., disks or remote servers). By storing multiple keys per node, B-trees reduce the number of block reads needed to locate or insert data.

Key Idea

  • Accessing data within a block (node) is cheap.

  • Accessing new blocks (loading from disk/network) is expensive.

  • B-trees pack multiple keys into each node to maximize data retrieved per access.

B-Tree Structure

Definition

  • Each node contains multiple keys and child pointers.

  • Nodes maintain between k and 2k keys (except the root).

  • Internal nodes have between k + 1 and 2k + 1 children.

  • All leaves exist at the same depth (perfectly balanced).

Formal Structure

BTree {
    BTreeNode: root
    Integer: k
}

BTreeNode {
    Integer: k
    Integer: size
    Boolean: is_leaf
    Array of Type: keys
    Array of BTreeNodes: children
}

Mapping

  • children[i] → keys < keys[i]

  • children[i+1] → keys > keys[i]

Analogy

  • Each node = a “binder” of index cards.

  • Each key = an index card (with data + pointer).

  • Pointers between binders define ordering of subtrees.

Searching in a B-Tree

Goal: Locate a key or return null if not found.

Algorithm

BTreeSearch(tree, target):
    return BTreeNodeSearch(tree.root, target)

BTreeNodeSearch(node, target):
    i = 0
    WHILE i < node.size AND target >= node.keys[i]:
        IF target == node.keys[i]:
            return node.keys[i]
        i = i + 1

    IF node.is_leaf:
        return null
    return BTreeNodeSearch(node.children[i], target)

Runtime

  • Search cost: O(logₖ N) node accesses.

  • Each node search: O(k) (linear or binary search).

Adding Keys

Insertion Rules

  1. Nodes may store between k and 2k keys.

  2. If a node exceeds 2k keys, it must be split.

  3. Splitting ensures all leaves remain at equal depth.

Helper Functions

BTreeNodeIsOverFull(node):
    return node.size == (2 * node.k + 1)

Insert into Non-Full Node

BTreeNodeAddKey(node, key, next_child):
    i = node.size - 1
    WHILE i >= 0 AND key < node.keys[i]:
        node.keys[i+1] = node.keys[i]
        IF NOT node.is_leaf:
            node.children[i+2] = node.children[i+1]
        i = i - 1
    node.keys[i+1] = key
    IF NOT node.is_leaf:
        node.children[i+2] = next_child
    node.size += 1

Splitting Overfull Nodes

BTreeNodeSplit(node, child_index):
    old_child = node.children[child_index]
    new_child = BTreeNode(node.k)
    split_index = Floor(old_child.size / 2)
    split_key = old_child.keys[split_index]
    -- Move upper half of keys/children to new_child --
    BTreeNodeAddKey(node, split_key, new_child)

Recursive Insertion

BTreeNodeInsert(node, key):
    i = 0
    WHILE i < node.size AND key >= node.keys[i]:
        IF key == node.keys[i]:
            Update data
            return
        i = i + 1
    IF node.is_leaf:
        BTreeNodeAddKey(node, key, null)
    ELSE:
        BTreeNodeInsert(node.children[i], key)
        IF BTreeNodeIsOverFull(node.children[i]):
            BTreeNodeSplit(node, i)

Root Splitting

BTreeInsert(tree, key):
    BTreeNodeInsert(tree.root, key)
    IF BTreeNodeIsOverFull(tree.root):
        new_root = BTreeNode(tree.k)
        new_root.is_leaf = False
        new_root.children[0] = tree.root
        BTreeNodeSplit(new_root, 0)
        tree.root = new_root

Complexity

  • Depth of tree: O(logₖ N)

  • Total cost: O(k × logₖ N)

  • B-tree always remains balanced.

Key Insights

  • B-trees are optimized for block access efficiency.

  • Each node access retrieves multiple keys at once.

  • Splitting maintains balance dynamically during inserts.

  • Perfect for indexing large, disk-based or distributed data sets.

Examples of Adding Keys

Basic Insertion: - If the target leaf is not full, insert the key in sorted order. - No splitting required.

Example: Adding key 30 to a non-full leaf.

Splitting a Full Leaf: - When a leaf is overfull after insertion:

  • Identify split point (middle key).

  • Promote middle key to parent.

  • Use BTreeNodeSplit to divide the leaf into two siblings.

  • If the parent becomes overfull, split it recursively.

Splitting the Root: - When the root overfills:

  • Split root into two children.

  • Promote middle key to a new root.

  • The new root always starts with one key.

  • Tree height increases by one.

Complexity: - Only nodes along the insertion path are modified. - Total operations scale with O(log_k N).

Removing Keys

Overview: - Search for the target key. - Remove it, then repair under-full nodes to maintain balance. - Root may shrink, decreasing overall depth by one.

Fixing Under-Full Nodes

Under-Full Check:

BTreeNodeIsUnderFull(node):
    return node.size < node.k

Two Repair Strategies: 1. Merge two small sibling nodes. 2. Transfer a key from a larger sibling.

1. Merge Operation (Listing 12-1): - Merge left and right siblings when combined keys < 2k. - Steps:

  • Copy separating key from parent to left child.

  • Append right child’s keys and children.

  • Remove separating key and right child from parent.

  • May cause parent to become under-full.

2. Transfer Operation: - Used when siblings together have ≥ 2k keys.

Transfer Left (Listing 12-2): - Move one key from right sibling to left sibling. - Replace separating key in parent with sibling’s key.

Transfer Right (Listing 12-3): - Move one key from left sibling to right sibling. - Replace parent’s separating key with sibling’s last key.

Repair Wrapper (BTreeNodeRepairUnderFull): - Chooses between merge or transfer based on key totals.

Finding the Minimum Key

Helper Function (Listing 12-4):

BTreeNodeFindMin(node):
    if node.size == 0: return null
    if node.is_leaf: return node.keys[0]
    else: return BTreeNodeFindMin(node.children[0])
  • Used during deletion to find the smallest key in a subtree.

Deletion Algorithm

Top-Level Function:

BTreeDelete(tree, key):
    BTreeNodeDelete(tree.root, key)
    if root is empty and not leaf:
        root = root.children[0]

Recursive Function (BTreeNodeDelete): 1. Search for key. 2. If node is a leaf, remove directly and shift keys. 3. If node is internal:

  • Replace key with smallest key from right subtree.

  • Recursively delete replacement key.

  1. After deletion, check for under-full children and repair.

Complexity: - One top-down pass, at most one sibling access per level. - Total operations: O(log_k N).

Examples of Removing Keys

1. Removing from Leaf: - If leaf remains ≥ k keys → no repair needed.

2. Removing from Internal Node: - Replace with next key in sorted order (from child).

3. Merging Nodes: - If a child becomes under-full → merge with sibling.

4. Key Transfer: - Borrow a key from a sibling to restore balance.

5. Removing a Level: - If root becomes empty → promote its single child to root.

Why This Matters

Key Concepts: - B-trees minimize slow memory accesses by grouping keys in nodes. - Each non-root node:

  • Holds between k and 2k keys.

  • Ensures branching factor ≥ k+1.

  • Maintains balance and efficient operations across large data sets.

Advantages: - Balanced structure ensures search, insert, and delete in O(log N). - Nodes remain at least half full (except root). - Efficient for disk-based or external memory operations.

Design Tradeoffs: - General-purpose and adaptable. - Balances flexibility and efficiency versus specialized indexing.

Key Takeaway: - B-trees dynamically rebalance through splitting, merging, and transferring,

ensuring consistent performance even for large, uneven datasets.