12- B-Trees¶
Overview
B-trees are self-balancing, multi-way search trees designed to minimize costly data retrieval operations from slow storage (e.g., disks or remote servers). By storing multiple keys per node, B-trees reduce the number of block reads needed to locate or insert data.
Key Idea
Accessing data within a block (node) is cheap.
Accessing new blocks (loading from disk/network) is expensive.
B-trees pack multiple keys into each node to maximize data retrieved per access.
B-Tree Structure¶
Definition
Each node contains multiple keys and child pointers.
Nodes maintain between k and 2k keys (except the root).
Internal nodes have between k + 1 and 2k + 1 children.
All leaves exist at the same depth (perfectly balanced).
Formal Structure
BTree {
BTreeNode: root
Integer: k
}
BTreeNode {
Integer: k
Integer: size
Boolean: is_leaf
Array of Type: keys
Array of BTreeNodes: children
}
Mapping
children[i] → keys < keys[i]
children[i+1] → keys > keys[i]
Analogy
Each node = a “binder” of index cards.
Each key = an index card (with data + pointer).
Pointers between binders define ordering of subtrees.
Searching in a B-Tree¶
Goal: Locate a key or return null if not found.
Algorithm
BTreeSearch(tree, target):
return BTreeNodeSearch(tree.root, target)
BTreeNodeSearch(node, target):
i = 0
WHILE i < node.size AND target >= node.keys[i]:
IF target == node.keys[i]:
return node.keys[i]
i = i + 1
IF node.is_leaf:
return null
return BTreeNodeSearch(node.children[i], target)
Runtime
Search cost: O(logₖ N) node accesses.
Each node search: O(k) (linear or binary search).
Adding Keys¶
Insertion Rules
Nodes may store between k and 2k keys.
If a node exceeds 2k keys, it must be split.
Splitting ensures all leaves remain at equal depth.
Helper Functions
BTreeNodeIsOverFull(node):
return node.size == (2 * node.k + 1)
Insert into Non-Full Node
BTreeNodeAddKey(node, key, next_child):
i = node.size - 1
WHILE i >= 0 AND key < node.keys[i]:
node.keys[i+1] = node.keys[i]
IF NOT node.is_leaf:
node.children[i+2] = node.children[i+1]
i = i - 1
node.keys[i+1] = key
IF NOT node.is_leaf:
node.children[i+2] = next_child
node.size += 1
Splitting Overfull Nodes
BTreeNodeSplit(node, child_index):
old_child = node.children[child_index]
new_child = BTreeNode(node.k)
split_index = Floor(old_child.size / 2)
split_key = old_child.keys[split_index]
-- Move upper half of keys/children to new_child --
BTreeNodeAddKey(node, split_key, new_child)
Recursive Insertion
BTreeNodeInsert(node, key):
i = 0
WHILE i < node.size AND key >= node.keys[i]:
IF key == node.keys[i]:
Update data
return
i = i + 1
IF node.is_leaf:
BTreeNodeAddKey(node, key, null)
ELSE:
BTreeNodeInsert(node.children[i], key)
IF BTreeNodeIsOverFull(node.children[i]):
BTreeNodeSplit(node, i)
Root Splitting
BTreeInsert(tree, key):
BTreeNodeInsert(tree.root, key)
IF BTreeNodeIsOverFull(tree.root):
new_root = BTreeNode(tree.k)
new_root.is_leaf = False
new_root.children[0] = tree.root
BTreeNodeSplit(new_root, 0)
tree.root = new_root
Complexity
Depth of tree: O(logₖ N)
Total cost: O(k × logₖ N)
B-tree always remains balanced.
Key Insights¶
B-trees are optimized for block access efficiency.
Each node access retrieves multiple keys at once.
Splitting maintains balance dynamically during inserts.
Perfect for indexing large, disk-based or distributed data sets.
Examples of Adding Keys¶
Basic Insertion: - If the target leaf is not full, insert the key in sorted order. - No splitting required.
Example: Adding key 30 to a non-full leaf.
Splitting a Full Leaf: - When a leaf is overfull after insertion:
Identify split point (middle key).
Promote middle key to parent.
Use
BTreeNodeSplitto divide the leaf into two siblings.
If the parent becomes overfull, split it recursively.
Splitting the Root: - When the root overfills:
Split root into two children.
Promote middle key to a new root.
The new root always starts with one key.
Tree height increases by one.
Complexity:
- Only nodes along the insertion path are modified.
- Total operations scale with O(log_k N).
Removing Keys¶
Overview: - Search for the target key. - Remove it, then repair under-full nodes to maintain balance. - Root may shrink, decreasing overall depth by one.
Fixing Under-Full Nodes¶
Under-Full Check:
BTreeNodeIsUnderFull(node):
return node.size < node.k
Two Repair Strategies: 1. Merge two small sibling nodes. 2. Transfer a key from a larger sibling.
—
1. Merge Operation (Listing 12-1):
- Merge left and right siblings when combined keys < 2k.
- Steps:
Copy separating key from parent to left child.
Append right child’s keys and children.
Remove separating key and right child from parent.
May cause parent to become under-full.
2. Transfer Operation:
- Used when siblings together have ≥ 2k keys.
Transfer Left (Listing 12-2): - Move one key from right sibling to left sibling. - Replace separating key in parent with sibling’s key.
Transfer Right (Listing 12-3): - Move one key from left sibling to right sibling. - Replace parent’s separating key with sibling’s last key.
Repair Wrapper (BTreeNodeRepairUnderFull):
- Chooses between merge or transfer based on key totals.
—
Finding the Minimum Key¶
Helper Function (Listing 12-4):
BTreeNodeFindMin(node):
if node.size == 0: return null
if node.is_leaf: return node.keys[0]
else: return BTreeNodeFindMin(node.children[0])
Used during deletion to find the smallest key in a subtree.
Deletion Algorithm¶
Top-Level Function:
BTreeDelete(tree, key):
BTreeNodeDelete(tree.root, key)
if root is empty and not leaf:
root = root.children[0]
Recursive Function (BTreeNodeDelete): 1. Search for key. 2. If node is a leaf, remove directly and shift keys. 3. If node is internal:
Replace key with smallest key from right subtree.
Recursively delete replacement key.
After deletion, check for under-full children and repair.
Complexity:
- One top-down pass, at most one sibling access per level.
- Total operations: O(log_k N).
Examples of Removing Keys¶
1. Removing from Leaf:
- If leaf remains ≥ k keys → no repair needed.
2. Removing from Internal Node: - Replace with next key in sorted order (from child).
3. Merging Nodes: - If a child becomes under-full → merge with sibling.
4. Key Transfer: - Borrow a key from a sibling to restore balance.
5. Removing a Level: - If root becomes empty → promote its single child to root.
—
Why This Matters¶
Key Concepts: - B-trees minimize slow memory accesses by grouping keys in nodes. - Each non-root node:
Holds between
kand2kkeys.Ensures branching factor ≥
k+1.
Maintains balance and efficient operations across large data sets.
Advantages:
- Balanced structure ensures search, insert, and delete in O(log N).
- Nodes remain at least half full (except root).
- Efficient for disk-based or external memory operations.
Design Tradeoffs: - General-purpose and adaptable. - Balances flexibility and efficiency versus specialized indexing.
Key Takeaway: - B-trees dynamically rebalance through splitting, merging, and transferring,
ensuring consistent performance even for large, uneven datasets.