6-Tries and Adapting Data Structures¶
Overview¶
Binary search trees (BSTs) can store strings, but string comparisons are costly. Tries (“try”) adapt tree structures to exploit the sequential and limited nature of strings, providing more efficient searches.
Binary Search Trees of Strings¶
BSTs can store any sortable data, including strings.
Strings are ordered alphabetically for insertion and search.
Search cost in BSTs depends on: - Number of strings (tree height). - Length of strings (comparison cost).
Cost of String Comparisons¶
String comparisons proceed character by character until a difference is found.
Longer strings or similar prefixes increase comparison cost.
BST searches for strings combine both tree height and string length, compounding cost.
Tries¶
A trie is a multi-way tree based on string prefixes.
Each level branches on the next character.
Properties: - Nodes may have up to k children (e.g., 26 for English letters). - Internal nodes represent prefixes (may or may not be valid entries). - Leaf nodes represent complete strings. - Each node includes a marker (e.g.,
is_entry) to indicate valid words.
Trie Operations¶
Search: Follow branches for each character; cost proportional to string length.
Insert: Add new nodes for missing characters while descending.
Delete: Remove nodes upward if they are no longer needed.
Advantages and Trade-offs¶
Lookup time depends only on string length, not number of entries.
Prefix sharing reduces cost when strings overlap.
Memory cost is higher due to many potential child pointers.
Best suited for: - Word processors (spell checking). - Dictionaries with many shared prefixes. - Serial numbers or structured codes.
Why This Matters¶
Tries demonstrate how exploiting data structure can improve efficiency. By aligning branching with string structure, we reduce expensive comparisons and achieve scalable performance for large string sets.