13- Bloom Filters

Overview

  • Invented by Burton Bloom (1970).

  • A space-efficient probabilistic data structure to test set membership.

  • Answers: “Have we seen this key before?”

  • Key properties: - No false negatives: if it says “not seen,” it’s correct. - Possible false positives: may say “seen” when it hasn’t.

  • Common uses: - Prefiltering before slower lookups. - Checking for weak passwords or known records.

Core Concept

  • A Bloom filter is an array of bits (0 or 1).

  • Each inserted key is mapped to k indices using k hash functions.

  • Each hash index is set to 1 during insertion.

Operations

  • Insert(key): - Compute k hash values. - Set bits at those indices to 1.

  • Lookup(key): - Compute k hash values. - If any bit is 0 → key not present. - If all bits are 1 → key possibly present.

BloomFilter Structure

::
BloomFilter {

Integer: size Integer: k Array of bits: bins Array of hash functions: h

}

Example Pseudocode

::
BloomFilterInsertKey(filter, key):
FOR i in 0..k-1:

index = filter.h[i](key) filter.bins[index] = 1

BloomFilterLookup(filter, key):
FOR i in 0..k-1:

index = filter.h[i](key) IF filter.bins[index] == 0:

RETURN False

RETURN True

Performance

  • Time complexity: O(k)

  • Memory: O(m) bits, where m is number of bins.

  • Independent of total elements inserted.

  • Lookup can terminate early after first 0.

Tuning Parameters

False positive rate approximation:

\[P_{false} = (1 - (1 - 1/m)^{nk})^k\]
  • n: number of inserted items

  • m: number of bits

  • k: number of hash functions

Effects: - Larger m → lower false positives. - More items (n) → higher false positives. - Optimal k depends on balance between accuracy and space.

Example False Positive Rates (n = 100)

m

k = 1

k = 3

k = 5

200 400 600 800 1000

0.3942 0.2214 0.1536 0.1176 0.0952

0.4704 0.1473 0.0610 0.0306 0.0174

0.6535 0.1855 0.0579 0.0217 0.0094

Bloom Filters vs. Hash Tables

  • Hash Tables: - Exact results, store keys explicitly. - Higher memory usage (keys + pointers).

  • Bloom Filters: - No keys or pointers stored. - Compact, bit-based memory footprint. - Faster, cache-friendly lookups.

Applications

  • Password blacklist checks.

  • Caching prefilters.

  • Large-scale record existence tests.

  • Distributed databases and search engines.

Key Insights

  • Trades accuracy for speed and memory.

  • Enables fast “likely/not present” checks.

  • Demonstrates probabilistic data structures with controlled error.

  • Foundation for more complex structures like counting Bloom filters.

Summary

Bloom filters provide a lightweight, probabilistic way to test membership: - Tiny memory footprint. - Constant-time operations. - No false negatives, some false positives. - Useful as a prefilter for expensive lookups or large data sets.