Modern Storage Engines

Modern Storage Engines — My Notes and Mental Models

All throughout my undergrad, and during my one year of backend engineering at Draup, I only knew about classic databases like PostgreSQL, MongoDB, and Elasticsearch. I knew that MongoDB and Elasticsearch were distributed systems—one good for full-text search and one good for high-write workloads—but I never really dug deeper.

Now in my master’s at Northeastern, I’m finally digging into modern storage engines and asking more fundamental questions, like why NoSQL systems are better for writes, and why log-structured systems behave differently from in-place updates. Once you go down this path, databases suddenly stop feeling like “black boxes” and start feeling like engineering tradeoff machines.

To make sense of everything, I split database systems into three layers:

1. Frontend → how the query is written
2. Compute
3. Storage

Frontend Layer — Query Languages → IR → Execution Plan

When I say “frontend,” I mean how the query is expressed. PostgreSQL uses SQL, MongoDB has its own query syntax, Elasticsearch has its own JSON DSL. But no matter how you write the query, all of these systems internally transform it into some Intermediate Representation (IR).

That IR is what flows into the compute layer. So SQL vs Mongo vs Elasticsearch is often just syntactic differences; internally, they boil down to similar structures like logical plans, relational algebra trees, or operator graphs.

This is where I realized: the frontend is mostly an interface. What matters is the IR the compute layer sees.

Compute Layer — Scheduling, Optimization, Parallelism

Different databases make very different choices here. Some calculate cost-based query plans. Some rely heavily on heuristics. Some aggressively parallelize operators; others stick to a single-node push/pull execution model.

A smoother transition: the compute layer exists to decide how to execute a query—but the storage layer decides what you pay for each decision. That’s why storage-engine choices affect everything upstream.

Storage Layer — B-trees vs Log-Structured (LSM) and Write Behavior

This is where databases really diverge.

Traditional RDBMS systems like PostgreSQL use B-trees with in-place updates, which means updating a row usually triggers a random write somewhere within the tree structure.

Random writes are the kryptonite of hard disks and not great for SSDs either.

Log-structured storage engines (like the WiredTiger engine used by MongoDB today) flip that model. Everything goes into an append-only log first. Appends = sequential writes. Sequential writes are amazingly fast on almost every storage medium.

To make this concrete, here’s a simple example:

Example 1: Updating a value in B-tree vs LSM

B-tree (PostgreSQL)
If you update a row with primary key = 42:

The engine finds the page where that row lives
Modifies the page
Writes that page back (random write)

LSM (MongoDB’s WiredTiger)
If you update the same row:

It just appends a new entry {key=42, value=new} to a log
Periodically merges logs in the background (sequential writes)

This is why LSM/log-structured engines crush B-trees for write-heavy workloads.

Example 2: Compute/Storage Separation (Snowflake, Databricks)

Traditional databases bundle compute + storage together.
If you need more storage, you scale the whole system.
If you need more compute, same problem.

Modern cloud warehouses separate things like this:

S3 / Object Storage  <-- durable data (cheap, slow)
       ↑
       | (columnar files / parquet / delta)
       ↓
Compute Clusters     <-- CPU, RAM, caching (fast, on-demand)

So you can scale compute up/down on demand, while storage sits cheaply on S3.

This is why Snowflake, BigQuery, and Databricks are eating the analytics world.

Why This Separation Actually Matters

With compute and storage decoupled:

Storage becomes infinitely scalable (S3, GCS, Azure Blob).
Compute scales independently (more nodes, or temporary bursts).
Multiple frontends can sit on same data (SQL, APIs, AI workloads).
Caching layers (result caching, local SSD caching) make querying fast.

This is the opposite of classic databases where everything is tightly coupled and scaling is painful.

My Opinion (based on everything I’ve learned so far)

I genuinely think the next decade of databases will look way more like Snowflake/Databricks than PostgreSQL/MySQL for large workloads.

PostgreSQL is amazing for OLTP and will always have a place.
But the industry is moving toward:

log-structured engines
columnar formats
decoupled compute/storage
cloud-native execution
vectorized query processing
AI-friendly interfaces

The “database” is becoming more like a platform with layers you can independently tune.

Source : Andrew Pavlo CMU courses on YouTube

Modern Storage Engines

Frontend Layer — Query Languages → IR → Execution Plan

Compute Layer — Scheduling, Optimization, Parallelism

Storage Layer — B-trees vs Log-Structured (LSM) and Write Behavior

Example 1: Updating a value in B-tree vs LSM

Example 2: Compute/Storage Separation (Snowflake, Databricks)

Why This Separation Actually Matters

My Opinion (based on everything I’ve learned so far)

Comments

More from this blog

Zigzagged Deletes: MongoDB TTL Regression Caused by a WiredTiger Cursor Bug

Building Orion: A C++23 Distributed Task Runtime Inspired by Ray

Rethinking Metadata Indexing in Analytical Data Systems

Hacking Secret Hitler: From Bayesian Math to AI Agents

Command Palette

Frontend Layer — Query Languages → IR → Execution Plan

Compute Layer — Scheduling, Optimization, Parallelism

Storage Layer — B-trees vs Log-Structured (LSM) and Write Behavior

Example 1: Updating a value in B-tree vs LSM

Example 2: Compute/Storage Separation (Snowflake, Databricks)

Why This Separation Actually Matters

My Opinion (based on everything I’ve learned so far)

Comments

More from this blog