Tuesday, February 17, 2026
Ionia: High-Performance Distributed Write-Optimized Key-Value Stores
Ionia is a distributed protocol designed to achieve high throughput with low latency by decoupling scalability from locality. This article explores the foundational storage concepts, the problems Ionia solves, and its innovative approach to distributed write-optimized key-value stores.
Part 1: Storage Engine Foundations
Understanding the trade-offs that motivate Ionia requires examining the fundamental storage engine architectures.
1. B-Tree (Traditional)
Structure:
          [50|100]
         /    |    \
   [20|30] [60|80] [110|120]
    / | \   / | \    /  |  \
  leaf leaf leaf   ... (leaves hold the records)

Write Operation Flow:
- Search: Traverse root → leaf (requires random disk reads).
- Update: Insert key into leaf.
- Split: If node is full, split and propagate up (requires multiple random disk writes).
Characteristics:
- Symmetric Performance: Reads and writes are roughly equally fast/slow.
- Bottleneck: Limited by Random IOPS (e.g., ~600K IOPS on SSD).
2. LSM Tree (Log-Structured Merge Tree)
Core Insight: Convert random writes into sequential writes to maximize throughput.
Structure:
- Memory Tier: MemTable (sorted, ~256 MB).
- Disk Tier (SSTables):
  - Level 0: ~256 MB
  - Level 1: ~2.5 GB
  - Level 2: ~25 GB ...
Write Operation Flow:
- Buffer: Write to MemTable (RAM) + append to Write-Ahead Log (sequential).
- Flush: When the MemTable fills, flush it to disk as an SSTable (sequential write, 3-7 GB/s).
- Merge: Background process merges sorted files to clean up invalid data.
Characteristics:
- Asymmetric Performance: Excellent write throughput (sequential) but reads can be slower (checking multiple levels).
- Write-Optimized Key-Value (WO-KV): Prioritizes write speed over read speed.
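The buffer/flush/read cycle above can be sketched as a toy Python class (illustrative only; `MiniLSM` and its limits are invented here, and a real engine adds a WAL, bloom filters, and leveled compaction):

```python
import bisect

class MiniLSM:
    """Toy LSM tree: writes are buffered in a memtable and flushed to
    immutable sorted runs. Sketch of the idea, not a real engine."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}              # in-memory buffer (random access, RAM)
        self.sstables = []              # immutable sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value      # random write absorbed in RAM
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write: dump the memtable as a sorted run.
        self.sstables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then each run newest-to-oldest:
        # this is why LSM reads are slower than LSM writes.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.sstables:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Note how `put` never touches disk structures in place: it only appends, which is what converts random writes into sequential ones.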
Part 2: The Problem Space
Background & Motivation
- WO-KV Speed: Single-node WO-KV stores have massive write throughput.
- Replication Bottleneck: Traditional replication protocols kill this performance because:
- They apply writes sequentially on a single thread to ensure identical replicas.
- They require coordination for write ordering (high latency).
- Followers are unused (wasted resources).
- The Goal: An ideal protocol must preserve WO-KV write performance (throughput/latency) while ensuring consistency.
Shortcomings of Existing Systems
- Trade-offs: Existing systems typically force a choice between scalable, low-latency reads and availability.
- Batching Latency: Systems like CBASE/Eve use multi-threading but rely on large batches to find concurrency, increasing latency.
- Sequential Write Bottlenecks: Systems using commutativity or network ordering often still suffer from sequential write limitations.
- Read Latency: Systems routing reads to the leader (Gaios, Gnothi) suffer from high RTT.
Part 3: Ionia Protocol Overview
Ionia is built around four design principles that together decouple scalability from locality.
Core Philosophy
- Parallel Execution: Execute non-conflicting writes concurrently; only writes to the same key must be serialized, so parallelism cannot introduce inconsistencies.
- Deferred Ordering: Guarantees durability immediately (1 RTT) but defers strict ordering and execution to the background.
- Decoupling Locality:
- Traditional In-Memory bottleneck: Network.
- WO-KV bottleneck: SSD Random IOPS.
- Insight: Reads can scale non-locally as long as validation checks are done in-memory without hitting the SSD.
- Client-Side Consistency: To solve stale reads at followers, the client performs the final validity check using metadata from the leader.
Part 4: Ionia Implementation
A. Write Operations (Fast Durability)
Ionia separates durability from execution to achieve speed.
1. Fast Durability (1 RTT):
- Client sends write to all replicas in parallel.
- Replicas append to Durability Log (uncoordinated) and ACK.
- Client waits for a supermajority (f + ⌈f/2⌉ + 1) of ACKs, including the leader's.
- Result: Durable in 1 RTT.
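A quick sanity check on the quorum size (a minimal sketch; the function name is ours):

```python
import math

def supermajority(f):
    """Ionia's durability quorum for tolerating f failures out of n = 2f + 1
    replicas: f + ceil(f/2) + 1 ACKs, strictly more than a simple majority
    of f + 1."""
    return f + math.ceil(f / 2) + 1

# f = 1 (3 replicas): majority = 2, supermajority = 3
# f = 2 (5 replicas): majority = 3, supermajority = 4
```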
2. Background Ordering:
- Leader moves writes from Durability Log to Consensus Log (assigns sequence numbers).
- Leader batches these into PREPARE messages for followers.
- Once f followers reply PREPARE-OK, the order is finalized (COMMIT).
3. Parallel Execution:
- ExecQueues: Storage layer hashes keys to specific thread queues.
- Rule: Non-conflicting writes execute in parallel threads. Conflicting writes (same key) execute serially.
- Progress: Replicas track applied_index, the latest Consensus Log index applied to the KV store.
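The ExecQueue partitioning rule can be sketched as follows (the names and the CRC-based hash are our illustrative choices, not Ionia's actual code; the key property is that the hash is deterministic, so every replica routes a given key to the same queue):

```python
import zlib

def assign_exec_queues(writes, num_threads):
    """Route each write to a per-thread execution queue by hashing its key.
    Writes to the same key share a queue and keep their log order (serial);
    writes to different keys may land on different threads (parallel)."""
    queues = {i: [] for i in range(num_threads)}
    for seq, key, value in writes:          # seq = Consensus Log index
        q = zlib.crc32(key.encode()) % num_threads
        queues[q].append((seq, key, value))
    return queues
```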
B. Read Operations (Scalable & Consistent)
1. Leader Reads
- Check: Leader checks Durability Log for pending updates.
- Empty? Read from KV Store (1 RTT).
- Pending? Synchronously order/execute pending updates, then return (2 RTTs, but rare).
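A minimal sketch of the leader's read path (all names here are illustrative; `apply_pending` stands in for the synchronous order-and-execute step):

```python
def leader_read(key, durability_log, kv_store, apply_pending):
    """Serve a read at the leader. If the Durability Log holds a pending
    (durable but not yet executed) write to this key, synchronously order
    and apply it first, so the leader never returns stale data."""
    if any(k == key for k, _ in durability_log):
        apply_pending()                 # rare slow path (2 RTTs total)
    return kv_store.get(key)            # common fast path (1 RTT)
```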
2. Follower Reads (The "Meta-Query" Mechanism)
To allow reading from followers without staleness, Ionia uses a parallel check.
- Action: Client sends Read to Follower AND Meta-Query to Leader simultaneously.
- Follower Response: Returns Data + Follower_Applied_Index.
- Leader Response: Returns Key_Modified_Index (from in-memory history).
Client-Side Consistency Check:
- Logic: if Follower_Applied_Index ≥ Key_Modified_Index, the data is fresh.
- Else: the data is stale; the client retries at the leader.
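The client's check reduces to a single comparison (a sketch; the function and field names are ours):

```python
def resolve_follower_read(data, follower_applied_index, leader_modified_index):
    """Client-side freshness check for a follower read.
    leader_modified_index is Key_Modified_Index from the leader's in-memory
    history (or the LTI, if the key's entry has been trimmed)."""
    if follower_applied_index >= leader_modified_index:
        return ("fresh", data)      # follower already applied the last write
    return ("stale", None)          # retry the read at the leader
```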
3. History Management & Optimization
The leader cannot store history for every key forever.
- Trimming: Leader tracks the applied_index of all "active" followers. History is trimmed up to the point where all active followers have caught up.
- Missing Keys (LTI): If a key has been trimmed from history, the leader returns the LTI (Last-Trimmed Index) instead.
- Optimization: Pending updates in the leader's log trigger immediate synchronous execution at the leader to return fresh data, avoiding a client retry.
Example Scenario:
- Key k1 last modified at index 50.
- Follower has applied up to index 100 (so it has k1's latest version).
- Leader history trimmed to index 80 (LTI); k1 is no longer in history.
- The Check:
  - Leader returns: LTI = 80.
  - Follower returns: Data + Applied_Index = 100.
  - Client check: 100 ≥ 80 → PASS; the data is accepted as fresh.
Part 5: Reliability & Correctness
Failures and View Changes
- Recovery: Replicas restore Consensus and Durability logs from the leader.
- Why Supermajority?
  - A standard majority isn't enough for the Durability Log because writes are uncoordinated: two logs may hold the same writes in different orders (Log A: [a,b], Log B: [b,a]).
  - Supermajority Quorum: f + ⌈f/2⌉ + 1.
  - Ensures that after f failures, at least one remaining replica has the correct order of writes.
- View Change Process: New leader collects logs, builds a dependency DAG from pairwise comparisons, topologically sorts it to finalize order, and enters the new view.
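The order-reconstruction step can be sketched with Python's graphlib (a simplification: we assume, as the supermajority quorum is meant to guarantee, that the surviving logs' pairwise orderings are mutually consistent, i.e. the dependency DAG is acyclic):

```python
from graphlib import TopologicalSorter

def reconstruct_order(logs):
    """View-change sketch: rebuild a total order of durable writes from the
    surviving replicas' logs. Each log contributes edges earlier -> later;
    a topological sort of the resulting DAG yields the finalized order."""
    deps = {}                           # write -> set of writes preceding it
    for log in logs:
        for i, later in enumerate(log):
            deps.setdefault(later, set()).update(log[:i])
    return list(TopologicalSorter(deps).static_order())
```

If two surviving logs disagreed on the order of a pair of writes, the DAG would contain a cycle and the sort would fail; the supermajority quorum exists precisely to rule that case out.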
Correctness Proof Sketch
Property 1: Write Ordering (Linearizability)
- Normal operation: Leader imposes order moving from Durability Log to Consensus Log.
- View Change: Supermajority guarantees the new leader can reconstruct the linearizable order despite failures.
- Execution: Deterministic hashing ensures conflicting writes execute in the same order on all replicas.
Property 2: Read Freshness
- Case 1 (Pending Write): Leader Meta-Query sees pending write in Durability Log → Leader executes and returns fresh data.
- Case 2 (Executed Write): Client compares Follower_Applied_Index vs. Leader_Modified_Index (or the LTI). Because the LTI is always ≥ the actual modified index (conservative), the client never accepts stale data.
Conclusion
Ionia represents a significant advancement in distributed WO-KV systems by achieving high throughput and low latency through innovative techniques: separating durability from execution, enabling parallel non-conflicting writes, and allowing scalable follower reads with client-side consistency checks. The protocol's design elegantly addresses the fundamental tension between write performance and consistency in distributed systems.