Calvin Protocol
Deterministic transaction ordering for distributed consistency
Overview
entropyDB uses the Calvin protocol to achieve deterministic transaction ordering across distributed nodes. This eliminates the need for distributed deadlock detection and enables high-throughput transaction processing.
Key Principles
- Global Sequencer: Centralized component assigns order to all transactions
- Deterministic Execution: All nodes execute transactions in the same order
- No 2PC: Avoids expensive two-phase commit protocol
- Predictable Performance: Consistent latency without blocking
Architecture
Components
1. Global Sequencer
Centralized service that assigns a unique, monotonically increasing sequence number to each transaction. Implemented with Raft consensus for high availability.
Sequencer assigns order:
T1 → seq=1000
T2 → seq=1001
T3 → seq=1002
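The assignment step can be sketched as a single monotonic counter. This is a toy in-process version with hypothetical names; entropyDB's real sequencer replicates the counter state via Raft so a quorum agrees on every assignment:

```python
import itertools
import threading

class Sequencer:
    """Toy in-process sequencer: hands out unique, monotonically
    increasing sequence numbers. The real service is Raft-replicated."""

    def __init__(self, start=1000):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def assign(self, txn_id):
        # In the real service, the increment would be committed to a
        # Raft quorum before the sequence number is returned.
        with self._lock:
            return txn_id, next(self._counter)

seq = Sequencer()
print(seq.assign("T1"))  # ('T1', 1000)
print(seq.assign("T2"))  # ('T2', 1001)
```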
2. Scheduler
Each node runs a local scheduler that executes transactions in sequence number order. Ensures deterministic execution across all replicas.
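The in-order drain can be sketched as follows, assuming sequenced transactions may arrive at a node out of order over the network (class and method names are hypothetical):

```python
import heapq

class Scheduler:
    """Executes transactions strictly in sequence-number order, even
    when the sequencer's broadcasts arrive out of order."""

    def __init__(self, next_seq=1000):
        self._next_seq = next_seq
        self._pending = []   # min-heap of (seq, txn)
        self.executed = []

    def receive(self, seq, txn):
        heapq.heappush(self._pending, (seq, txn))
        # Drain every buffered transaction whose turn has come.
        while self._pending and self._pending[0][0] == self._next_seq:
            _, ready = heapq.heappop(self._pending)
            self.executed.append(ready)   # stand-in for real execution
            self._next_seq += 1

sched = Scheduler()
sched.receive(1001, "T2")   # arrives early: buffered, not executed
sched.receive(1000, "T1")   # unblocks both
print(sched.executed)       # ['T1', 'T2']
```

Because every replica applies exactly this rule, all replicas execute the same transactions in the same order.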
3. Lock Manager
Manages read/write locks locally. Since execution order is predetermined, no distributed deadlock is possible.
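The deadlock-freedom argument can be illustrated with a toy per-key FIFO lock table: since every node enqueues requests in global sequence order, a lower-numbered transaction never waits behind a higher-numbered one, so no wait-for cycle can form (sketch only, names hypothetical):

```python
from collections import defaultdict, deque

class LockManager:
    """Grants locks in transaction sequence order via per-key FIFO
    queues. With a global order, wait-for cycles are impossible."""

    def __init__(self):
        self._queues = defaultdict(deque)  # key -> FIFO of waiting seqs

    def request(self, seq, key):
        q = self._queues[key]
        q.append(seq)
        return q[0] == seq  # granted only if first in line

    def release(self, seq, key):
        q = self._queues[key]
        assert q and q[0] == seq, "only the holder may release"
        q.popleft()
        return q[0] if q else None  # next holder, if any

lm = LockManager()
print(lm.request(1000, "accounts:1"))  # True  (T1 holds the lock)
print(lm.request(1001, "accounts:1"))  # False (T2 queued behind T1)
print(lm.release(1000, "accounts:1"))  # 1001  (lock passes to T2)
```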
4. Storage Engine
LSM-tree based storage with MVCC (Multi-Version Concurrency Control) for efficient reads during write-heavy workloads.
Transaction Flow
1. Client Submission
Client submits transaction with read/write set to any node
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2);                 -- Read set
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- Write set
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
2. Sequencing
Transaction is sent to global sequencer which assigns a sequence number and broadcasts to all nodes
3. Lock Acquisition
Nodes acquire locks in sequence number order. No deadlock possible since order is predetermined
4. Execution
Transaction executes deterministically on all nodes. Same input order guarantees same output
5. Commit & Release
Writes are committed to storage, locks are released, and client receives confirmation
Benefits
No Distributed Deadlock
Predetermined execution order eliminates deadlock across nodes
Predictable Latency
No blocking or abort-retry cycles. Consistent performance
High Throughput
Batch sequencing and parallel execution of non-conflicting transactions
Serializability
Deterministic ordering provides serializable isolation automatically
Simplified Replication
Replicas execute same sequence, guaranteed identical state
No 2PC Overhead
Avoids expensive two-phase commit across shards
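The "identical state" guarantee above follows directly from determinism, which a toy experiment can demonstrate: applying the same sequenced log to two independent replicas yields the same final state (illustrative only; real transactions are richer than key/delta pairs):

```python
def apply(state, txn):
    """Apply a (key, delta) write deterministically to a replica's state."""
    key, delta = txn
    state[key] = state.get(key, 0) + delta
    return state

# The same sequenced log, applied on two independent replicas.
log = [("acct:1", -100), ("acct:2", +100), ("acct:1", +25)]

replica_a, replica_b = {}, {}
for txn in log:
    apply(replica_a, txn)
for txn in log:
    apply(replica_b, txn)

print(replica_a == replica_b)  # True: same input order → same state
```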
Optimizations
Batch Sequencing
The sequencer can assign numbers to batches of transactions simultaneously, improving throughput:
-- Sequencer processes a batch every 10ms
Batch 1: T1-T100   → seq 1000-1099 (100 txns)
Batch 2: T101-T250 → seq 1100-1249 (150 txns)
Throughput: 15,000 txns/sec with 10ms batches
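The range assignment above can be sketched with a hypothetical helper: one Raft commit covers the whole batch, so the per-transaction sequencing cost is amortized:

```python
def assign_batch(next_seq, txns):
    """Assign a contiguous range of sequence numbers to a whole batch.
    One consensus round would cover the entire range."""
    assigned = {txn: next_seq + i for i, txn in enumerate(txns)}
    return assigned, next_seq + len(txns)

batch1 = [f"T{i}" for i in range(1, 101)]    # T1..T100
batch2 = [f"T{i}" for i in range(101, 251)]  # T101..T250

a1, nxt = assign_batch(1000, batch1)
a2, nxt = assign_batch(nxt, batch2)
print(a1["T1"], a1["T100"])    # 1000 1099
print(a2["T101"], a2["T250"])  # 1100 1249
```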
Read-Only Transactions
Read-only transactions bypass the sequencer and execute immediately on any replica:
-- Read-only: executes immediately on any replica
SELECT * FROM users WHERE active = true;

-- Read-write: goes through the sequencer
UPDATE users SET last_login = NOW() WHERE id = 1;
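A toy routing rule illustrating the bypass (the string check here is only for illustration; a real implementation would decide from the transaction's declared write set, and the function name is hypothetical):

```python
def route(txn_sql):
    """Toy router: statements with an empty write set skip the sequencer.
    Here 'read-only' is approximated as 'starts with SELECT'."""
    is_read_only = txn_sql.lstrip().upper().startswith("SELECT")
    return "replica (no sequencing)" if is_read_only else "sequencer"

print(route("SELECT * FROM users WHERE active = true"))
# replica (no sequencing)
print(route("UPDATE users SET last_login = NOW() WHERE id = 1"))
# sequencer
```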
Early Lock Release
Locks can be released as soon as the next non-conflicting transaction in sequence acquires them, improving concurrency.
Speculative Execution
Transactions can speculatively execute before receiving sequence number, then validate when number arrives. Reduces latency by ~50%.
Failure Handling
Sequencer Failure
Sequencer is replicated using Raft. If leader fails:
- Raft elects new leader (typically < 100ms)
- New leader continues from last sequence number
- No transactions lost, brief pause in sequencing
Node Failure
If a storage node fails:
- Replica takes over instantly (already in sync)
- In-flight transactions continue on replica
- Failed node recovers by replaying sequence log
Network Partition
In case of partition:
- Partition with sequencer leader continues
- Other partition blocks writes (CP in CAP)
- Read-only queries continue on isolated nodes
Performance Characteristics
Typical performance:
- Sequencing latency: 1-2ms (single region)
- Total transaction latency: 5-10ms
- Throughput: 10,000-100,000 txns/sec (depending on contention)
- Scales linearly with the number of sequencer replicas

Compared to 2PC:
- 50% lower latency
- 10x higher throughput
- No distributed deadlock
- Predictable tail latencies
Best Practices
1. Declare Read/Write Sets Early
Help the sequencer by declaring up front which records the transaction will access. This enables optimizations such as deterministic lock pre-acquisition.
2. Batch Related Updates
Group related updates into single transactions to reduce sequencing overhead.
3. Use Read Replicas
Route read-only queries to replicas to reduce load on primary nodes.
4. Monitor Sequencer Latency
Set up alerts if sequencing latency exceeds thresholds; rising latency may indicate a sequencer bottleneck.
5. Keep Transactions Short
Long-running transactions hold locks longer, reducing parallelism.