Calvin Protocol
Deterministic transaction ordering for distributed consistency
Overview
entropyDB uses the Calvin protocol to achieve deterministic transaction ordering across distributed nodes. This eliminates the need for distributed deadlock detection and enables high-throughput transaction processing.
Key Principles
- Global Sequencer: Centralized component assigns order to all transactions
- Deterministic Execution: All nodes execute transactions in the same order
- No 2PC: Avoids expensive two-phase commit protocol
- Predictable Performance: Consistent latency without blocking
Architecture
Components
1. Global Sequencer
Centralized service that assigns a unique, monotonically increasing sequence number to each transaction. Implemented with Raft consensus for high availability.
Sequencer assigns order:
T1 → seq=1000
T2 → seq=1001
T3 → seq=1002
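The assignment step can be sketched as a single monotonic counter. This is a toy in-process version with hypothetical names; entropyDB's real sequencer replicates the counter state via Raft so a quorum agrees on every assignment:

```python
import itertools
import threading

class Sequencer:
    """Toy in-process sequencer: hands out unique, monotonically
    increasing sequence numbers. The real service is Raft-replicated."""

    def __init__(self, start=1000):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def assign(self, txn_id):
        # In the real service, the increment would be committed to a
        # Raft quorum before the sequence number is returned.
        with self._lock:
            return txn_id, next(self._counter)

seq = Sequencer()
print(seq.assign("T1"))  # ('T1', 1000)
print(seq.assign("T2"))  # ('T2', 1001)
```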
2. Scheduler
Each node runs a local scheduler that executes transactions in sequence number order. Ensures deterministic execution across all replicas.
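The in-order drain can be sketched as follows, assuming sequenced transactions may arrive at a node out of order over the network (class and method names are hypothetical):

```python
import heapq

class Scheduler:
    """Executes transactions strictly in sequence-number order, even
    when the sequencer's broadcasts arrive out of order."""

    def __init__(self, next_seq=1000):
        self._next_seq = next_seq
        self._pending = []   # min-heap of (seq, txn)
        self.executed = []

    def receive(self, seq, txn):
        heapq.heappush(self._pending, (seq, txn))
        # Drain every buffered transaction whose turn has come.
        while self._pending and self._pending[0][0] == self._next_seq:
            _, ready = heapq.heappop(self._pending)
            self.executed.append(ready)   # stand-in for real execution
            self._next_seq += 1

sched = Scheduler()
sched.receive(1001, "T2")   # arrives early: buffered, not executed
sched.receive(1000, "T1")   # unblocks both
print(sched.executed)       # ['T1', 'T2']
```

Because every replica applies exactly this rule, all replicas execute the same transactions in the same order.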
3. Lock Manager
Manages read/write locks locally. Since execution order is predetermined, no distributed deadlock is possible.
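The deadlock-freedom argument can be illustrated with a toy per-key FIFO lock table: since every node enqueues requests in global sequence order, a lower-numbered transaction never waits behind a higher-numbered one, so no wait-for cycle can form (sketch only, names hypothetical):

```python
from collections import defaultdict, deque

class LockManager:
    """Grants locks in transaction sequence order via per-key FIFO
    queues. With a global order, wait-for cycles are impossible."""

    def __init__(self):
        self._queues = defaultdict(deque)  # key -> FIFO of waiting seqs

    def request(self, seq, key):
        q = self._queues[key]
        q.append(seq)
        return q[0] == seq  # granted only if first in line

    def release(self, seq, key):
        q = self._queues[key]
        assert q and q[0] == seq, "only the holder may release"
        q.popleft()
        return q[0] if q else None  # next holder, if any

lm = LockManager()
print(lm.request(1000, "accounts:1"))  # True  (T1 holds the lock)
print(lm.request(1001, "accounts:1"))  # False (T2 queued behind T1)
print(lm.release(1000, "accounts:1"))  # 1001  (lock passes to T2)
```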
4. Storage Engine
LSM-tree based storage with MVCC (Multi-Version Concurrency Control) for efficient reads during write-heavy workloads.
Transaction Flow
1. Client Submission
Client submits transaction with read/write set to any node
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2);                 -- Read set
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- Write set
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
2. Sequencing
Transaction is sent to global sequencer which assigns a sequence number and broadcasts to all nodes
3. Lock Acquisition
Nodes acquire locks in sequence number order. No deadlock possible since order is predetermined
4. Execution
Transaction executes deterministically on all nodes. Same input order guarantees same output
5. Commit & Release
Writes are committed to storage, locks are released, and client receives confirmation
Benefits
No Distributed Deadlock
Predetermined execution order eliminates deadlock across nodes
Predictable Latency
No blocking or abort-retry cycles. Consistent performance
High Throughput
Batch sequencing and parallel execution of non-conflicting transactions
Serializability
Deterministic ordering provides serializable isolation automatically
Simplified Replication
Replicas execute same sequence, guaranteed identical state
No 2PC Overhead
Avoids expensive two-phase commit across shards
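The "identical state" guarantee above follows directly from determinism, which a toy experiment can demonstrate: applying the same sequenced log to two independent replicas yields the same final state (illustrative only; real transactions are richer than key/delta pairs):

```python
def apply(state, txn):
    """Apply a (key, delta) write deterministically to a replica's state."""
    key, delta = txn
    state[key] = state.get(key, 0) + delta
    return state

# The same sequenced log, applied on two independent replicas.
log = [("acct:1", -100), ("acct:2", +100), ("acct:1", +25)]

replica_a, replica_b = {}, {}
for txn in log:
    apply(replica_a, txn)
for txn in log:
    apply(replica_b, txn)

print(replica_a == replica_b)  # True: same input order → same state
```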
Optimizations
Batch Sequencing
The sequencer can assign numbers to batches of transactions simultaneously, improving throughput:
-- Sequencer processes a batch every 10ms
Batch 1: T1-T100   → seq 1000-1099 (100 txns)
Batch 2: T101-T250 → seq 1100-1249 (150 txns)
Throughput: 15,000 txns/sec with 10ms batches
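The range assignment above can be sketched with a hypothetical helper: one Raft commit covers the whole batch, so the per-transaction sequencing cost is amortized:

```python
def assign_batch(next_seq, txns):
    """Assign a contiguous range of sequence numbers to a whole batch.
    One consensus round would cover the entire range."""
    assigned = {txn: next_seq + i for i, txn in enumerate(txns)}
    return assigned, next_seq + len(txns)

batch1 = [f"T{i}" for i in range(1, 101)]    # T1..T100
batch2 = [f"T{i}" for i in range(101, 251)]  # T101..T250

a1, nxt = assign_batch(1000, batch1)
a2, nxt = assign_batch(nxt, batch2)
print(a1["T1"], a1["T100"])    # 1000 1099
print(a2["T101"], a2["T250"])  # 1100 1249
```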
Read-Only Transactions
Read-only transactions bypass the sequencer and execute immediately on any replica:
-- Read-only: executes immediately on any replica
SELECT * FROM users WHERE active = true;

-- Read-write: goes through the sequencer
UPDATE users SET last_login = NOW() WHERE id = 1;
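A toy routing rule illustrating the bypass (the string check here is only for illustration; a real implementation would decide from the transaction's declared write set, and the function name is hypothetical):

```python
def route(txn_sql):
    """Toy router: statements with an empty write set skip the sequencer.
    Here 'read-only' is approximated as 'starts with SELECT'."""
    is_read_only = txn_sql.lstrip().upper().startswith("SELECT")
    return "replica (no sequencing)" if is_read_only else "sequencer"

print(route("SELECT * FROM users WHERE active = true"))
# replica (no sequencing)
print(route("UPDATE users SET last_login = NOW() WHERE id = 1"))
# sequencer
```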
Early Lock Release
Locks can be released as soon as the next non-conflicting transaction in sequence acquires them, improving concurrency.
Speculative Execution
Transactions can speculatively execute before receiving sequence number, then validate when number arrives. Reduces latency by ~50%.
Failure Handling
Sequencer Failure
Sequencer is replicated using Raft. If leader fails:
- Raft elects new leader (typically < 100ms)
- New leader continues from last sequence number
- No transactions lost, brief pause in sequencing
Node Failure
If a storage node fails:
- Replica takes over instantly (already in sync)
- In-flight transactions continue on replica
- Failed node recovers by replaying sequence log
Network Partition
In case of partition:
- Partition with sequencer leader continues
- Other partition blocks writes (CP in CAP)
- Read-only queries continue on isolated nodes
Performance Characteristics
Typical performance:
- Sequencing latency: 1-2ms (single region)
- Total transaction latency: 5-10ms
- Throughput: 10,000-100,000 txns/sec (depending on contention)
- Scales linearly with the number of sequencer replicas

Compared to 2PC:
- 50% lower latency
- 10x higher throughput
- No distributed deadlock
- Predictable tail latencies
Best Practices
1. Declare Read/Write Sets Early
Help the sequencer by declaring up front which records the transaction will access. This enables optimizations such as deterministic lock pre-acquisition.
2. Batch Related Updates
Group related updates into single transactions to reduce sequencing overhead.
3. Use Read Replicas
Route read-only queries to replicas to reduce load on primary nodes.
4. Monitor Sequencer Latency
Set up alerts if sequencing latency exceeds thresholds; rising latency may indicate a sequencer bottleneck.
5. Keep Transactions Short
Long-running transactions hold locks longer, reducing parallelism.