Core Concepts
Understanding entropyDB's architecture and design principles
Multi-Model Architecture
entropyDB unifies six data models in a single system: relational (SQL), document, key-value, time-series, graph, and vector. Unlike polyglot persistence solutions that combine multiple specialized databases, entropyDB provides native support for all models with shared storage and transactions.
Relational & SQL
PostgreSQL-compatible interface with ACID transactions and complex joins
Document Store
JSON/JSONB support with flexible schemas and rich querying
Key-Value
High-performance get/put operations with sub-millisecond latency
Time-Series
Optimized for temporal data with automatic retention and aggregation
Property Graphs
Native graph storage with efficient traversals and pattern matching
Vector Search
Embedding storage with approximate nearest neighbor (ANN) search for AI applications
Unified Storage Engine
entropyDB uses a log-structured merge-tree (LSM) based storage engine optimized for both read and write workloads. Data is organized into tablets (horizontal shards) that can be distributed across nodes.
Storage Hierarchy:

Cluster
├── Node 1
│   ├── Tablet A (shard 1)
│   └── Tablet B (shard 2)
├── Node 2
│   ├── Tablet C (shard 3)
│   └── Tablet D (shard 4)
└── Node 3 (replicas)
Key Features
- Automatic compaction and space reclamation
- Bloom filters for efficient key lookups
- Write-ahead logging for durability
- Block cache for hot data
Distributed Transactions
entropyDB implements the Calvin protocol for deterministic transaction ordering, providing serializable isolation across distributed nodes without the overhead of traditional two-phase commit (2PC).
Global Sequencer
A dedicated sequencer assigns global timestamps to transactions, ensuring deterministic ordering across all nodes. This eliminates distributed deadlocks and reduces coordination overhead.
ACID Guarantees
- Atomicity: All operations in a transaction succeed or fail together
- Consistency: Transactions maintain database invariants
- Isolation: Serializable isolation by default
- Durability: Committed transactions survive failures
Cross-Model Transactions
BEGIN;
-- Update relational table
UPDATE accounts SET balance = balance - 100
WHERE id = 1;
-- Insert into time-series
INSERT INTO transactions (ts, amount, type)
VALUES (NOW(), 100, 'withdrawal');
-- Update document
UPDATE user_profiles
SET metadata = jsonb_set(metadata, '{last_withdrawal}', to_jsonb(NOW()))
WHERE user_id = 1;
COMMIT;

Replication & High Availability
entropyDB uses Raft consensus for leader election and log replication. Each tablet is replicated across multiple nodes (typically 3 or 5) with automatic failover.
Replication Modes
Synchronous Replication
Writes are acknowledged only after replication to a quorum. Provides strongest consistency with slightly higher latency.
Asynchronous Replication
Writes are acknowledged immediately, before replication completes. Lower latency, but recently acknowledged writes may be lost on failure.
-- Configure replication factor
ALTER TABLE users SET replication_factor = 3;

-- Set consistency level per query
SELECT * FROM users WITH (consistency_level = 'strong');
Horizontal Sharding
Data is automatically partitioned across tablets using consistent hashing. entropyDB supports both hash-based and range-based sharding strategies.
-- Hash sharding (default)
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name TEXT,
  email TEXT
) DISTRIBUTED BY HASH(id);

-- Range sharding
CREATE TABLE events (
  timestamp TIMESTAMP,
  user_id INT,
  event_type TEXT
) DISTRIBUTED BY RANGE(timestamp);

-- Co-location for joins
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT,
  amount DECIMAL
) COLOCATED WITH users;
Tip
Co-locate tables that are frequently joined to avoid distributed queries and improve performance.
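With the users and orders tables from the sharding example co-located on their shared distribution key, a join like the following can execute locally on each tablet rather than shuffling rows between nodes:

```sql
-- users and orders share a distribution key, so each tablet
-- can join its local rows without cross-node data movement.
SELECT u.name, SUM(o.amount) AS total_spent
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.name;
```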
Query Processing
entropyDB uses a cost-based query optimizer with support for parallel execution. Queries are compiled to an efficient execution plan that can span multiple nodes.
Query Pushdown
Filters and aggregations are pushed down to storage nodes, minimizing data transfer.
Parallel Execution
Queries automatically parallelize across CPU cores and nodes for maximum throughput.
Index Selection
The optimizer automatically chooses the most efficient indexes for each query.
Join Optimization
Hash, merge, and nested-loop join strategies are selected based on data distribution.
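Assuming entropyDB exposes a PostgreSQL-style EXPLAIN (an assumption, not confirmed by this page), the plan for a query can be inspected to verify pushdown and join selection:

```sql
-- Hypothetical: inspect the distributed plan for a filtered join.
-- Expect the WHERE filter to be pushed down to storage nodes and
-- a join strategy chosen from the optimizer's cost estimates.
EXPLAIN
SELECT u.name, o.amount
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.amount > 100;
```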