Core Concepts
Understanding entropyDB's architecture and design principles
Multi-Model Architecture
entropyDB unifies six data models in a single system: relational (SQL), document, key-value, time-series, graph, and vector. Unlike polyglot persistence solutions that combine multiple specialized databases, entropyDB provides native support for all models with shared storage and transactions.
Relational & SQL
PostgreSQL-compatible interface with ACID transactions and complex joins
Document Store
JSON/JSONB support with flexible schemas and rich querying
Key-Value
High-performance get/put operations with sub-millisecond latency
Time-Series
Optimized for temporal data with automatic retention and aggregation
Property Graphs
Native graph storage with efficient traversals and pattern matching
Vector Search
Embedding storage with approximate nearest neighbor (ANN) search for AI applications
Unified Storage Engine
entropyDB uses a log-structured merge-tree (LSM) based storage engine optimized for both read and write workloads. Data is organized into tablets (horizontal shards) that can be distributed across nodes.
Storage Hierarchy:

Cluster
├── Node 1
│   ├── Tablet A (shard 1)
│   └── Tablet B (shard 2)
├── Node 2
│   ├── Tablet C (shard 3)
│   └── Tablet D (shard 4)
└── Node 3 (replicas)
Key Features
- Automatic compaction and space reclamation
- Bloom filters for efficient key lookups
- Write-ahead logging for durability
- Block cache for hot data
Distributed Transactions
entropyDB implements the Calvin protocol for deterministic transaction ordering, providing serializable isolation across distributed nodes without the overhead of traditional two-phase commit (2PC).
Global Sequencer
A dedicated sequencer assigns global timestamps to transactions, ensuring deterministic ordering across all nodes. This eliminates distributed deadlocks and reduces coordination overhead.
ACID Guarantees
- Atomicity: All operations in a transaction succeed or fail together
- Consistency: Transactions maintain database invariants
- Isolation: Serializable isolation by default
- Durability: Committed transactions survive failures
Cross-Model Transactions
BEGIN;
-- Update relational table
UPDATE accounts SET balance = balance - 100
WHERE id = 1;
-- Insert into time-series
INSERT INTO transactions (ts, amount, type)
VALUES (NOW(), 100, 'withdrawal');
-- Update document
UPDATE user_profiles
SET metadata = jsonb_set(metadata, '{last_withdrawal}', to_jsonb(NOW()))
WHERE user_id = 1;
COMMIT;

Replication & High Availability
entropyDB uses Raft consensus for leader election and log replication. Each tablet is replicated across multiple nodes (typically 3 or 5) with automatic failover.
Replication Modes
Synchronous Replication
Writes are acknowledged only after replication to a quorum. Provides strongest consistency with slightly higher latency.
Asynchronous Replication
Writes are acknowledged immediately, before replication completes. Lower latency, but recently acknowledged writes may be lost on failure.
-- Configure replication factor
ALTER TABLE users SET replication_factor = 3;

-- Set consistency level per query
SELECT * FROM users WITH (consistency_level = 'strong');
Horizontal Sharding
Data is automatically partitioned across tablets using consistent hashing. entropyDB supports both hash-based and range-based sharding strategies.
-- Hash sharding (default)
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name TEXT,
  email TEXT
) DISTRIBUTED BY HASH(id);

-- Range sharding
CREATE TABLE events (
  timestamp TIMESTAMP,
  user_id INT,
  event_type TEXT
) DISTRIBUTED BY RANGE(timestamp);

-- Co-location for joins
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT,
  amount DECIMAL
) COLOCATED WITH users;
Tip
Co-locate tables that are frequently joined to avoid distributed queries and improve performance.
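With the users and orders tables from the sharding example co-located on their shared distribution key, a join like the following can execute locally on each tablet rather than shuffling rows between nodes:

```sql
-- users and orders share a distribution key, so each tablet
-- can join its local rows without cross-node data movement.
SELECT u.name, SUM(o.amount) AS total_spent
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.name;
```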
Query Processing
entropyDB uses a cost-based query optimizer with support for parallel execution. Queries are compiled to an efficient execution plan that can span multiple nodes.
Query Pushdown
Filters and aggregations are pushed down to storage nodes, minimizing data transfer.
Parallel Execution
Queries automatically parallelize across CPU cores and nodes for maximum throughput.
Index Selection
The optimizer automatically chooses the most efficient indexes for each query.
Join Optimization
Hash, merge, and nested-loop join strategies are selected based on data distribution.
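Assuming entropyDB exposes a PostgreSQL-style EXPLAIN (an assumption, not confirmed by this page), the plan for a query can be inspected to verify pushdown and join selection:

```sql
-- Hypothetical: inspect the distributed plan for a filtered join.
-- Expect the WHERE filter to be pushed down to storage nodes and
-- a join strategy chosen from the optimizer's cost estimates.
EXPLAIN
SELECT u.name, o.amount
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.amount > 100;
```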