Time-Series Data

High-performance ingestion, aggregation, and retention for temporal data

Overview

entropyDB provides native support for time-series data with automatic partitioning, compression, and retention policies. Ideal for metrics, logs, IoT sensor data, and analytics.

📈 High Throughput

Millions of writes per second with automatic batching

🗜️ Compression

Automatic time-series compression reduces storage by 90%+

⏰ Retention Policies

Automatic data expiration and archival

📊 Continuous Aggregates

Pre-computed rollups for fast queries

Creating Time-Series Tables

-- Create time-series table
CREATE TABLE metrics (
  time TIMESTAMPTZ NOT NULL,
  metric_name TEXT NOT NULL,
  value DOUBLE PRECISION,
  tags JSONB
);

-- Convert to time-series (hypertable)
SELECT create_hypertable('metrics', 'time');

-- Add index on commonly queried columns
CREATE INDEX idx_metrics_name_time ON metrics (metric_name, time DESC);
CREATE INDEX idx_metrics_tags ON metrics USING GIN (tags);

💡 Hypertables

Hypertables automatically partition data by time, providing optimal performance for time-series workloads.
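Assuming entropyDB follows the TimescaleDB-style API used throughout this page, the chunk (time partition) size can be tuned; the interval below is illustrative:

-- Partition into 1-day chunks instead of the default interval
SELECT create_hypertable('metrics', 'time',
  chunk_time_interval => INTERVAL '1 day');

-- Or adjust an existing hypertable (applies to newly created chunks only)
SELECT set_chunk_time_interval('metrics', INTERVAL '1 day');

A good rule of thumb is to size chunks so that the most recent chunk's data and indexes fit in memory.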

Data Ingestion

Single Insert

INSERT INTO metrics (time, metric_name, value, tags)
VALUES (
  NOW(),
  'cpu.usage',
  75.5,
  '{"host": "server1", "region": "us-west"}'::jsonb
);

Batch Insert

INSERT INTO metrics (time, metric_name, value, tags) VALUES
  (NOW(), 'cpu.usage', 75.5, '{"host": "server1"}'::jsonb),
  (NOW(), 'memory.usage', 8192, '{"host": "server1"}'::jsonb),
  (NOW(), 'disk.io', 1500, '{"host": "server1"}'::jsonb),
  (NOW(), 'cpu.usage', 82.3, '{"host": "server2"}'::jsonb);

Using COPY for Bulk Load

-- From CSV file
COPY metrics (time, metric_name, value, tags)
FROM '/path/to/metrics.csv'
WITH (FORMAT csv, HEADER true);

-- Can achieve 1M+ rows/second

Querying Time-Series Data

Recent Data

-- Last 1 hour of data
SELECT time, metric_name, value
FROM metrics
WHERE time > NOW() - INTERVAL '1 hour'
  AND metric_name = 'cpu.usage'
ORDER BY time DESC;

Aggregations

-- 5-minute averages
SELECT 
  time_bucket('5 minutes', time) AS bucket,
  metric_name,
  AVG(value) as avg_value,
  MAX(value) as max_value,
  MIN(value) as min_value
FROM metrics
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY bucket, metric_name
ORDER BY bucket DESC;

Downsampling

-- Hourly rollup with percentiles
SELECT 
  time_bucket('1 hour', time) AS hour,
  tags->>'host' as host,
  AVG(value) as avg_cpu,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY value) as p95_cpu,
  percentile_cont(0.99) WITHIN GROUP (ORDER BY value) as p99_cpu
FROM metrics
WHERE metric_name = 'cpu.usage'
  AND time > NOW() - INTERVAL '7 days'
GROUP BY hour, host
ORDER BY hour DESC;

Gap Filling

-- Fill missing time intervals, carrying the last observed value forward
SELECT 
  time_bucket_gapfill('1 minute', time) AS minute,
  locf(AVG(value)) as filled_value
FROM metrics
WHERE metric_name = 'temperature'
  AND time > NOW() - INTERVAL '1 hour'
  AND time < NOW()  -- gapfill requires a bounded time range
GROUP BY minute
ORDER BY minute;
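Note that locf() repeats the last observed value rather than interpolating. When linear interpolation between known points is preferable, the TimescaleDB-style interpolate() function (assumed available here) can be used in its place:

-- Linearly interpolate between known values instead of repeating them
SELECT 
  time_bucket_gapfill('1 minute', time) AS minute,
  interpolate(AVG(value)) as interpolated_value
FROM metrics
WHERE metric_name = 'temperature'
  AND time > NOW() - INTERVAL '1 hour'
  AND time < NOW()
GROUP BY minute
ORDER BY minute;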

Continuous Aggregates

Pre-compute aggregations for faster queries:

-- Create continuous aggregate
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT 
  time_bucket('1 hour', time) AS hour,
  metric_name,
  tags->>'host' as host,
  AVG(value) as avg_value,
  MAX(value) as max_value,
  MIN(value) as min_value,
  COUNT(*) as sample_count
FROM metrics
GROUP BY hour, metric_name, host;

-- Add refresh policy (auto-update)
SELECT add_continuous_aggregate_policy('metrics_hourly',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour'
);

-- Query the aggregate (much faster!)
SELECT * FROM metrics_hourly
WHERE hour > NOW() - INTERVAL '30 days';

Retention Policies

Automatically drop old data:

-- Drop data older than 90 days
SELECT add_retention_policy('metrics', INTERVAL '90 days');

-- Tiered retention: short retention on raw data, longer on aggregates.
-- Note: a hypertable can have only one retention policy; remove the
-- existing one first if you need to change it:
-- SELECT remove_retention_policy('metrics');

-- Keep raw data for 30 days
SELECT add_retention_policy('metrics', INTERVAL '30 days');

-- Keep hourly aggregates for 1 year
SELECT add_retention_policy('metrics_hourly', INTERVAL '1 year');

-- Keep daily aggregates forever (simply add no retention policy)

-- View retention policies
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';
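The daily rollup referenced above is not defined elsewhere on this page; one way to create it, mirroring the hourly aggregate (the metrics_daily name is illustrative):

-- Daily rollup, kept forever simply by not adding a retention policy
CREATE MATERIALIZED VIEW metrics_daily
WITH (timescaledb.continuous) AS
SELECT 
  time_bucket('1 day', time) AS day,
  metric_name,
  AVG(value) as avg_value,
  MAX(value) as max_value,
  MIN(value) as min_value,
  COUNT(*) as sample_count
FROM metrics
GROUP BY day, metric_name;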

Compression

Enable automatic compression for older data:

-- Enable compression
ALTER TABLE metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'metric_name, tags',
  timescaledb.compress_orderby = 'time DESC'
);

-- Add compression policy
SELECT add_compression_policy('metrics', INTERVAL '7 days');

-- Manually compress chunks older than 7 days
SELECT compress_chunk(c)
FROM show_chunks('metrics', older_than => INTERVAL '7 days') AS c;

-- View compression stats
SELECT 
  pg_size_pretty(before_compression_total_bytes) as uncompressed,
  pg_size_pretty(after_compression_total_bytes) as compressed,
  round((1 - after_compression_total_bytes::numeric / 
         before_compression_total_bytes::numeric) * 100, 2) as savings_pct
FROM hypertable_compression_stats('metrics');

✓ Compression Benefits

Typical compression ratios: 90-95% space savings for time-series data with minimal performance impact.

Real-Time Analytics

-- Real-time dashboard query
WITH recent_metrics AS (
  SELECT 
    time_bucket('1 minute', time) AS minute,
    tags->>'host' as host,
    metric_name,
    AVG(value) as value
  FROM metrics
  WHERE time > NOW() - INTERVAL '15 minutes'
  GROUP BY minute, host, metric_name
)
SELECT 
  host,
  MAX(CASE WHEN metric_name = 'cpu.usage' THEN value END) as cpu,
  MAX(CASE WHEN metric_name = 'memory.usage' THEN value END) as memory,
  MAX(CASE WHEN metric_name = 'disk.io' THEN value END) as disk_io
FROM recent_metrics
WHERE minute = (SELECT MAX(minute) FROM recent_metrics)
GROUP BY host;

Best Practices

1. Choose the Right Time Column

Use TIMESTAMPTZ for timezone-aware timestamps. Ensure time values are indexed.

2. Batch Your Writes

Insert multiple rows at once for better performance. Use COPY for bulk loads.
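Besides multi-row VALUES lists, a batch can be sent as parallel arrays and unnested server-side, which keeps the statement compact when a client driver binds the arrays as parameters; a sketch with inline literals:

-- Insert a batch by unnesting parallel arrays (one array per column)
INSERT INTO metrics (time, metric_name, value)
SELECT * FROM unnest(
  ARRAY['2024-01-01 00:00:00+00', '2024-01-01 00:00:10+00']::timestamptz[],
  ARRAY['cpu.usage', 'cpu.usage'],
  ARRAY[75.5, 82.3]
);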

3. Use Continuous Aggregates

Pre-compute common aggregations to speed up queries on historical data.

4. Enable Compression

Compress data older than your active query window to save storage costs.

5. Set Retention Policies

Automatically drop old data you no longer need, keeping storage costs and query performance in check.

Next Steps