Architecture

A transparent proxy built on a high-performance async architecture, designed to add ~100μs of overhead while providing production-grade observability and resilience.

High-Level Overview

ScryData sits between your application and database, transparently intercepting and forwarding queries while extracting metadata for observability:

┌──────────┐         ┌────────────────────────────┐         ┌──────────┐
│          │         │       scry-proxy           │         │          │
│  Client  │────────▶│  - Circuit Breaker         │────────▶│ Postgres │
│  (App)   │◀────────│  - Connection Pool         │◀────────│ Database │
│          │         │  - Event Publisher         │         │          │
└──────────┘         └────────────────────────────┘         └──────────┘
                                  │                                │
                                  │ Query Events                   │ Logical Replication
                                  │ (async)                        │
                                  ▼                                ▼
                     ┌─────────────────────────────────────────────────────┐
                     │                  scry-platform                       │
                     │  - Query Analysis                                    │
                     │  - Shadow Database                                   │
                     │  - Migration Validation                              │
                     └─────────────────────────────────────────────────────┘
                                  ▲
                                  │ Schema + Data
                                  │ (rate-limited)
                     ┌─────────────────────────┐
                     │     scry-backfill       │
                     │  - CDC Streaming        │
                     │  - Snapshot Mode        │
                     └─────────────────────────┘
                                  ▲
                                  │ WAL / Snapshot
                     ┌─────────────────────────┐
                     │   Postgres Database     │
                     └─────────────────────────┘
                            

Key Design Principles

  • Transparency: Drop-in replacement for direct database connection
  • Low Overhead: ~100μs target latency addition through async operations and lock-free data structures
  • Best-Effort Observability: Events published asynchronously, never block queries
  • Resilience: Circuit breaker, retries, and health checks protect your database

Two Data Paths

ScryData uses two complementary data paths to capture both query patterns and actual data for migration validation:

In-Band Path: Query Capture

The in-band path captures live query traffic as it flows through your application:

Application → scry-proxy → PostgreSQL

  • scry-proxy sits between your application and database
  • Transparently forwards all queries with minimal latency (~100μs)
  • Extracts query metadata (SQL, timing, parameters) asynchronously
  • Publishes query events to scry-platform for analysis
  • Enables real-time query replay against shadow databases

Out-of-Band Path: Data Replication

The out-of-band path replicates your database schema and data independently:

PostgreSQL → scry-backfill → scry-platform

  • scry-backfill connects directly to PostgreSQL's replication stream
  • Uses logical replication (CDC) for continuous, low-impact data sync
  • Supports snapshot mode for initial data loading
  • Rate-limited to prevent overwhelming production databases
  • Keeps shadow database synchronized for accurate migration testing

Why Two Paths?

This dual-path architecture provides complete migration validation:

  • Query patterns from scry-proxy show how your application uses the database
  • Actual data from scry-backfill ensures shadow databases have realistic content
  • Independent operation means either path can run without the other
  • Zero application changes required for either path

Request Flow

Here's how a query flows through ScryData from client to database:

1. Client Sends Query

SQL query arrives via PostgreSQL wire protocol. ScryData accepts the connection transparently.

2. Circuit Breaker Check

Lock-free atomic check (~10-50ns). If circuit is open, request fails fast without hitting the database.
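
A minimal sketch of this fast-fail check, with an illustrative state encoding (not ScryData's actual layout):

    use std::sync::atomic::{AtomicU8, Ordering};

    const OPEN: u8 = 1; // illustrative encoding: 0 = Closed, 1 = Open, 2 = HalfOpen

    // Hot-path admission check: a single atomic load, no locks taken.
    fn allow_request(state: &AtomicU8) -> bool {
        state.load(Ordering::Acquire) != OPEN
    }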

3. Connection Pool Acquisition

Get a healthy connection from the pool. May create new connection if needed, including health check and state reset.
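
Pool acquisition is the only step in the budget measured in hundreds of microseconds, so it is worth bounding and timing separately. A hypothetical helper that wraps any pool's acquire future (e.g. deadpool's pool.get()) with a deadline and reports the elapsed time for the query timeline:

    use std::future::Future;
    use std::time::{Duration, Instant};

    // Bounds how long we wait for a pooled connection and returns how long
    // acquisition took, so the phase can be reported separately.
    async fn timed_acquire<C, F>(acquire: F, deadline: Duration) -> Result<(C, Duration), &'static str>
    where
        F: Future<Output = Result<C, &'static str>>,
    {
        let start = Instant::now();
        match tokio::time::timeout(deadline, acquire).await {
            Ok(Ok(conn)) => Ok((conn, start.elapsed())),
            Ok(Err(e)) => Err(e),
            Err(_) => Err("pool acquisition timed out"),
        }
    }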

4. Backend Execution

Query forwarded to PostgreSQL. Response streamed back to client.
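
At its core this is byte-level forwarding between two sockets. A simplified sketch using tokio::io::copy_bidirectional; a real handler also inspects protocol messages for metadata, which is omitted here:

    use tokio::net::TcpStream;

    // Copies bytes in both directions until either side closes the connection.
    // Returns (bytes client -> backend, bytes backend -> client).
    async fn forward(mut client: TcpStream, mut backend: TcpStream) -> std::io::Result<(u64, u64)> {
        tokio::io::copy_bidirectional(&mut client, &mut backend).await
    }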

5. Event Publishing (Async)

Query metadata sent to event batcher via lock-free channel. Never blocks the response.
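
A sketch of that fire-and-forget path, assuming a bounded Tokio mpsc channel feeding the batcher (the event type is a stand-in for the real schema):

    use tokio::sync::mpsc;

    struct QueryEvent {
        sql: String,
        duration_us: u64,
    }

    // try_send never awaits: if the channel is full, the event is dropped
    // instead of blocking the client's response (best-effort observability).
    fn publish(tx: &mpsc::Sender<QueryEvent>, event: QueryEvent) {
        if tx.try_send(event).is_err() {
            // Dropped under backpressure or after shutdown; a real implementation
            // would increment a "dropped events" counter here.
        }
    }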

6. Metrics Recording

Latency, success/failure, and other metrics recorded atomically (<300ns overhead).
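
An illustrative slice of what those atomic updates look like; the struct and field names are assumptions:

    use std::sync::atomic::{AtomicU64, Ordering};

    #[derive(Default)]
    struct ProxyMetrics {
        queries_total: AtomicU64,
        query_errors: AtomicU64,
    }

    impl ProxyMetrics {
        // One relaxed fetch_add per counter: no locks, sub-microsecond cost.
        fn record_query(&self, success: bool) {
            self.queries_total.fetch_add(1, Ordering::Relaxed);
            if !success {
                self.query_errors.fetch_add(1, Ordering::Relaxed);
            }
        }
    }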

Query Timeline Phases

Every query goes through these measured phases, exposed via /debug/timeline and Prometheus metrics:

Phase             | Description                                                    | Typical Duration
------------------|----------------------------------------------------------------|------------------
Queue Time        | Waiting before pool acquisition starts                         | <1ms
Pool Acquire      | Getting a connection (may include health check + state reset)  | 100-500μs
Backend Execution | Actual query execution on database                             | Variable
Event Publishing  | Async event dispatch (not counted in query latency)            | <100ns

Core Components

1. Proxy Server

The main entry point that:

  • Listens for incoming TCP connections on the proxy port
  • Spawns a connection handler for each client
  • Manages graceful shutdown with connection draining
  • Tracks active connections
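
A minimal sketch of that accept-and-spawn loop, with handle_client standing in for the real per-connection handler (shutdown draining and connection tracking are omitted):

    use tokio::net::{TcpListener, TcpStream};

    async fn run(listen_addr: &str) -> std::io::Result<()> {
        let listener = TcpListener::bind(listen_addr).await?;
        loop {
            let (client, peer) = listener.accept().await?;
            // One lightweight Tokio task per client connection.
            tokio::spawn(async move {
                if let Err(e) = handle_client(client).await {
                    eprintln!("connection from {peer} ended with error: {e}");
                }
            });
        }
    }

    // Placeholder: the real handler speaks the PostgreSQL wire protocol,
    // applies the circuit breaker, and forwards traffic via the pool.
    async fn handle_client(_client: TcpStream) -> std::io::Result<()> {
        Ok(())
    }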

2. Circuit Breaker

Lock-free, three-state circuit breaker protecting the backend. Uses AtomicU8 for state and AtomicU32 for counters—no locks, predictable latency.
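
A sketch of the failure-counting and state-transition logic, under assumed names and thresholds (the real breaker also moves Open -> HalfOpen after a cool-down timer, omitted here):

    use std::sync::atomic::{AtomicU32, AtomicU8, Ordering};

    const CLOSED: u8 = 0;
    const OPEN: u8 = 1;
    const HALF_OPEN: u8 = 2;
    const FAILURE_THRESHOLD: u32 = 5;

    struct CircuitBreaker {
        state: AtomicU8,
        consecutive_failures: AtomicU32,
    }

    impl CircuitBreaker {
        fn record_failure(&self) {
            let failures = self.consecutive_failures.fetch_add(1, Ordering::AcqRel) + 1;
            if failures >= FAILURE_THRESHOLD {
                // Trip Closed -> Open; compare_exchange keeps the transition
                // race-free without taking a lock.
                let _ = self.state.compare_exchange(CLOSED, OPEN, Ordering::AcqRel, Ordering::Acquire);
            }
        }

        fn record_success(&self) {
            self.consecutive_failures.store(0, Ordering::Release);
            // A successful probe closes the circuit again.
            let _ = self.state.compare_exchange(HALF_OPEN, CLOSED, Ordering::AcqRel, Ordering::Acquire);
        }
    }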

Learn more about the Circuit Breaker →

3. Connection Pool

Protocol-agnostic TCP connection pooling with deadpool integration. Includes health checks on every recycle and automatic state reset (DISCARD ALL).
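
The state reset amounts to sending DISCARD ALL over the pooled backend connection. A simplified sketch of just that step at the wire-protocol level, assuming a raw TcpStream to the backend (response handling and the pool's recycle hook itself are omitted):

    use tokio::io::AsyncWriteExt;
    use tokio::net::TcpStream;

    // Builds a PostgreSQL simple-query message: 'Q' tag, big-endian Int32 length
    // (which counts itself but not the tag), then the SQL string with a NUL terminator.
    async fn reset_session(backend: &mut TcpStream) -> std::io::Result<()> {
        let sql = b"DISCARD ALL\0";
        let len = (4 + sql.len()) as i32;
        let mut msg = Vec::with_capacity(1 + 4 + sql.len());
        msg.push(b'Q');
        msg.extend_from_slice(&len.to_be_bytes());
        msg.extend_from_slice(sql);
        backend.write_all(&msg).await
    }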

Learn more about Connection Pooling →

4. Event Publisher

Trait-based abstraction for publishing query events. Supports debug logging and HTTP publishing, using FlexBuffers serialization for a ~50% size reduction compared to JSON.
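
A sketch of what such a trait-based abstraction could look like, with a debug-logging implementation; the trait shape and event fields are illustrative, not ScryData's actual interface:

    struct QueryEvent {
        sql: String,
        duration_us: u64,
        success: bool,
    }

    // Publishing is fire-and-forget so it can never block a query response.
    trait EventPublisher: Send + Sync {
        fn publish(&self, event: QueryEvent);
    }

    // Debug implementation: log events to stdout. An HTTP implementation would
    // batch events and serialize them (e.g. with FlexBuffers) before sending.
    struct DebugPublisher;

    impl EventPublisher for DebugPublisher {
        fn publish(&self, event: QueryEvent) {
            println!("query ok={} {}us: {}", event.success, event.duration_us, event.sql);
        }
    }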

Learn more about Observability →

5. Health Monitor

Predictive health monitoring using exponential moving average (EMA) baselines. Tracks error rate, latency (P99), and pool utilization. Warns when metrics deviate from baseline.
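
A compact sketch of an EMA baseline with a deviation check; the smoothing factor and tolerance here are illustrative:

    // Exponentially weighted baseline: next = alpha * sample + (1 - alpha) * previous.
    struct EmaBaseline {
        alpha: f64,          // e.g. 0.1: each new sample moves the baseline slowly
        value: Option<f64>,  // None until the first sample seeds the baseline
    }

    impl EmaBaseline {
        fn update(&mut self, sample: f64) -> f64 {
            let next = match self.value {
                Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
                None => sample,
            };
            self.value = Some(next);
            next
        }

        // True if the sample exceeds the baseline by more than `tolerance`
        // (e.g. 0.5 means "50% above baseline").
        fn deviates(&self, sample: f64, tolerance: f64) -> bool {
            matches!(self.value, Some(base) if base > 0.0 && sample > base * (1.0 + tolerance))
        }
    }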

Learn more about Health Checks →

6. Metrics System

Central metrics singleton tracking all proxy operations. HDR histograms for accurate percentiles, atomic counters, and hot data tracking with Count-Min Sketch + Top-K heap.
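
For percentiles, the hdrhistogram crate (an assumption here; the text only specifies HDR histograms) keeps accurate tail values in bounded memory. A small usage sketch, leaving out whatever concurrency wrapper sits around it:

    use hdrhistogram::Histogram;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Values in microseconds, 3 significant digits of precision.
        let mut latency_us = Histogram::<u64>::new(3)?;
        for sample in [120_u64, 180, 95, 4_000, 150] {
            latency_us.record(sample)?;
        }
        println!("p99 = {} us", latency_us.value_at_quantile(0.99));
        Ok(())
    }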

Why Async?

Async architecture allows ScryData to handle thousands of concurrent connections with minimal memory:

Model                 | 1,000 Connections | Memory Usage
----------------------|-------------------|----------------------
Thread-per-connection | 1,000 threads     | 8GB+ (8MB stack each)
Async (Tokio)         | 1,000 tasks       | ~8MB total

ScryData is built entirely on Tokio, the industry-standard async runtime for Rust that powers production systems at Discord, AWS, and Microsoft.

Why Lock-Free?

Locks can cause unpredictable latency spikes. Lock-free atomics ensure:

  • Consistent latency: No lock contention delays
  • Composability: Safe to call from any async context
  • Simplicity: No deadlock concerns

Critical path operations use lock-free atomics:

  • Circuit breaker state transitions: AtomicU8::compare_exchange
  • Metrics counters: AtomicU64::fetch_add
  • Event batching: tokio::mpsc::Sender::try_send (lock-free channel)

Protocol Handling

ScryData uses the PostgreSQL wire protocol for communication. Key message types extracted:

Message Type    | Tag | Purpose
----------------|-----|--------------------------------------
Query           | 'Q' | Simple query protocol
Parse           | 'P' | Extended query (prepared statements)
CommandComplete | 'C' | Query completion with row count
ErrorResponse   | 'E' | Query errors with SQLSTATE
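
A sketch of reading that framing from the backend stream: after startup, every message is a 1-byte tag followed by a big-endian Int32 length that counts itself but not the tag:

    use tokio::io::{AsyncRead, AsyncReadExt};

    async fn read_message<R: AsyncRead + Unpin>(r: &mut R) -> std::io::Result<(u8, Vec<u8>)> {
        let tag = r.read_u8().await?;   // e.g. b'C' for CommandComplete
        let len = r.read_i32().await?;  // big-endian, includes the 4 length bytes
        let body_len = usize::try_from(len)
            .ok()
            .and_then(|n| n.checked_sub(4))
            .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::InvalidData, "bad message length"))?;
        let mut body = vec![0u8; body_len];
        r.read_exact(&mut body).await?;
        Ok((tag, body))
    }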

The Protocol trait abstraction allows ScryData to support multiple databases in the future (MySQL, MongoDB) via feature flags.

Performance Characteristics

Latency Budget

Target: ~100μs additional latency per query

  • Circuit breaker check: 10-50ns
  • Pool acquisition: 100-500μs
  • Event batching: <100ns
  • Metrics recording: <300ns

Typical total ScryData overhead: ~500μs (0.5ms), dominated by pool acquisition.

Memory Footprint

  • Base: ~10MB (Tokio runtime, binary code)
  • Connection pool: ~50KB per connection
  • Metrics: ~150KB (histograms, hot data tracker)
  • Event batcher: ~100KB per 1000 queued events

Total for 100 connections: ~20MB

Throughput

  • Tested: 10,000+ queries/sec on commodity hardware
  • Bottleneck: Usually backend database, not ScryData
  • Scaling: Linear scaling with CPU cores (Tokio work-stealing)

Ready to See It in Action?

Get early access to ScryData and start validating your database migrations with production traffic.

Request Early Access