Architecture

A transparent proxy built on a high-performance async architecture, designed to add ~100μs of overhead while providing production-grade observability and resilience.

High-Level Overview

ScryData sits between your application and database, transparently intercepting and forwarding queries while extracting metadata for observability:

┌──────────┐         ┌────────────────────────────┐         ┌──────────┐
│          │         │       scry-proxy           │         │          │
│  Client  │────────▶│  - Circuit Breaker         │────────▶│ Postgres │
│  (App)   │◀────────│  - Connection Pool         │◀────────│ Database │
│          │         │  - Event Publisher         │         │          │
└──────────┘         └────────────────────────────┘         └──────────┘
                                  │                                │
                                  │ Query Events                   │ Logical Replication
                                  │ (async)                        │
                                  ▼                                ▼
                     ┌─────────────────────────────────────────────────────┐
                     │                  scry-platform                       │
                     │  - Query Analysis                                    │
                     │  - Shadow Database                                   │
                     │  - Migration Validation                              │
                     └─────────────────────────────────────────────────────┘
                                  ▲
                                  │ Schema + Data
                                  │ (rate-limited)
                     ┌─────────────────────────┐
                     │     scry-backfill       │
                     │  - CDC Streaming        │
                     │  - Snapshot Mode        │
                     └─────────────────────────┘
                                  ▲
                                  │ WAL / Snapshot
                     ┌─────────────────────────┐
                     │   Postgres Database     │
                     └─────────────────────────┘
                            

Key Design Principles

  • Transparency: Drop-in replacement for direct database connection
  • Low Overhead: ~100μs target latency addition through async operations and lock-free data structures
  • Best-Effort Observability: Events published asynchronously, never block queries
  • Resilience: Circuit breaker, retries, and health checks protect your database

Two Data Paths

ScryData uses two complementary data paths to capture both query patterns and actual data for migration validation:

In-Band Path: Query Capture

The in-band path captures live query traffic as it flows through your application:

Application → scry-proxy → PostgreSQL

  • scry-proxy sits between your application and database
  • Transparently forwards all queries with minimal latency (~100μs)
  • Extracts query metadata (SQL, timing, parameters) asynchronously
  • Publishes query events to scry-platform for analysis
  • Enables real-time query replay against shadow databases

Out-of-Band Path: Data Replication

The out-of-band path replicates your database schema and data independently:

PostgreSQL → scry-backfill → scry-platform

  • scry-backfill connects directly to PostgreSQL's replication stream
  • Uses logical replication (CDC) for continuous, low-impact data sync
  • Supports snapshot mode for initial data loading
  • Rate-limited to prevent overwhelming production databases
  • Keeps shadow database synchronized for accurate migration testing

Why Two Paths?

This dual-path architecture provides complete migration validation:

  • Query patterns from scry-proxy show how your application uses the database
  • Actual data from scry-backfill ensures shadow databases have realistic content
  • Independent operation means either path can run without the other
  • Zero application changes required for either path

Request Flow

Here's how a query flows through ScryData from client to database:

1. Client Sends Query

SQL query arrives via PostgreSQL wire protocol. ScryData accepts the connection transparently.

2. Circuit Breaker Check

Lock-free atomic check (~10-50ns). If circuit is open, request fails fast without hitting the database.
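
A minimal sketch of this fast-fail check, with an illustrative state encoding (not ScryData's actual layout):

    use std::sync::atomic::{AtomicU8, Ordering};

    const OPEN: u8 = 1; // illustrative encoding: 0 = Closed, 1 = Open, 2 = HalfOpen

    // Hot-path admission check: a single atomic load, no locks taken.
    fn allow_request(state: &AtomicU8) -> bool {
        state.load(Ordering::Acquire) != OPEN
    }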

3. Connection Pool Acquisition

Get a healthy connection from the pool. May create new connection if needed, including health check and state reset.
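
Pool acquisition is the only step in the budget measured in hundreds of microseconds, so it is worth bounding and timing separately. A hypothetical helper that wraps any pool's acquire future (e.g. deadpool's pool.get()) with a deadline and reports the elapsed time for the query timeline:

    use std::future::Future;
    use std::time::{Duration, Instant};

    // Bounds how long we wait for a pooled connection and returns how long
    // acquisition took, so the phase can be reported separately.
    async fn timed_acquire<C, F>(acquire: F, deadline: Duration) -> Result<(C, Duration), &'static str>
    where
        F: Future<Output = Result<C, &'static str>>,
    {
        let start = Instant::now();
        match tokio::time::timeout(deadline, acquire).await {
            Ok(Ok(conn)) => Ok((conn, start.elapsed())),
            Ok(Err(e)) => Err(e),
            Err(_) => Err("pool acquisition timed out"),
        }
    }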

4. Backend Execution

Query forwarded to PostgreSQL. Response streamed back to client.
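
At its core this is byte-level forwarding between two sockets. A simplified sketch using tokio::io::copy_bidirectional; a real handler also inspects protocol messages for metadata, which is omitted here:

    use tokio::net::TcpStream;

    // Copies bytes in both directions until either side closes the connection.
    // Returns (bytes client -> backend, bytes backend -> client).
    async fn forward(mut client: TcpStream, mut backend: TcpStream) -> std::io::Result<(u64, u64)> {
        tokio::io::copy_bidirectional(&mut client, &mut backend).await
    }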

5. Event Publishing (Async)

Query metadata sent to event batcher via lock-free channel. Never blocks the response.
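
A sketch of that fire-and-forget path, assuming a bounded Tokio mpsc channel feeding the batcher (the event type is a stand-in for the real schema):

    use tokio::sync::mpsc;

    struct QueryEvent {
        sql: String,
        duration_us: u64,
    }

    // try_send never awaits: if the channel is full, the event is dropped
    // instead of blocking the client's response (best-effort observability).
    fn publish(tx: &mpsc::Sender<QueryEvent>, event: QueryEvent) {
        if tx.try_send(event).is_err() {
            // Dropped under backpressure or after shutdown; a real implementation
            // would increment a "dropped events" counter here.
        }
    }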

6. Metrics Recording

Latency, success/failure, and other metrics recorded atomically (<300ns overhead).
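
An illustrative slice of what those atomic updates look like; the struct and field names are assumptions:

    use std::sync::atomic::{AtomicU64, Ordering};

    #[derive(Default)]
    struct ProxyMetrics {
        queries_total: AtomicU64,
        query_errors: AtomicU64,
    }

    impl ProxyMetrics {
        // One relaxed fetch_add per counter: no locks, sub-microsecond cost.
        fn record_query(&self, success: bool) {
            self.queries_total.fetch_add(1, Ordering::Relaxed);
            if !success {
                self.query_errors.fetch_add(1, Ordering::Relaxed);
            }
        }
    }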

Query Timeline Phases

Every query goes through these measured phases, exposed via /debug/timeline and Prometheus metrics:

Phase             | Description                                                    | Typical Duration
------------------|----------------------------------------------------------------|------------------
Queue Time        | Waiting before pool acquisition starts                         | <1ms
Pool Acquire      | Getting a connection (may include health check + state reset)  | 100-500μs
Backend Execution | Actual query execution on database                             | Variable
Event Publishing  | Async event dispatch (not counted in query latency)            | <100ns

Core Components

1. Proxy Server

The main entry point that:

  • Listens for incoming TCP connections on the proxy port
  • Spawns a connection handler for each client
  • Manages graceful shutdown with connection draining
  • Tracks active connections
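
A minimal sketch of that accept-and-spawn loop, with handle_client standing in for the real per-connection handler (shutdown draining and connection tracking are omitted):

    use tokio::net::{TcpListener, TcpStream};

    async fn run(listen_addr: &str) -> std::io::Result<()> {
        let listener = TcpListener::bind(listen_addr).await?;
        loop {
            let (client, peer) = listener.accept().await?;
            // One lightweight Tokio task per client connection.
            tokio::spawn(async move {
                if let Err(e) = handle_client(client).await {
                    eprintln!("connection from {peer} ended with error: {e}");
                }
            });
        }
    }

    // Placeholder: the real handler speaks the PostgreSQL wire protocol,
    // applies the circuit breaker, and forwards traffic via the pool.
    async fn handle_client(_client: TcpStream) -> std::io::Result<()> {
        Ok(())
    }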

2. Circuit Breaker

Lock-free, three-state circuit breaker protecting the backend. Uses AtomicU8 for state and AtomicU32 for counters—no locks, predictable latency.
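
A sketch of the failure-counting and state-transition logic, under assumed names and thresholds (the real breaker also moves Open -> HalfOpen after a cool-down timer, omitted here):

    use std::sync::atomic::{AtomicU32, AtomicU8, Ordering};

    const CLOSED: u8 = 0;
    const OPEN: u8 = 1;
    const HALF_OPEN: u8 = 2;
    const FAILURE_THRESHOLD: u32 = 5;

    struct CircuitBreaker {
        state: AtomicU8,
        consecutive_failures: AtomicU32,
    }

    impl CircuitBreaker {
        fn record_failure(&self) {
            let failures = self.consecutive_failures.fetch_add(1, Ordering::AcqRel) + 1;
            if failures >= FAILURE_THRESHOLD {
                // Trip Closed -> Open; compare_exchange keeps the transition
                // race-free without taking a lock.
                let _ = self.state.compare_exchange(CLOSED, OPEN, Ordering::AcqRel, Ordering::Acquire);
            }
        }

        fn record_success(&self) {
            self.consecutive_failures.store(0, Ordering::Release);
            // A successful probe closes the circuit again.
            let _ = self.state.compare_exchange(HALF_OPEN, CLOSED, Ordering::AcqRel, Ordering::Acquire);
        }
    }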

Learn more about the Circuit Breaker →

3. Connection Pool

Protocol-agnostic TCP connection pooling with deadpool integration. Includes health checks on every recycle and automatic state reset (DISCARD ALL).
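
The state reset amounts to sending DISCARD ALL over the pooled backend connection. A simplified sketch of just that step at the wire-protocol level, assuming a raw TcpStream to the backend (response handling and the pool's recycle hook itself are omitted):

    use tokio::io::AsyncWriteExt;
    use tokio::net::TcpStream;

    // Builds a PostgreSQL simple-query message: 'Q' tag, big-endian Int32 length
    // (which counts itself but not the tag), then the SQL string with a NUL terminator.
    async fn reset_session(backend: &mut TcpStream) -> std::io::Result<()> {
        let sql = b"DISCARD ALL\0";
        let len = (4 + sql.len()) as i32;
        let mut msg = Vec::with_capacity(1 + 4 + sql.len());
        msg.push(b'Q');
        msg.extend_from_slice(&len.to_be_bytes());
        msg.extend_from_slice(sql);
        backend.write_all(&msg).await
    }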

Learn more about Connection Pooling →

4. Event Publisher

Trait-based abstraction for publishing query events. Supports debug logging and HTTP publishing, using FlexBuffers serialization for a ~50% size reduction compared to JSON.
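
A sketch of what such a trait-based abstraction could look like, with a debug-logging implementation; the trait shape and event fields are illustrative, not ScryData's actual interface:

    struct QueryEvent {
        sql: String,
        duration_us: u64,
        success: bool,
    }

    // Publishing is fire-and-forget so it can never block a query response.
    trait EventPublisher: Send + Sync {
        fn publish(&self, event: QueryEvent);
    }

    // Debug implementation: log events to stdout. An HTTP implementation would
    // batch events and serialize them (e.g. with FlexBuffers) before sending.
    struct DebugPublisher;

    impl EventPublisher for DebugPublisher {
        fn publish(&self, event: QueryEvent) {
            println!("query ok={} {}us: {}", event.success, event.duration_us, event.sql);
        }
    }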

Learn more about Observability →

5. Health Monitor

Predictive health monitoring using exponential moving average (EMA) baselines. Tracks error rate, latency (P99), and pool utilization. Warns when metrics deviate from baseline.
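
A compact sketch of an EMA baseline with a deviation check; the smoothing factor and tolerance here are illustrative:

    // Exponentially weighted baseline: next = alpha * sample + (1 - alpha) * previous.
    struct EmaBaseline {
        alpha: f64,          // e.g. 0.1: each new sample moves the baseline slowly
        value: Option<f64>,  // None until the first sample seeds the baseline
    }

    impl EmaBaseline {
        fn update(&mut self, sample: f64) -> f64 {
            let next = match self.value {
                Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
                None => sample,
            };
            self.value = Some(next);
            next
        }

        // True if the sample exceeds the baseline by more than `tolerance`
        // (e.g. 0.5 means "50% above baseline").
        fn deviates(&self, sample: f64, tolerance: f64) -> bool {
            matches!(self.value, Some(base) if base > 0.0 && sample > base * (1.0 + tolerance))
        }
    }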

Learn more about Health Checks →

6. Metrics System

Central metrics singleton tracking all proxy operations. HDR histograms for accurate percentiles, atomic counters, and hot data tracking with Count-Min Sketch + Top-K heap.
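
For percentiles, the hdrhistogram crate (an assumption here; the text only specifies HDR histograms) keeps accurate tail values in bounded memory. A small usage sketch, leaving out whatever concurrency wrapper sits around it:

    use hdrhistogram::Histogram;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Values in microseconds, 3 significant digits of precision.
        let mut latency_us = Histogram::<u64>::new(3)?;
        for sample in [120_u64, 180, 95, 4_000, 150] {
            latency_us.record(sample)?;
        }
        println!("p99 = {} us", latency_us.value_at_quantile(0.99));
        Ok(())
    }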

Why Async?

Async architecture allows ScryData to handle thousands of concurrent connections with minimal memory:

Model                 | 1,000 Connections | Memory Usage
----------------------|-------------------|----------------------
Thread-per-connection | 1,000 threads     | 8GB+ (8MB stack each)
Async (Tokio)         | 1,000 tasks       | ~8MB total

ScryData is built entirely on Tokio, the industry-standard async runtime for Rust that powers production systems at Discord, AWS, and Microsoft.

Why Lock-Free?

Locks can cause unpredictable latency spikes. Lock-free atomics ensure:

  • Consistent latency: No lock contention delays
  • Composability: Safe to call from any async context
  • Simplicity: No deadlock concerns

Critical path operations use lock-free atomics:

  • Circuit breaker state transitions: AtomicU8::compare_exchange
  • Metrics counters: AtomicU64::fetch_add
  • Event batching: tokio::mpsc::Sender::try_send (lock-free channel)

Protocol Handling

ScryData uses the PostgreSQL wire protocol for communication. Key message types extracted:

Message Type    | Tag | Purpose
----------------|-----|--------------------------------------
Query           | 'Q' | Simple query protocol
Parse           | 'P' | Extended query (prepared statements)
CommandComplete | 'C' | Query completion with row count
ErrorResponse   | 'E' | Query errors with SQLSTATE
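
A sketch of reading that framing from the backend stream: after startup, every message is a 1-byte tag followed by a big-endian Int32 length that counts itself but not the tag:

    use tokio::io::{AsyncRead, AsyncReadExt};

    async fn read_message<R: AsyncRead + Unpin>(r: &mut R) -> std::io::Result<(u8, Vec<u8>)> {
        let tag = r.read_u8().await?;   // e.g. b'C' for CommandComplete
        let len = r.read_i32().await?;  // big-endian, includes the 4 length bytes
        let body_len = usize::try_from(len)
            .ok()
            .and_then(|n| n.checked_sub(4))
            .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::InvalidData, "bad message length"))?;
        let mut body = vec![0u8; body_len];
        r.read_exact(&mut body).await?;
        Ok((tag, body))
    }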

The Protocol trait abstraction allows ScryData to support multiple databases in the future (MySQL, MongoDB) via feature flags.

Performance Characteristics

Latency Budget

Target: ~100μs additional latency per query

  • Circuit breaker check: 10-50ns
  • Pool acquisition: 100-500μs
  • Event batching: <100ns
  • Metrics recording: <300ns

Typical total ScryData overhead: ~500μs (0.5ms), dominated by pool acquisition.

Memory Footprint

  • Base: ~10MB (Tokio runtime, binary code)
  • Connection pool: ~50KB per connection
  • Metrics: ~150KB (histograms, hot data tracker)
  • Event batcher: ~100KB per 1000 queued events

Total for 100 connections: ~20MB

Throughput

  • Tested: 10,000+ queries/sec on commodity hardware
  • Bottleneck: Usually backend database, not ScryData
  • Scaling: Linear scaling with CPU cores (Tokio work-stealing)

Ready to See It in Action?

Get early access to ScryData and start validating your database migrations with production traffic.

Request Early Access