Health Checks

Three layers of health monitoring: active checks, passive checks, and predictive anomaly detection using EMA baselines.

Three Layers of Health Monitoring

Layer 1: Active Health Checks
   ↓
Periodic background checks (every 30s default)
Tests database connectivity and responsiveness

Layer 2: Passive Health Checks
   ↓
On-demand during connection pool recycling
Validates connection before reuse

Layer 3: Health Monitoring System
   ↓
Tracks metrics baselines using EMA
Detects anomalies (error rate, latency, pool saturation)
Integrates with circuit breaker for predictive opening
                            

Active Health Checks

Periodic background task that actively tests database health every 30 seconds (configurable).

Health Check Query

SELECT 1

Fast (~1ms), low overhead, no side effects, works with any database state.
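A minimal sketch of issuing this check with a bounded timeout, assuming a PostgreSQL pool via the sqlx and tokio crates (the DISCARD ALL step elsewhere on this page implies PostgreSQL; ScryData's actual internals may differ):

use std::time::Duration;
use sqlx::postgres::PgPool;

// Returns true only if SELECT 1 completes within the timeout
// (default timeout_ms = 1000); a query error or a timeout both count as failures.
async fn active_check(pool: &PgPool, timeout: Duration) -> bool {
    match tokio::time::timeout(timeout, sqlx::query("SELECT 1").execute(pool)).await {
        Ok(Ok(_)) => true,   // check query succeeded
        Ok(Err(_)) => false, // query returned an error
        Err(_) => false,     // check timed out
    }
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    // Illustrative connection string.
    let pool = PgPool::connect("postgres://localhost/app").await?;
    let healthy = active_check(&pool, Duration::from_millis(1000)).await;
    println!("healthy = {healthy}");
    Ok(())
}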

Failure Handling

Consecutive failures are tracked:

Check 1: ✓ Success (failures: 0)
Check 2: ✓ Success (failures: 0)
Check 3: ✗ Timeout (failures: 1)
Check 4: ✗ Timeout (failures: 2)
Check 5: ✗ Timeout (failures: 3) → Mark Unhealthy
                            

When marked unhealthy: the circuit breaker opens, requests fail fast, and health checks continue until recovery.
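A minimal sketch of this consecutive-failure tracking; the type and field names are illustrative, not ScryData's actual API:

// After `failure_threshold` consecutive failed checks the backend is marked
// unhealthy (and the circuit breaker opens); any successful check resets the counter.
struct ActiveChecker {
    consecutive_failures: u32,
    failure_threshold: u32, // default 3
    healthy: bool,
}

impl ActiveChecker {
    fn record(&mut self, check_ok: bool) {
        if check_ok {
            self.consecutive_failures = 0;
            self.healthy = true; // recovery: checks keep running until this happens
        } else {
            self.consecutive_failures += 1;
            if self.consecutive_failures >= self.failure_threshold {
                self.healthy = false; // circuit breaker opens, requests fail fast
            }
        }
    }
}

fn main() {
    let mut checker = ActiveChecker { consecutive_failures: 0, failure_threshold: 3, healthy: true };
    // Same sequence as above: two successes, then three timeouts.
    for ok in [true, true, false, false, false] {
        checker.record(ok);
    }
    assert!(!checker.healthy);
    println!("healthy = {}", checker.healthy);
}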

Passive Health Checks

Every connection is validated when it is returned to the pool, before it can be reused:

Connection returned to pool
         ↓
    Health check (SELECT 1)
         ↓
   ┌─────┴─────┐
   │  Healthy  │ → State reset (DISCARD ALL) → Return to pool
   └───────────┘
         │
   ┌─────┴─────┐
   │  Failed   │ → Connection discarded → Pool creates new
   └───────────┘
                            

Benefits:

  • Connection quality guaranteed
  • State consistency ensured
  • No stale connections
  • Automatic cleanup
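A minimal sketch of the recycling flow shown in the diagram above, using illustrative types rather than ScryData's actual API:

// A connection coming back to the pool is probed with SELECT 1; healthy
// connections have their session state reset with DISCARD ALL and are kept,
// failed ones are dropped so the pool can open a replacement.
enum Recycled<C> {
    Reuse(C), // validated and reset, returned to the pool
    Replace,  // discarded; the pool creates a new connection
}

fn recycle<C>(
    conn: C,
    probe: impl Fn(&C) -> bool, // runs SELECT 1
    reset: impl Fn(&C) -> bool, // runs DISCARD ALL
) -> Recycled<C> {
    if probe(&conn) && reset(&conn) {
        Recycled::Reuse(conn)
    } else {
        Recycled::Replace
    }
}

fn main() {
    // Stub "connection" and stub probes, just to exercise the flow.
    match recycle("conn-1", |_| true, |_| true) {
        Recycled::Reuse(c) => println!("{c} returned to pool"),
        Recycled::Replace => println!("connection discarded; pool creates a new one"),
    }
}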

Health Monitoring System

Advanced monitoring that tracks baseline metrics using Exponential Moving Average (EMA) and detects anomalies.

Exponential Moving Average (EMA)

EMA gives more weight to recent values while maintaining history:

EMA_new = (alpha × current_value) + ((1 - alpha) × EMA_old)

Alpha (default 0.1): Lower = smoother, slower to adapt. Higher = more responsive, noisier.
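A minimal worked sketch of the update formula, using the default alpha of 0.1 (ema_alpha):

fn ema_update(ema_old: f64, current: f64, alpha: f64) -> f64 {
    alpha * current + (1.0 - alpha) * ema_old
}

fn main() {
    let alpha = 0.1;
    let mut baseline_latency_ms = 5.0;
    // A single 50 ms sample moves the 5 ms baseline only 10% of the way toward it...
    baseline_latency_ms = ema_update(baseline_latency_ms, 50.0, alpha);
    println!("{baseline_latency_ms:.1}"); // 9.5
    // ...while a sustained shift moves the baseline steadily over many samples.
    for _ in 0..20 {
        baseline_latency_ms = ema_update(baseline_latency_ms, 50.0, alpha);
    }
    println!("{baseline_latency_ms:.1}"); // ~45.1
}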

Health Status Levels

Status      Description              Circuit Breaker
Healthy     No warnings              Closed
Degraded    Minor warnings present   Closed
Unhealthy   Critical warnings        Opens
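A minimal sketch of how warnings could map to these statuses (the types are illustrative; the real mapping lives inside ScryData):

#[derive(Debug, PartialEq)]
enum HealthStatus { Healthy, Degraded, Unhealthy }

struct Warning { critical: bool }

// Any critical warning -> Unhealthy (circuit breaker opens),
// any other warning -> Degraded, otherwise Healthy.
fn status_from(warnings: &[Warning]) -> HealthStatus {
    if warnings.iter().any(|w| w.critical) {
        HealthStatus::Unhealthy
    } else if !warnings.is_empty() {
        HealthStatus::Degraded
    } else {
        HealthStatus::Healthy
    }
}

fn main() {
    assert_eq!(status_from(&[]), HealthStatus::Healthy);
    assert_eq!(status_from(&[Warning { critical: false }]), HealthStatus::Degraded);
    assert_eq!(status_from(&[Warning { critical: true }]), HealthStatus::Unhealthy);
    println!("status mapping checks pass");
}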

Warning Types

  • Error Rate Spike: Current error rate > 3x baseline
  • Latency Spike: Current P99 latency > 2x baseline
  • Pool Saturation: Pool utilization > 95%
  • Pool Starvation (Critical): No available connections + waiting requests
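A minimal sketch of these checks, comparing current metrics against their EMA baselines with the documented default factors (names and types are illustrative, not ScryData's actual API):

struct Baselines { error_rate: f64, latency_p99_ms: f64 }
struct Current {
    error_rate: f64,
    latency_p99_ms: f64,
    pool_utilization: f64,
    available_connections: u32,
    waiting_requests: u32,
}

fn detect_warnings(b: &Baselines, c: &Current) -> Vec<String> {
    let mut warnings = Vec::new();
    if c.error_rate > 3.0 * b.error_rate {
        // error_rate_spike_factor = 3.0
        warnings.push(format!(
            "ErrorRateSpike: {:.1}% vs baseline {:.1}%",
            c.error_rate * 100.0, b.error_rate * 100.0
        ));
    }
    if c.latency_p99_ms > 2.0 * b.latency_p99_ms {
        // latency_spike_factor = 2.0
        warnings.push(format!(
            "LatencySpike: {:.1} ms vs baseline {:.1} ms",
            c.latency_p99_ms, b.latency_p99_ms
        ));
    }
    if c.pool_utilization > 0.95 {
        // pool_saturation_threshold = 0.95
        warnings.push(format!("PoolSaturation: utilization at {:.0}%", c.pool_utilization * 100.0));
    }
    if c.available_connections == 0 && c.waiting_requests > 0 {
        // critical: no free connections while requests are queued
        warnings.push("PoolStarvation (critical)".to_string());
    }
    warnings
}

fn main() {
    // Values mirroring the Degraded response shown later on this page.
    let b = Baselines { error_rate: 0.008, latency_p99_ms: 5.0 };
    let c = Current {
        error_rate: 0.025,
        latency_p99_ms: 6.0,
        pool_utilization: 0.96,
        available_connections: 3,
        waiting_requests: 0,
    };
    for w in detect_warnings(&b, &c) {
        println!("{w}");
    }
}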

Configuration

Active Health Checks

[resilience.healthcheck]
active_enabled = true
interval_secs = 30
timeout_ms = 1000
failure_threshold = 3

Health Monitoring

[health]
error_rate_spike_factor = 3.0
latency_spike_factor = 2.0
pool_saturation_threshold = 0.95
ema_alpha = 0.1

Health Endpoint

curl http://localhost:9090/health

Healthy Response

{
  "status": "Healthy",
  "uptime_secs": 3600,
  "queries_total": 10000,
  "error_rate": 0.001,
  "latency_p99_ms": 5.2,
  "pool_utilization": 0.45,
  "warnings": []
}

Degraded Response

{
  "status": "Degraded",
  "uptime_secs": 3605,
  "queries_total": 10100,
  "error_rate": 0.025,
  "pool_utilization": 0.96,
  "warnings": [
    {
      "type": "ErrorRateSpike",
      "message": "Error rate (2.5%) is 3.0x baseline (0.8%)"
    },
    {
      "type": "PoolSaturation",
      "message": "Pool utilization at 96%"
    }
  ]
}
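A minimal sketch of a client polling this endpoint and reacting to the reported status, assuming the reqwest (with the "blocking" and "json" features) and serde_json crates; the URL and field names come from the responses above:

use std::{thread, time::Duration};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    loop {
        let body: serde_json::Value =
            reqwest::blocking::get("http://localhost:9090/health")?.json()?;
        let status = body["status"].as_str().unwrap_or("Unknown");
        if status != "Healthy" {
            // e.g. page an operator, shed load, or surface the warnings
            eprintln!("ScryData health is {status}: {}", body["warnings"]);
        }
        thread::sleep(Duration::from_secs(30));
    }
}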

Catch Issues Before They Impact Production

ScryData's predictive health monitoring detects degradation before failures cascade.

Request Early Access