Rate Limiting & WAL Monitoring

Control replication throughput and monitor write-ahead log growth to protect your production database.

Why Rate Limiting?

During the initial snapshot phase, scry-backfill can read data from PostgreSQL very quickly. While this speeds up the backfill process, uncontrolled read speeds can cause problems:

  • Network bandwidth saturation - High-speed reads can overwhelm network links, affecting other services
  • Increased I/O load - Aggressive reads compete with production queries for disk I/O
  • CPU pressure - Serializing large result sets consumes CPU cycles
  • Memory pressure - Large batch reads can increase PostgreSQL's memory usage

Rate limiting ensures scry-backfill is a "good citizen" that coexists with your production workload without degrading performance.

Configuring Rate Limits

Configure rate limiting in your scry-backfill.toml configuration file:

[rate_limiter]
# Maximum rows to read per second during snapshot
rows_per_second = 20000

# Maximum bytes to read per second during snapshot
bytes_per_second = "20MB"

# Batch size for snapshot reads
batch_size = 10000

# Enable adaptive rate limiting based on source DB load
adaptive = true
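
Note that rows_per_second and bytes_per_second are independent caps; presumably whichever limit is reached first is the one that throttles reads, so it helps to know your average row width before picking values. A rough, illustrative catalog query (it assumes your tables live in the public schema, and the per-row figure includes indexes and TOAST, so it overestimates raw row width):

SELECT
    relname,
    pg_size_pretty(pg_total_relation_size(oid)) AS total_size,
    reltuples::bigint AS approx_rows,
    -- GREATEST(...) guards against never-analyzed tables where reltuples is 0 or -1
    pg_size_pretty((pg_total_relation_size(oid) / GREATEST(reltuples, 1))::bigint) AS approx_row_width
FROM pg_class
WHERE relkind = 'r'
  AND relnamespace = 'public'::regnamespace
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;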

Tuning Guidelines

Use these starting points based on your database size and sensitivity:

Database Size          Rows/Second   Bytes/Second
Small (<10 GB)         50,000        50 MB/s
Medium (10-100 GB)     20,000        20 MB/s
Large (100+ GB)        10,000        10 MB/s
Production-sensitive   5,000         5 MB/s

Tip: Start with conservative limits and gradually increase them while monitoring your database's performance. It's easier to speed up a slow backfill than to recover from an overloaded production database.
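
What to monitor while you ramp limits up depends on your environment, but a simple first check is to watch the sessions the backfill holds on the source and whether they spend their time waiting. A sketch using pg_stat_activity; the usename filter is an assumption, so substitute whatever user (or application_name) scry-backfill connects with in your setup:

SELECT
    pid,
    state,
    wait_event_type,
    wait_event,
    now() - query_start AS query_runtime
FROM pg_stat_activity
WHERE usename = 'scry_backfill';  -- assumed connection user; adjust to your configuration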

WAL Monitoring

Replication slots in PostgreSQL prevent write-ahead log (WAL) segments from being recycled until they've been consumed. If scry-backfill stops or falls behind, WAL files accumulate on the source database.

Critical Warning: Uncontrolled WAL growth can fill your disk and crash PostgreSQL. Always monitor WAL lag when using logical replication, and have a plan for handling stuck replication slots.
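
Independently of scry-backfill, PostgreSQL 13 and later can enforce a hard cap on how much WAL a slot may retain via max_slot_wal_keep_size. This is a trade-off rather than a default recommendation: if the cap is exceeded the slot is invalidated and the backfill has to restart from scratch, but your disk survives. A sketch (the 10GB value is only an example):

-- PostgreSQL 13+: cap WAL retained by replication slots so a stuck slot cannot fill the disk.
-- If a slot falls further behind than this, it is invalidated and must be recreated.
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();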

Automatic WAL Monitoring

scry-backfill includes built-in WAL monitoring. Configure thresholds in your configuration file:

[wal_monitor]
# Enable automatic WAL lag monitoring
enabled = true

# Warning threshold - log warnings when WAL lag exceeds this
warning_threshold = "1GB"

# Critical threshold - pause backfill when WAL lag exceeds this
critical_threshold = "5GB"

# Check interval
check_interval = "30s"

Manual WAL Check

You can manually check WAL lag for your replication slot using this SQL query:

SELECT
    slot_name,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_lag,
    active
FROM pg_replication_slots
WHERE slot_name = 'scry_backfill_slot';
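
On PostgreSQL 13 and newer, the same catalog also reports whether the slot is at risk of invalidation, which is worth including in any manual check:

SELECT
    slot_name,
    wal_status,       -- 'reserved', 'extended', 'unreserved', or 'lost'
    pg_size_pretty(safe_wal_size) AS safe_wal_size  -- NULL unless max_slot_wal_keep_size is set
FROM pg_replication_slots
WHERE slot_name = 'scry_backfill_slot';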

If WAL Grows Too Large

If you notice WAL accumulating beyond acceptable levels, follow these steps:

  1. Check if scry-backfill is running - Verify the process is active and processing data
  2. Check network connectivity - Ensure scry-backfill can reach both source and destination databases
  3. Review logs for errors - Look for connection failures or data type conversion errors
  4. Restart if stuck - Sometimes restarting scry-backfill resolves transient issues
  5. Drop the slot if the stall is permanent - If scry-backfill will not recover and you are abandoning this replication, drop the replication slot to release WAL (see the slot-status check after this list before you do):
    -- Only do this if you're abandoning this replication
    SELECT pg_drop_replication_slot('scry_backfill_slot');
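
Before dropping anything, confirm the slot really is idle; an active slot usually means a scry-backfill process is still attached to it. A quick status check, using the same slot name as above:

SELECT slot_name, active, active_pid
FROM pg_replication_slots
WHERE slot_name = 'scry_backfill_slot';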

Monitoring Metrics

scry-backfill exposes Prometheus metrics to help you monitor replication health and rate limiting behavior:

Metric                                    Description
scry_backfill_rows_processed_total        Total number of rows processed (snapshot + CDC)
scry_backfill_bytes_processed_total       Total bytes read from source database
scry_backfill_rate_limited_waits_total    Number of times processing paused due to rate limits
scry_backfill_wal_lag_bytes               Current WAL lag in bytes for the replication slot

Tip: Set up alerts on scry_backfill_wal_lag_bytes to catch WAL growth issues before they become critical. A good starting threshold is 500MB for warning and 2GB for critical alerts.

Ready to Start Replicating?

Get early access to scry-backfill and start creating shadow databases for migration testing.

Request Early Access