Configuration

Configure scry-backfill through environment variables, a TOML configuration file, or both.

Configuration Priority

Configuration is loaded in priority order (highest to lowest):

Environment Variables (SCRY_BACKFILL_*)
              |
              v
Configuration File (scry-backfill.toml)
              |
              v
       Default Values
                            

This lets you set baseline values in scry-backfill.toml and override specific ones with environment variables in production. Nested keys use a double underscore (__) between the section name and the key:

# source.host -> SCRY_BACKFILL_SOURCE__HOST
export SCRY_BACKFILL_SOURCE__HOST="localhost"

# producer.shadow_id -> SCRY_BACKFILL_PRODUCER__SHADOW_ID
export SCRY_BACKFILL_PRODUCER__SHADOW_ID="shadow-prod-001"
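
The naming rule and the priority order can be sketched in a few lines of Python. This is an illustration of the documented behavior, not scry-backfill's own loader: the helper names env_var_name and resolve are hypothetical, and the sketch returns environment values as plain strings without any type conversion.

```python
import os

PREFIX = "SCRY_BACKFILL_"


def env_var_name(dotted_key: str) -> str:
    """Map a dotted key such as 'source.host' to its environment variable name."""
    # Each dot (section separator) becomes a double underscore.
    return PREFIX + dotted_key.upper().replace(".", "__")


def resolve(dotted_key, file_values, defaults, environ=None):
    """Look up a key with environment > config file > default precedence.

    file_values and defaults are flat dicts keyed by dotted name
    (an assumption made to keep the sketch short).
    """
    environ = os.environ if environ is None else environ
    name = env_var_name(dotted_key)
    if name in environ:
        return environ[name]
    if dotted_key in file_values:
        return file_values[dotted_key]
    return defaults.get(dotted_key)
```

For example, resolve("source.host", {"source.host": "db.internal"}, {"source.host": "localhost"}) returns "db.internal" unless SCRY_BACKFILL_SOURCE__HOST is set, in which case the environment value wins.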

Complete Configuration Example

A full configuration file showing all available options:

[backfill]
mode = "hybrid"
include_tables = ["public.*"]
exclude_tables = ["public.audit_logs", "public.sessions"]
checkpoint_interval_secs = 60
checkpoint_path = "/var/lib/scry-backfill/checkpoint.json"

[source]
host = "localhost"
port = 5432
database = "production"
user = "replication_user"
password = "${SCRY_BACKFILL_SOURCE__PASSWORD}"
replication_slot = "scry_backfill_slot"
publication_name = "scry_backfill_pub"
ssl_mode = "prefer"

[producer]
shadow_id = "shadow-prod-001"
endpoint = "https://api.scrydata.io/v1/ingest"
auth_token = "${SCRY_BACKFILL_PRODUCER__AUTH_TOKEN}"

[rate_limiter]
enabled = true
max_rows_per_second = 10000
max_bytes_per_second = 10485760
burst_multiplier = 2.0

[wal_monitor]
enabled = true
check_interval_secs = 30
max_wal_bytes_warning = 1073741824
max_wal_bytes_critical = 5368709120

Backfill Settings

Control the overall backfill behavior and table selection.

Parameter                | Default           | Description
mode                     | hybrid            | Operation mode: "hybrid", "snapshot", or "cdc". Hybrid chooses the approach automatically based on database state.
include_tables           | ["*.*"]           | Glob patterns for tables to include, in schema.table format.
exclude_tables           | []                | Glob patterns for tables to exclude. Takes precedence over include_tables.
checkpoint_interval_secs | 60                | Seconds between checkpoint saves. Lower values provide better crash recovery at the cost of more I/O.
checkpoint_path          | ./checkpoint.json | Path to the checkpoint file used for resumable backfill operations.
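
To preview which tables a pair of include and exclude lists would select, a rough sketch like the one below can help. It assumes shell-style glob semantics as implemented by Python's fnmatch module, which may differ in detail from the matcher scry-backfill uses. The function name is_selected is hypothetical.

```python
from fnmatch import fnmatchcase


def is_selected(table, include_tables, exclude_tables):
    """Return True if a 'schema.table' name would be backfilled.

    Exclusion is checked first because exclude_tables takes
    precedence over include_tables.
    """
    if any(fnmatchcase(table, pattern) for pattern in exclude_tables):
        return False
    return any(fnmatchcase(table, pattern) for pattern in include_tables)
```

With include_tables = ["public.*"] and exclude_tables = ["public.audit_logs", "public.sessions"], the table public.users is selected, while public.audit_logs and analytics.events are not.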

Source Database Settings

Configure the connection to your PostgreSQL source database.

Parameter        | Default       | Description
host             | localhost     | PostgreSQL server hostname or IP address.
port             | 5432          | PostgreSQL server port.
database         | (none)        | Database name to replicate from.
user             | (none)        | Username for the database connection. Must have replication privileges.
password         | (none)        | Password for the database connection. Use an environment variable for security.
replication_slot | scry_backfill | Name of the logical replication slot to use or create.
publication_name | scry_backfill | Name of the PostgreSQL publication for logical replication.
ssl_mode         | prefer        | SSL mode: "disable", "allow", "prefer", "require", "verify-ca", "verify-full".
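
Before starting a backfill, it can be useful to confirm that the same settings work with psql or another PostgreSQL client. The sketch below builds a standard libpq connection URI from the [source] values. It assumes that ssl_mode corresponds to libpq's sslmode parameter (the accepted values are the same six), and it says nothing about how scry-backfill itself opens its connection. The function name build_conninfo is hypothetical.

```python
from urllib.parse import quote, urlencode


def build_conninfo(source: dict, password: str) -> str:
    """Build a libpq-style URI from [source] settings for a manual connectivity check."""
    # Percent-encode the parts that may contain reserved characters.
    user = quote(source["user"], safe="")
    secret = quote(password, safe="")
    database = quote(source["database"], safe="")
    # Fall back to the documented defaults when a key is absent.
    host = source.get("host", "localhost")
    port = source.get("port", 5432)
    query = urlencode({"sslmode": source.get("ssl_mode", "prefer")})
    return f"postgresql://{user}:{secret}@{host}:{port}/{database}?{query}"
```

The password is passed separately so that it can come from the environment rather than from the parsed configuration file.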

Producer Settings

Configure how data is sent to scry-platform.

Parameter  | Default                           | Description
shadow_id  | (none)                            | REQUIRED. Unique identifier for the shadow database in scry-platform.
endpoint   | https://api.scrydata.io/v1/ingest | scry-platform ingestion endpoint URL.
auth_token | (none)                            | Authentication token for the scry-platform API. Use an environment variable for security.

Security Warning: Never store sensitive credentials like password or auth_token directly in configuration files. Always use environment variables (SCRY_BACKFILL_SOURCE__PASSWORD, SCRY_BACKFILL_PRODUCER__AUTH_TOKEN) or reference them using ${ENV_VAR} syntax in your TOML file.
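
A pre-deployment check can catch literal secrets before a configuration file is committed. The sketch below is an assumption-laden illustration: the helpers expand_refs and find_literal_secrets are hypothetical, the regular expression only approximates the ${ENV_VAR} syntax, and scry-backfill's own expansion rules may differ. It expects the configuration as a nested dict, for example as parsed from the TOML file.

```python
import os
import re

# Matches a ${NAME} reference, approximating the ${ENV_VAR} syntax.
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

# Fields the security warning names as sensitive.
_SECRET_FIELDS = [("source", "password"), ("producer", "auth_token")]


def expand_refs(value: str, environ=None) -> str:
    """Replace each ${NAME} in value with the matching environment variable."""
    environ = os.environ if environ is None else environ

    def substitute(match):
        name = match.group(1)
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]

    return _REF.sub(substitute, value)


def find_literal_secrets(config: dict) -> list:
    """Return 'section.key' paths whose value is a literal rather than a ${NAME} reference."""
    findings = []
    for section, key in _SECRET_FIELDS:
        value = config.get(section, {}).get(key)
        if isinstance(value, str) and value and not _REF.fullmatch(value):
            findings.append(f"{section}.{key}")
    return findings
```

Running find_literal_secrets on the parsed file in CI and failing the build on a non-empty result is one way to enforce the warning above.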

Rate Limiter Settings

Control the rate of data extraction to avoid impacting production database performance.

Parameter            | Default  | Description
enabled              | true     | Enable rate limiting. Recommended for production databases.
max_rows_per_second  | 10000    | Maximum number of rows to process per second during snapshot mode.
max_bytes_per_second | 10485760 | Maximum bytes per second (default: 10 MB/s). Limits network and disk I/O impact.
burst_multiplier     | 2.0      | Allows temporary bursts up to this multiple of the rate limit for catching up.

Tip: Start with conservative rate limits and increase gradually while monitoring your database's CPU and I/O metrics. See Rate Limiting & WAL for detailed tuning guidance.
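
One common way to combine a steady rate with a burst allowance is a token bucket, sketched below. The documentation does not say which algorithm scry-backfill uses, so treat this purely as a mental model: the bucket refills at the configured rate, and its capacity is assumed to be the rate multiplied by burst_multiplier. The class name TokenBucket and the injectable clock are illustrative choices.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` per second, holds at most rate * burst_multiplier."""

    def __init__(self, rate_per_second, burst_multiplier=2.0, clock=time.monotonic):
        self.rate = float(rate_per_second)
        self.capacity = self.rate * burst_multiplier
        self.tokens = self.capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self, amount):
        """Consume `amount` tokens (rows or bytes) if available; return whether it succeeded."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False
```

Under this model, max_rows_per_second = 10000 with burst_multiplier = 2.0 permits at most 20000 rows in a single burst, after which throughput settles back to 10000 rows per second.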

WAL Monitor Settings

Monitor the PostgreSQL Write-Ahead Log (WAL) to prevent disk exhaustion caused by replication lag.

Parameter              | Default    | Description
enabled                | true       | Enable WAL size monitoring. Highly recommended for production.
check_interval_secs    | 30         | Seconds between WAL size checks.
max_wal_bytes_warning  | 1073741824 | WAL size threshold (default: 1 GB) that triggers a warning log and metric.
max_wal_bytes_critical | 5368709120 | WAL size threshold (default: 5 GB) that pauses replication to prevent disk exhaustion.

Note: When the critical threshold is reached, scry-backfill will pause and wait for WAL to be consumed before resuming. This prevents the replication slot from filling your disk. Monitor the scry_backfill_wal_bytes metric in production.
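
The two thresholds can be read as a simple three-state classification, sketched below for use in an external alerting script. The function name classify_wal is hypothetical, and treating a value equal to a threshold as having reached it is an assumption. The SQL string shows one standard PostgreSQL way to measure the WAL retained by a slot. Whether scry-backfill measures it the same way is not documented.

```python
GIB = 1024 ** 3  # the documented defaults are powers of two: 1 GiB and 5 GiB

# One way to measure WAL retained by a replication slot (PostgreSQL 10+).
# The slot name below is the documented default for replication_slot.
RETAINED_WAL_SQL = (
    "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) "
    "FROM pg_replication_slots WHERE slot_name = 'scry_backfill'"
)


def classify_wal(wal_bytes, warning=1 * GIB, critical=5 * GIB):
    """Classify a WAL size against the warning and critical thresholds."""
    if wal_bytes >= critical:
        return "critical"  # documented effect: replication pauses
    if wal_bytes >= warning:
        return "warning"  # documented effect: warning log and metric
    return "ok"
```

The same function can be applied to values read from the scry_backfill_wal_bytes metric.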
