Automatic Mode Selection
When running in hybrid mode (the default), scry-backfill automatically progresses through four phases to ensure complete and consistent data replication. It checkpoints progress at each phase, so an interrupted run can resume from where it left off.
1. Setup
Creates a replication slot and publication if they don't already exist. The replication slot ensures no WAL data is discarded before scry-backfill processes it.
2. Schema Extraction
Captures DDL statements, constraints, indexes, and other schema metadata. This ensures the shadow database has an identical structure to the source.
3. Snapshot
Bulk copies all tables using PostgreSQL's efficient COPY protocol. This provides a consistent point-in-time snapshot of all data.
4. CDC Streaming
Switches to logical replication for ongoing changes. All INSERT, UPDATE, and DELETE operations are captured and streamed in real time.
Resumable Operations: scry-backfill checkpoints progress at configurable intervals. If the process is interrupted (network failure, restart, etc.), it automatically resumes from the last checkpoint rather than starting over.
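The resume behavior described above can be illustrated with a minimal sketch. This is not scry-backfill's implementation: the checkpoint file format, the `PHASES` names, and the `run_phases` helper are all hypothetical, and a real tool would likely checkpoint at a finer granularity (for example per table or per WAL position) and treat CDC streaming as a long-running phase rather than a call that returns.

```python
import json
import os
from pathlib import Path

# Hypothetical phase names mirroring the four steps described above.
PHASES = ["setup", "schema_extraction", "snapshot", "cdc_streaming"]


def load_checkpoint(path: Path) -> int:
    """Return the index of the next phase to run (0 if no checkpoint exists)."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["next_phase"]


def save_checkpoint(path: Path, next_phase: int) -> None:
    """Write the checkpoint atomically so a crash never leaves a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"next_phase": next_phase}))
    os.replace(tmp, path)


def run_phases(path: Path, handlers: dict) -> list:
    """Run every phase that has not completed yet, checkpointing after each."""
    completed = []
    for index in range(load_checkpoint(path), len(PHASES)):
        name = PHASES[index]
        handlers[name]()  # if this raises, the checkpoint is left untouched
        save_checkpoint(path, index + 1)
        completed.append(name)
    return completed
```

If a handler raises partway through, the checkpoint still points at the failed phase, so calling `run_phases` again with the same path skips the phases that already finished.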
Available Modes
While hybrid mode handles most use cases automatically, you can explicitly select a specific mode when needed.
Hybrid Mode (Default)
The recommended mode for most deployments. Hybrid mode performs a complete snapshot first, then seamlessly transitions to CDC streaming for ongoing changes.
[backfill]
mode = "hybrid"
Best for:
- Initial setup of shadow databases
- Complete data synchronization with ongoing updates
- Most production deployments
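To see why taking the snapshot only after the replication slot exists avoids losing writes, consider the toy model below. It is a sketch of the general slot-then-snapshot pattern, not of scry-backfill's internals: `ToySource`, `create_slot`, and `hybrid_copy` are hypothetical names, the "WAL" is a plain list, and only inserts and updates are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToySource:
    """In-memory stand-in for a source database with a write-ahead log."""
    rows: dict = field(default_factory=dict)
    wal: list = field(default_factory=list)  # entries are (lsn, key, value)
    lsn: int = 0

    def write(self, key, value) -> None:
        """Apply an insert or update and append it to the log."""
        self.lsn += 1
        self.rows[key] = value
        self.wal.append((self.lsn, key, value))

    def create_slot(self) -> int:
        """Return the log position from which changes must be retained."""
        return self.lsn

    def changes_after(self, lsn: int) -> list:
        """Return every logged change newer than the given position."""
        return [entry for entry in self.wal if entry[0] > lsn]


def hybrid_copy(source: ToySource, writes_during_snapshot: list) -> dict:
    """Copy the source into a new dict using snapshot followed by log replay."""
    slot_lsn = source.create_slot()       # 1. pin the log position first
    shadow = dict(source.rows)            # 2. snapshot as of that position
    for key, value in writes_during_snapshot:
        source.write(key, value)          # writes that land while copying
    for _, key, value in source.changes_after(slot_lsn):
        shadow[key] = value               # 3. replay the retained changes
    return shadow
```

Because the slot position is recorded before the snapshot is taken, every write that arrives during the copy is still in the log when streaming begins, so the shadow copy converges on the source.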
Snapshot Only
Performs a one-time bulk copy of all tables without setting up ongoing replication. No replication slot is left behind: it is either never created or is dropped once the copy completes.
[backfill]
mode = "snapshot"
Use cases:
- One-time data migration or export
- Creating a static copy for analysis
- Testing or development environment seeding
- Databases where ongoing replication isn't needed
CDC Only
Streams changes from an existing replication slot without performing an initial snapshot. Assumes the shadow database already has baseline data.
[backfill]
mode = "cdc"
Use cases:
- Resuming replication after a snapshot was taken separately
- Connecting to a pre-existing replication slot
- When baseline data was loaded through other means (pg_dump, etc.)
- Minimal-impact ongoing replication without initial load
Recommendation: Use hybrid mode unless you have a specific reason to choose otherwise. It handles the complexity of coordinating snapshots and CDC automatically, ensuring no data is lost during the transition.
Schema Change Handling
scry-backfill automatically handles schema changes during CDC streaming without requiring a restart. Changes are detected and propagated as they occur.
| Schema Change | Behavior |
|---|---|
| ADD COLUMN | New column appears in the CDC stream immediately. Existing rows show NULL or default value for the new column. |
| DROP COLUMN | Column is removed from the CDC stream. Subsequent events no longer include the dropped column. |
| ALTER TYPE | Type changes are propagated to the shadow database. Values are converted according to PostgreSQL's type coercion rules. |
| CREATE TABLE | Publication is automatically updated to include the new table (if it matches include patterns). |
| DROP TABLE | Table is removed from the publication. No further events are generated for the dropped table. |
No Restart Required: Schema changes are handled dynamically during CDC streaming. You don't need to stop and restart scry-backfill when your database schema evolves.
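One way a CDC consumer can follow column changes without a restart is to keep a per-table column cache and refresh it whenever the stream announces a new table layout. PostgreSQL's logical replication protocol sends a Relation message before the first change for a table and again after the table's definition changes. The sketch below models that idea with plain Python values. The `RelationCache` class and its method names are hypothetical, and whether scry-backfill works this way internally is an assumption.

```python
class RelationCache:
    """Tracks the current column list per table, as announced by the stream."""

    def __init__(self) -> None:
        self.columns = {}

    def on_relation(self, table: str, columns: list) -> None:
        """Record the latest column layout (first sight or after a DDL change)."""
        self.columns[table] = list(columns)

    def on_drop(self, table: str) -> None:
        """Forget a table that no longer exists."""
        self.columns.pop(table, None)

    def decode_row(self, table: str, values: list) -> dict:
        """Pair positional values with the column names currently in effect."""
        names = self.columns[table]
        if len(names) != len(values):
            raise ValueError(
                f"{table}: expected {len(names)} values, got {len(values)}"
            )
        return dict(zip(names, values))


cache = RelationCache()
cache.on_relation("users", ["id", "name"])
before = cache.decode_row("users", [1, "Ada"])

# After ALTER TABLE users ADD COLUMN email, a new layout is announced.
cache.on_relation("users", ["id", "name", "email"])
after = cache.decode_row("users", [2, "Grace", None])
```

Rows decoded before the change carry two columns and rows decoded afterwards carry three, with no restart in between.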