This error indicates an issue with PostgreSQL’s Write-Ahead Log (WAL) archiving when using replication slots. Here’s a comprehensive guide to understand and resolve this issue:
## Understanding the Problem
**Replication slots** track how much WAL each replica has consumed, which prevents the primary from deleting WAL files that replicas still need. With **WAL archiving** enabled (`archive_mode = on`), a segment is also kept until it has been archived successfully. With both in place, WAL can pile up in `pg_wal` and archiving errors can appear when:
1. The archive command fails
2. Disk space is insufficient
3. Replication slots are preventing WAL cleanup
4. Archive timeout settings are misconfigured
## Common Error Messages
- `ERROR: replication slot "slot_name" cannot be archived`
- `WARNING: archiving write-ahead log file "0000000100000001000000AB" failed too many times`
- `FATAL: could not archive write-ahead log file "0000000100000001000000AB"`
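The file names in these messages encode where the segment sits in the WAL stream: 8 hex digits of timeline ID, 8 of the high half of the LSN, and 8 of the segment number. As a rough sketch, assuming the default 16 MB `wal_segment_size`, the name can be decoded like this (`wal_name_info` is a hypothetical helper, not a PostgreSQL tool):

```bash
#!/bin/bash
# Decode a 24-character WAL segment file name into timeline and start LSN.
# Assumption: default 16 MB wal_segment_size. wal_name_info is a
# hypothetical helper name used only for illustration.
wal_name_info() {
    local name=$1
    # A segment name is exactly 24 hexadecimal digits.
    if [[ ! $name =~ ^[0-9A-Fa-f]{24}$ ]]; then
        echo "not a WAL segment name: $name" >&2
        return 1
    fi
    local timeline=$((16#${name:0:8}))   # timeline ID
    local log=$((16#${name:8:8}))        # high 32 bits of the LSN
    local seg=$((16#${name:16:8}))       # segment number within that log
    # With 16 MB segments, segment N starts at byte offset N * 0x1000000.
    printf 'timeline=%d start_lsn=%X/%X\n' \
        "$timeline" "$log" "$((seg * 16#1000000))"
}
```

Calling `wal_name_info 0000000100000001000000AB` prints `timeline=1 start_lsn=1/AB000000`, which can be compared with the `restart_lsn` of a replication slot.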
## Step-by-Step Solutions


### 1. **Check Current Status**
```sql
-- Check replication slots
SELECT * FROM pg_replication_slots;
-- Check WAL archiving status
SELECT * FROM pg_stat_archiver;
-- Check current WAL position
SELECT pg_current_wal_lsn();
```
### 2. **Fix Archive Command Issues**
Check your `postgresql.conf`:
```ini
archive_mode = on
archive_command = 'cp %p /path/to/archive/%f'
# Safer: fail instead of overwriting a file that already exists in the archive
# archive_command = 'test ! -f /path/to/archive/%f && cp %p /path/to/archive/%f'
```
**Test the archive command manually:**
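```bash
# List a few WAL segment files
find "$PGDATA/pg_wal" -maxdepth 1 -type f \( -name "*.partial" -o -name "[0-9A-F]*" \) | head -5
# Run the archive command by hand as the PostgreSQL OS user (substitute a real segment name)
cp "$PGDATA/pg_wal/0000000100000001000000AB" /path/to/archive/
```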
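If a plain `cp` proves fragile, the archive step can be wrapped in a small script. The following is a minimal sketch, not a production archiver: `archive_wal` is a hypothetical helper, and it does not fsync the copied file. It refuses to overwrite an existing archive file and copies under a temporary name first, so a half-written file never appears under the final name.

```bash
#!/bin/bash
# archive_wal SRC DEST_DIR
# Copy one WAL segment into DEST_DIR without overwriting existing files.
# Hypothetical helper for illustration; no fsync is performed.
archive_wal() {
    local src=$1 dest_dir=$2
    local name dest tmp
    name=$(basename "$src")
    dest="$dest_dir/$name"
    tmp="$dest.tmp.$$"

    # Refuse to overwrite: a non-zero exit makes PostgreSQL retry later.
    if [ -e "$dest" ]; then
        echo "archive_wal: $dest already exists" >&2
        return 1
    fi
    # Copy under a temporary name, then rename within the same directory.
    # On any failure, remove the temporary file and report the error.
    cp "$src" "$tmp" && mv "$tmp" "$dest" || { rm -f "$tmp"; return 1; }
}
```

To use it as an `archive_command`, the script file would end with `archive_wal "$1" "$2"` and be referenced as something like `archive_command = '/path/to/archive_wal.sh %p /path/to/archive'` (the script path is an assumption).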
### 3. **Manage Replication Slots**
**If a replica is down or lagging:**
```sql
-- Check slot activity
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots;
-- Drop a problematic slot (CAUTION: the replica will need to re-sync)
SELECT pg_drop_replication_slot('slot_name');
```
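To see what `lag_bytes` means, it helps to know that an LSN such as `1/AB000000` is a 64-bit byte position written as two hex halves. The sketch below reproduces the subtraction in shell so that LSNs copied from logs can be compared offline. `lsn_to_bytes` and `lsn_diff` are hypothetical helper names, and the arithmetic assumes LSNs below 2^63.

```bash
#!/bin/bash
# Convert an LSN of the form HI/LO (hex) to an absolute byte position.
# Hypothetical helpers; valid for LSNs below 2^63 (bash uses signed 64-bit).
lsn_to_bytes() {
    local lsn=$1
    if [[ ! $lsn =~ ^[0-9A-Fa-f]{1,8}/[0-9A-Fa-f]{1,8}$ ]]; then
        echo "invalid LSN: $lsn" >&2
        return 1
    fi
    local hi=${lsn%%/*} lo=${lsn##*/}
    # The high half counts units of 4 GB (2^32 bytes).
    echo $(( (16#$hi << 32) + 16#$lo ))
}

# Bytes between two LSNs (first minus second), like pg_wal_lsn_diff.
lsn_diff() {
    local a b
    a=$(lsn_to_bytes "$1") || return 1
    b=$(lsn_to_bytes "$2") || return 1
    echo $(( a - b ))
}
```

For example, `lsn_diff 1/AB000000 1/AA000000` prints `16777216`, which is exactly one 16 MB segment of lag.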
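**Alternative: Adjust WAL retention limits**
```sql
-- Cap the WAL that any replication slot, physical or logical, may retain (PostgreSQL 13+).
-- A slot that falls further behind is invalidated instead of filling the disk.
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
-- Minimum WAL kept for standbys, independent of replication slots (PostgreSQL 13+)
ALTER SYSTEM SET wal_keep_size = '1024MB';
-- Apply the changes
SELECT pg_reload_conf();
```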
### 4. **Free Up WAL Space**
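```sql
-- Check WAL directory usage
SELECT count(*) AS segments, pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();
-- Force a checkpoint so WAL that is no longer needed can be removed or recycled
CHECKPOINT;
-- Find the oldest WAL segment still required by a replication slot
SELECT slot_name, restart_lsn, pg_walfile_name(restart_lsn) AS oldest_wal
FROM pg_replication_slots
ORDER BY restart_lsn
LIMIT 1;
```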
### 5. **Adjust Configuration**
In `postgresql.conf`:
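```ini
# Force a WAL segment switch at least every 5 minutes when there is write activity
archive_timeout = 300
# Terminate replication connections that are inactive for longer than this
wal_sender_timeout = 60s
# Soft limits for WAL kept between checkpoints; slots and failed archiving can exceed max_wal_size
max_wal_size = 1GB
min_wal_size = 80MB
# Minimum WAL kept for standbys, independent of replication slots
wal_keep_size = 1024MB
```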
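The cost of `archive_timeout` is easy to estimate, because a segment closed early by a forced switch is still archived at full size. The following back-of-the-envelope sketch gives an upper bound for a lightly loaded server, assuming uncompressed archives and the default 16 MB segment size (`archive_mb_per_day` is a hypothetical helper):

```bash
#!/bin/bash
# Upper bound on archive volume per day caused by archive_timeout alone.
# Arguments: timeout in seconds, optional segment size in MB (default 16).
# Hypothetical helper; assumes uncompressed, full-size archived segments.
archive_mb_per_day() {
    local timeout_s=$1 segment_mb=${2:-16}
    # archive_timeout = 0 disables forced switches.
    if [ "$timeout_s" -le 0 ]; then
        echo 0
        return 0
    fi
    # 86400 seconds per day / timeout = maximum forced switches per day.
    echo $(( 86400 / timeout_s * segment_mb ))
}
```

With `archive_timeout = 300`, `archive_mb_per_day 300` prints `4608`, so roughly 4.5 GB per day at most; dropping the timeout to 60 seconds raises the bound to `23040` MB.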
### 6. **Emergency Recovery**
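If the WAL directory is full and PostgreSQL won't start, free or add disk space first, and never delete files from `pg_wal` by hand. As a last resort:
```bash
# Temporarily disable archiving (this breaks archive continuity; take a new base backup afterwards)
echo "archive_mode = off" >> "$PGDATA/postgresql.auto.conf"
# Start PostgreSQL
pg_ctl start -D "$PGDATA"
# Then fix the archive destination, re-enable archive_mode, and restart
```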
### 7. **Automated Monitoring Script**
Create a monitoring script (`check_wal_archive.sh`):
```bash
#!/bin/bash
ARCHIVE_DIR="/path/to/archive"
WAL_DIR="$PGDATA/pg_wal"
# Check archive status (failed_count is cumulative since the last stats reset)
FAILED_COUNT=$(psql -U postgres -At -c "SELECT failed_count FROM pg_stat_archiver")
echo "Archiver failed_count: $FAILED_COUNT"
# Check disk space in both directories
for dir in "$ARCHIVE_DIR" "$WAL_DIR"; do
    if [ "$(df -P "$dir" | awk 'NR==2 {print $5}' | sed 's/%//')" -gt 90 ]; then
        echo "$dir is more than 90% full"
    fi
done
# Check for stuck slots
psql -U postgres -c "SELECT slot_name, active,
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
    FROM pg_replication_slots WHERE active = false;"
```
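The slot check above only prints a table. To turn it into an alert, the output can be filtered against a threshold. This sketch assumes unaligned, tuples-only `psql -At` output with the columns `slot_name|active|lag_bytes`; `flag_lagging_slots` is a hypothetical helper that returns a non-zero status when any slot exceeds the limit.

```bash
#!/bin/bash
# Read "slot_name|active|lag_bytes" lines on stdin and print every slot
# whose lag exceeds the limit (in bytes). Exit status is 1 if any slot
# was flagged, 0 otherwise. Hypothetical helper for illustration.
flag_lagging_slots() {
    local limit=$1
    awk -F'|' -v limit="$limit" '
        # Skip slots without a restart_lsn (empty lag column).
        $3 != "" && $3 + 0 > limit + 0 {
            printf "%s active=%s lag_bytes=%s\n", $1, $2, $3
            found = 1
        }
        END { exit found ? 1 : 0 }
    '
}
```

A possible invocation, with a 1 GB threshold: `psql -U postgres -At -c "SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots" | flag_lagging_slots 1073741824`.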
## Prevention Best Practices
1. **Monitor regularly:**
   - Set up alerts for `failed_count` in `pg_stat_archiver`
   - Monitor disk space in archive and WAL directories
   - Track replication lag
2. **Maintain replication slots:**
   ```sql
   -- Regular maintenance query
   SELECT slot_name,
          pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
   FROM pg_replication_slots;
   ```
3. **Use archive_timeout carefully:**
   - Don't set it too low (causes excessive archiving of mostly empty segments)
   - Don't set it too high (more recent, unarchived changes are at risk if the primary is lost)
4. **Implement proper retention policy:**
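   ```bash
   # Clean old archives (example). Keep every WAL file needed by the oldest
   # base backup you still want to restore from.
   find /path/to/archive -name "*.backup" -mtime +30 -delete
   find /path/to/archive -type f ! -name "*.backup" -mtime +7 -delete
   ```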
5. **Consider using pgBackRest or Barman** for more robust WAL management.
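Age-based cleanup, as in the retention policy in item 4, can remove segments that an older base backup still needs. PostgreSQL ships `pg_archivecleanup` for position-based cleanup; the dry-run sketch below only illustrates the idea by listing archived segments that sort before the first segment you want to keep. It assumes a single timeline, and `list_removable_wal` is a hypothetical helper that deletes nothing.

```bash
#!/bin/bash
# list_removable_wal ARCHIVE_DIR KEEP_FROM
# Print archived WAL segment names that sort before KEEP_FROM.
# Dry run only: nothing is deleted. Assumes a single timeline.
# Hypothetical helper for illustration.
list_removable_wal() {
    local archive_dir=$1 keep_from=$2
    local f name
    for f in "$archive_dir"/*; do
        name=$(basename "$f")
        # Only plain 24-hex-digit segment names; skip .backup, .history, etc.
        [[ $name =~ ^[0-9A-F]{24}$ ]] || continue
        # Within one timeline, lexical order matches WAL order.
        if [[ $name < $keep_from ]]; then
            echo "$name"
        fi
    done
    return 0
}
```

`KEEP_FROM` would normally be the first segment required by the oldest base backup you keep, which is the segment named in that backup's `.backup` history file.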
## When to Seek Help
If the issue persists:
1. Check PostgreSQL logs: `tail -f $PGDATA/log/postgresql-*.log`
2. Verify filesystem permissions
3. Ensure network connectivity (if archiving to remote location)
4. If a standby must also archive the WAL it receives, consider `archive_mode = always`; it does not change how the primary archives
The key is to balance replication slot retention against WAL archiving requirements. Regular monitoring and timely configuration adjustments prevent most issues.


