## Table of Contents

- File Sync for Replicas
  - Overview
  - How It Works
  - Configuration
  - When File Sync is Disabled
  - Gotchas & Edge Cases
    - File Path Format
    - Record Deletion
    - Bulk Operations
    - Sync Timing
    - Network Interruptions
    - Large Files
    - Storage Consistency
  - Monitoring
  - Troubleshooting
  - Best Practices
    - 1. Configure Adequate Retention
    - 2. Monitor Disk Space
    - 3. Use Consistent Storage Paths
    - 4. Test File Sync
    - 5. Plan for Large Files
  - API Endpoints
  - Architecture Decisions
# File Sync for Replicas
When using local storage (not S3), replicas need a way to sync uploaded files from the master server. VSKI provides automatic file synchronization as part of the replication system.
## Overview
```
┌─────────────────────────────────────────────────────────────┐
│                           Master                            │
│  ┌─────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │  Database   │    │  _file_journal  │    │   Storage   │  │
│  │ default.db  │◄───│ id | op | path  │◄───│   /files    │  │
│  └──────┬──────┘    └────────┬────────┘    └──────┬──────┘  │
└─────────┼────────────────────┼────────────────────┼─────────┘
          │                    │                    │
          │ DB Sync            │ Journal Sync       │ File Download
          │                    │                    │
          ▼                    ▼                    ▼
┌─────────────────────────────────────────────────────────────┐
│                           Replica                           │
│     ┌─────────────┐              ┌───────────┐              │
│     │  Database   │              │  Storage  │              │
│     │ default.db  │              │  /files   │              │
│     └─────────────┘              └───────────┘              │
└─────────────────────────────────────────────────────────────┘
```
### Key Components
| Component | Description |
|---|---|
| `_file_journal` | Table tracking file operations (add/delete) |
| File Journal Sync | Replica fetches journal entries after DB sync |
| File Download | Replica downloads new files from master |
| File Deletion | Replica removes local files marked as deleted |
| Acknowledgment | Replica confirms sync, master cleans up journal |
## How It Works

### 1. Recording File Operations
When a file is uploaded or deleted on the master:
```sql
-- On file upload
INSERT INTO _file_journal (operation, path, created)
VALUES ('add', 'collection_id/record_id/document.pdf', datetime('now'));

-- On file deletion
INSERT INTO _file_journal (operation, path, created)
VALUES ('delete', 'collection_id/record_id/document.pdf', datetime('now'));
```
The journal is stored in `default.db` and replicates automatically with the database.
### 2. Sync Process
Every sync cycle, the replica:
- Syncs database - Gets the latest `_file_journal` entries
- Fetches journal - Requests entries since last sync
- Downloads files - For each `add` operation, downloads file from master
- Deletes files - For each `delete` operation, removes local file
- Acknowledges - Tells master which entries were processed
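The cycle above can be sketched as one function. This is an illustrative TypeScript model, not VSKI's actual implementation; the helper names (`syncDatabase`, `fetchJournal`, `downloadFile`, `deleteLocal`, `ackEntries`) are hypothetical and injected so the control flow is testable.

```typescript
// Hypothetical model of one replica sync cycle. Failed entries are
// skipped (not acknowledged) so they retry on the next cycle.
type JournalEntry = { id: number; operation: "add" | "delete"; path: string };

interface SyncDeps {
  syncDatabase: () => Promise<void>;
  fetchJournal: (sinceId: number) => Promise<JournalEntry[]>;
  downloadFile: (path: string) => Promise<void>;
  deleteLocal: (path: string) => Promise<void>;
  ackEntries: (ids: number[]) => Promise<void>;
}

async function runSyncCycle(deps: SyncDeps, lastSyncedId: number): Promise<number> {
  await deps.syncDatabase(); // 1. DB sync first (brings journal along)
  const entries = await deps.fetchJournal(lastSyncedId); // 2. fetch journal
  const acked: number[] = [];
  for (const e of entries) {
    try {
      if (e.operation === "add") {
        await deps.downloadFile(e.path); // 3. download new files
      } else {
        await deps.deleteLocal(e.path); // 4. remove deleted files
      }
      acked.push(e.id);
    } catch {
      // Entry stays unacknowledged; it will be retried next cycle
      // while the remaining entries continue to sync.
    }
  }
  if (acked.length > 0) await deps.ackEntries(acked); // 5. acknowledge
  return acked.length > 0 ? Math.max(...acked) : lastSyncedId;
}
```

Returning the highest acknowledged id lets the caller pass it back as `sinceId` on the next cycle.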
### 3. Journal Cleanup
The master uses a dual cleanup strategy:
| Strategy | Trigger | Purpose |
|---|---|---|
| Ack-based | Replica confirms sync | Immediate cleanup after sync |
| Time-based | Entries older than N days | Safety net for failed syncs |
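The dual strategy can be expressed as a single predicate. This sketch is illustrative (not VSKI source): it assumes the master tracks the highest acknowledged journal id per replica, which the table above implies but does not spell out.

```typescript
// An entry is removable once every known replica has acknowledged it
// (ack-based), or once it is older than the retention window (time-based).
type CleanupEntry = { id: number; createdMs: number };

function removableEntries(
  entries: CleanupEntry[],
  replicaAckIds: number[], // highest acked journal id per replica (assumption)
  retentionDays: number,
  nowMs: number,
): number[] {
  // With no replicas known, only the time-based safety net applies.
  const minAck = replicaAckIds.length > 0 ? Math.min(...replicaAckIds) : -Infinity;
  const cutoffMs = nowMs - retentionDays * 24 * 60 * 60 * 1000;
  return entries
    .filter((e) => e.id <= minAck || e.createdMs < cutoffMs)
    .map((e) => e.id);
}
```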
## Configuration

### Master Server
File journal is created automatically when:
- Not in replica mode
- Not using S3 storage
```bash
# Master configuration
JWT_SECRET=your-shared-secret
STORAGE_PATH=./data/files       # Local storage path
FILE_SYNC_RETENTION_DAYS=7      # Time-based cleanup (default: 7)
```
### Replica Server
File sync is enabled automatically when:
- Running in replica mode
- Not using S3 storage
```bash
# Replica configuration
REPLICA_MODE=replica
MASTER_URL=http://master:3001
JWT_SECRET=your-shared-secret   # Must match master
STORAGE_PATH=./data/files       # Local storage path
SYNC_INTERVAL=60                # Sync every 60 seconds
```
### Environment Variables
| Variable | Default | Description |
|---|---|---|
| `FILE_SYNC_RETENTION_DAYS` | `7` | Days to retain journal entries |
| `STORAGE_PATH` | `./data/files` | Path for file storage |
| `SYNC_INTERVAL` | `60` | Sync interval in seconds |
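As a minimal sketch, these variables can be read with their documented defaults (the names and defaults come from the table above; the parsing helper itself is illustrative):

```typescript
// Read file-sync settings from an env map, falling back to the
// documented defaults when a variable is unset.
function loadFileSyncConfig(env: Record<string, string | undefined>) {
  return {
    retentionDays: Number(env.FILE_SYNC_RETENTION_DAYS ?? "7"),
    storagePath: env.STORAGE_PATH ?? "./data/files",
    syncIntervalSec: Number(env.SYNC_INTERVAL ?? "60"),
  };
}
```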
## When File Sync is Disabled
File sync is automatically skipped when using S3 storage:
```bash
# With S3, files are stored in the bucket
# No file journal is created
# Replicas access the same S3 bucket
S3_ENDPOINT=https://s3.amazonaws.com
S3_BUCKET=my-bucket
S3_ACCESS_KEY=xxx
S3_SECRET_KEY=xxx
```
## Gotchas & Edge Cases

### File Path Format
Files are stored with the path structure:

```
{storage_path}/{collection_id}/{record_id}/{filename}
```

Examples:

```
files/abc123/def456/document.pdf
files/users/user789/avatar.png
```
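A tiny helper makes the layout concrete. This is a hypothetical illustration (not a VSKI API), assuming the final segment is the original filename, as the examples suggest:

```typescript
// Build a storage path following the documented layout:
// {storage_path}/{collection_id}/{record_id}/{filename}
function storageFilePath(
  storagePath: string,
  collectionId: string,
  recordId: string,
  filename: string,
): string {
  return [storagePath, collectionId, recordId, filename].join("/");
}
```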
### Record Deletion
When a record is deleted, all associated files are recorded in the journal:
```javascript
// Deleting a record triggers file cleanup
await client.collection("posts").delete("record_id");

// The journal records all file deletions
// Replica will delete all files for this record
```
### Bulk Operations
Bulk deletes also trigger file journal entries:
```javascript
// Bulk delete - files for all records are journaled
await client.collection("posts").bulkDelete(["id1", "id2", "id3"]);
```
### Sync Timing
File sync happens after database sync in each cycle:
- Database sync completes
- File journal entries fetched
- Files downloaded/deleted
- Acknowledgment sent
If file sync fails, the journal entries remain for the next cycle.
### Network Interruptions
If file download fails:
- Journal entry is NOT acknowledged
- File will be retried on next sync
- Other files continue syncing
- Error is logged but doesn't stop sync
### Large Files
Large files are downloaded with:
- 5-minute HTTP timeout
- SHA256 checksum verification
- Atomic file writes (temp file → final location)
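The checksum-then-atomic-rename pattern described above can be sketched as follows. The function and temp-file naming are illustrative assumptions, not VSKI's implementation:

```typescript
import { createHash } from "node:crypto";
import { writeFileSync, renameSync } from "node:fs";

// Verify the SHA256 of downloaded bytes, then write to a temp file and
// rename into place. rename() is atomic on the same filesystem, so the
// final path never contains a partially written file.
function saveVerified(finalPath: string, data: Buffer, expectedSha256: string): void {
  const actual = createHash("sha256").update(data).digest("hex");
  if (actual !== expectedSha256) {
    // On mismatch nothing is written; the journal entry stays
    // unacknowledged and the download retries next cycle.
    throw new Error(`checksum mismatch: expected ${expectedSha256}, got ${actual}`);
  }
  const tmpPath = finalPath + ".tmp"; // illustrative temp-file naming
  writeFileSync(tmpPath, data);
  renameSync(tmpPath, finalPath);
}
```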
### Storage Consistency
The replica's storage matches the master's structure:
```
Master Storage                Replica Storage
./data/files/                 ./data/files/
├── col1/                     ├── col1/
│   └── rec1/                 │   └── rec1/
│       └── file.pdf          │       └── file.pdf
└── col2/                     └── col2/
    └── rec2/                     └── rec2/
        └── image.png                 └── image.png
```
## Monitoring

### Check File Journal Status
```bash
# On master, query the journal
sqlite3 data/db/default.db "SELECT COUNT(*) FROM _file_journal"

# View recent entries
sqlite3 data/db/default.db \
  "SELECT * FROM _file_journal ORDER BY id DESC LIMIT 10"
```
### Sync Logs
Watch for file sync activity:
```
# Replica logs
INFO starting replication sync
INFO replication sync completed
INFO syncing files count=5
DEBUG downloaded file path=col1/rec1/doc.pdf
DEBUG deleted local file path=col2/rec2/old.png
```
## Troubleshooting

### Files Not Syncing to Replica
- Check S3 configuration - File sync is skipped with S3
- Verify `STORAGE_PATH` - Ensure replica has write access
- Check sync interval - Files sync after DB sync
- Review journal - Check for entries in `_file_journal`
### File Journal Growing Large

- Check replica connectivity - Ack-based cleanup depends on replicas
- Verify retention setting - `FILE_SYNC_RETENTION_DAYS`
- Manual cleanup - Run `CleanupOlderThan` manually
```sql
-- Manual cleanup (entries older than 7 days)
DELETE FROM _file_journal
WHERE created < datetime('now', '-7 days');
```
### Checksum Mismatch
If file checksum fails:
- File is NOT saved to replica
- Journal entry remains unacknowledged
- Retry happens on next sync
- Check master file integrity
### Replica Has Extra Files
Files deleted on master will be deleted on replica during sync. If replica has extra files:
- They were created before replication was set up
- They were created directly on replica (shouldn't happen - read-only)
- Manual cleanup may be needed
## Best Practices

### 1. Configure Adequate Retention
```bash
# For frequent sync (every minute)
FILE_SYNC_RETENTION_DAYS=1      # 1 day is sufficient

# For infrequent sync (hourly or more)
FILE_SYNC_RETENTION_DAYS=7      # More buffer for issues
```
### 2. Monitor Disk Space
Both master and replica need space for:
- Database files
- Uploaded files
- Temporary download files
### 3. Use Consistent Storage Paths
```bash
# Same path structure on both servers

# Master
STORAGE_PATH=/var/lib/vski/files

# Replica
STORAGE_PATH=/var/lib/vski/files
```
### 4. Test File Sync
```bash
# Upload a file on master
curl -X POST http://master:3001/api/collections/docs/records \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@test.pdf"

# Wait for sync interval
sleep 60

# Verify on replica
curl http://replica:3002/files/docs/record_id/test.pdf
```
### 5. Plan for Large Files
For systems with large files:
- Increase `SYNC_INTERVAL` to reduce bandwidth
- Monitor sync duration
- Consider S3 for better scalability
## API Endpoints
The file sync uses these internal replica endpoints:
| Endpoint | Description |
|---|---|
| `GET /api/replica/files?since=` | Get journal entries since ID |
| `GET /api/replica/file/*path` | Download file from master |
| `POST /api/replica/files/ack` | Acknowledge synced entries |
These are internal endpoints used by the sync process and require replica authentication.
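For illustration, the URLs a replica would construct for these endpoints might look like the sketch below. The routes and the `since` parameter come from the table above; the per-segment encoding of the file path is an assumption, not confirmed VSKI behavior:

```typescript
// Build the journal-fetch URL: GET /api/replica/files?since=<id>
function journalUrl(masterUrl: string, sinceId: number): string {
  return `${masterUrl}/api/replica/files?since=${sinceId}`;
}

// Build the file-download URL: GET /api/replica/file/*path
// Each path segment is URL-encoded; the slashes between segments
// are preserved so the wildcard route can match them.
function fileUrl(masterUrl: string, filePath: string): string {
  const encoded = filePath.split("/").map(encodeURIComponent).join("/");
  return `${masterUrl}/api/replica/file/${encoded}`;
}
```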
## Architecture Decisions

### Why Store Journal in Database?
- Atomic with DB sync - Journal replicates with the database
- No separate state - Single source of truth
- Automatic consistency - DB transaction guarantees
- Simple recovery - Restore DB, get journal state
### Why Dual Cleanup Strategy?
- Ack-based - Immediate cleanup when replicas confirm
- Time-based - Safety net for:
  - Offline replicas
  - Failed acknowledgments
  - New replicas (don't need old entries)
### Why Not Real-time File Sync?
- Batching - More efficient for multiple files
- Consistency - Files sync after DB is consistent
- Simplicity - No separate file sync connection
- Reliability - Retry on next cycle if failed