File Sync
Anton Nesterov edited this page 2026-02-26 16:23:25 +01:00

File Sync for Replicas

When using local storage (not S3), replicas need a way to sync uploaded files from the master server. VSKI provides automatic file synchronization as part of the replication system.

Overview

┌─────────────────────────────────────────────────────────────┐
│                         Master                               │
│  ┌─────────────┐    ┌─────────────────┐    ┌─────────────┐  │
│  │  Database   │    │  _file_journal  │    │   Storage   │  │
│  │  default.db │◄───│  id | op | path │◄───│   /files    │  │
│  └──────┬──────┘    └────────┬────────┘    └──────┬──────┘  │
└─────────┼────────────────────┼────────────────────┼─────────┘
          │                    │                    │
          │ DB Sync            │ Journal Sync       │ File Download
          │                    │                    │
          ▼                    ▼                    ▼
┌─────────────────────────────────────────────────────────────┐
│                         Replica                              │
│  ┌─────────────┐                              ┌───────────┐  │
│  │  Database   │                              │  Storage  │  │
│  │  default.db │                              │  /files   │  │
│  └─────────────┘                              └───────────┘  │
└─────────────────────────────────────────────────────────────┘

Key Components

Component           Description
_file_journal       Table tracking file operations (add/delete)
File Journal Sync   Replica fetches journal entries after DB sync
File Download       Replica downloads new files from master
File Deletion       Replica removes local files marked as deleted
Acknowledgment      Replica confirms sync, master cleans up journal

How It Works

1. Recording File Operations

When a file is uploaded or deleted on the master:

-- On file upload
INSERT INTO _file_journal (operation, path, created) 
VALUES ('add', 'collection_id/record_id/document.pdf', datetime('now'));

-- On file deletion  
INSERT INTO _file_journal (operation, path, created)
VALUES ('delete', 'collection_id/record_id/document.pdf', datetime('now'));

The journal is stored in default.db and replicates automatically with the database.

2. Sync Process

Every sync cycle, the replica:

  1. Syncs database - Gets the latest _file_journal entries
  2. Fetches journal - Requests entries since last sync
  3. Downloads files - For each add operation, downloads file from master
  4. Deletes files - For each delete operation, removes local file
  5. Acknowledges - Tells master which entries were processed
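
The five steps above can be sketched as one loop with injected I/O. The endpoint references in the comments match the internal replica API documented later on this page; everything else (the `SyncIO` interface, `syncFiles`, and the field names) is an illustrative sketch, not VSKI's actual code.

```typescript
// One file sync cycle, run after the database sync.
type JournalEntry = { id: number; operation: "add" | "delete"; path: string };

interface SyncIO {
  fetchJournal(sinceId: number): Promise<JournalEntry[]>; // GET /api/replica/files?since=
  downloadFile(path: string): Promise<void>;              // GET /api/replica/file/*path
  deleteLocal(path: string): Promise<void>;
  acknowledge(ids: number[]): Promise<void>;              // POST /api/replica/files/ack
}

// Returns the journal entry ids that were acknowledged this cycle.
async function syncFiles(io: SyncIO, lastSyncedId: number): Promise<number[]> {
  const entries = await io.fetchJournal(lastSyncedId);
  const acked: number[] = [];
  for (const entry of entries) {
    try {
      if (entry.operation === "add") {
        await io.downloadFile(entry.path);
      } else {
        await io.deleteLocal(entry.path);
      }
      acked.push(entry.id);
    } catch (err) {
      // A failed entry is not acknowledged: it stays in the journal and is
      // retried on the next cycle, while the remaining entries continue.
      console.error(`file sync failed for ${entry.path}:`, err);
    }
  }
  if (acked.length > 0) {
    await io.acknowledge(acked);
  }
  return acked;
}
```

Only successfully processed entries are acknowledged, which is what makes the retry behavior for failed downloads fall out naturally.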

3. Journal Cleanup

The master uses a dual cleanup strategy:

Strategy     Trigger                      Purpose
Ack-based    Replica confirms sync        Immediate cleanup after sync
Time-based   Entries older than N days    Safety net for failed syncs
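
The dual strategy can be pictured as a pure decision function. This sketch assumes the master keeps the highest acknowledged journal id per replica — that bookkeeping detail is an assumption, not documented behavior — and all names are illustrative:

```typescript
// Decide which journal entries are safe to delete under the dual strategy.
type CleanupEntry = { id: number; created: Date };

function cleanableEntries(
  entries: CleanupEntry[],
  ackedIdPerReplica: Map<string, number>, // replica name -> highest acked id (assumed bookkeeping)
  retentionDays: number,                  // FILE_SYNC_RETENTION_DAYS
  now: Date,
): CleanupEntry[] {
  // Ack-based: an entry is safe once every known replica has acknowledged it.
  const ackedIds = Array.from(ackedIdPerReplica.values());
  const minAcked = ackedIds.length > 0 ? Math.min(...ackedIds) : -Infinity;
  // Time-based safety net for offline replicas and failed acknowledgments.
  const cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  return entries.filter((e) => e.id <= minAcked || e.created < cutoff);
}
```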

Configuration

Master Server

The file journal is created automatically when:

  • Not in replica mode
  • Not using S3 storage
# Master configuration
JWT_SECRET=your-shared-secret
STORAGE_PATH=./data/files          # Local storage path
FILE_SYNC_RETENTION_DAYS=7         # Time-based cleanup (default: 7)

Replica Server

File sync is enabled automatically when:

  • Running in replica mode
  • Not using S3 storage
# Replica configuration
REPLICA_MODE=replica
MASTER_URL=http://master:3001
JWT_SECRET=your-shared-secret      # Must match master
STORAGE_PATH=./data/files          # Local storage path
SYNC_INTERVAL=60                   # Sync every 60 seconds

Environment Variables

Variable                   Default        Description
FILE_SYNC_RETENTION_DAYS   7              Days to retain journal entries
STORAGE_PATH               ./data/files   Path for file storage
SYNC_INTERVAL              60             Sync interval in seconds

When File Sync is Disabled

File sync is automatically skipped when using S3 storage:

# With S3, files are stored in the bucket
# No file journal is created
# Replicas access the same S3 bucket
S3_ENDPOINT=https://s3.amazonaws.com
S3_BUCKET=my-bucket
S3_ACCESS_KEY=xxx
S3_SECRET_KEY=xxx

Gotchas & Edge Cases

File Path Format

Files are stored with the path structure:

{storage_path}/{collection_id}/{record_id}/{filename}

Examples:

  • files/abc123/def456/document.pdf
  • files/users/user789/avatar.png
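
The layout can be sketched as a small helper. The function name and the traversal guard are illustrative hardening choices, not necessarily what VSKI does internally:

```typescript
import { posix } from "node:path";

// Build the storage path for an uploaded file following the layout above.
function filePath(
  storagePath: string,
  collectionId: string,
  recordId: string,
  filename: string,
): string {
  // Reject separators and parent references so a crafted filename cannot
  // escape the record's directory (illustrative guard, an assumption).
  for (const segment of [collectionId, recordId, filename]) {
    if (segment.includes("/") || segment.includes("..")) {
      throw new Error(`invalid path segment: ${segment}`);
    }
  }
  return posix.join(storagePath, collectionId, recordId, filename);
}
```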

Record Deletion

When a record is deleted, delete entries for all of its associated files are recorded in the journal:

// Deleting a record triggers file cleanup
await client.collection("posts").delete("record_id");

// The journal records all file deletions
// Replica will delete all files for this record

Bulk Operations

Bulk deletes also trigger file journal entries:

// Bulk delete - files for all records are journaled
await client.collection("posts").bulkDelete(["id1", "id2", "id3"]);

Sync Timing

File sync happens after database sync in each cycle:

  1. Database sync completes
  2. File journal entries fetched
  3. Files downloaded/deleted
  4. Acknowledgment sent

If file sync fails, the journal entries remain for the next cycle.

Network Interruptions

If file download fails:

  • Journal entry is NOT acknowledged
  • File will be retried on next sync
  • Other files continue syncing
  • Error is logged but doesn't stop sync

Large Files

Large files are downloaded with:

  • 5-minute HTTP timeout
  • SHA256 checksum verification
  • Atomic file writes (temp file → final location)
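
The checksum-then-rename pattern can be sketched as follows; `sha256Hex` and `atomicWrite` are illustrative names, not VSKI's real functions:

```typescript
import { createHash } from "node:crypto";
import { mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Verify the checksum, write to a temp file next to the target, then rename.
// rename within the same filesystem is atomic, so readers never see a
// partially written file.
function atomicWrite(finalPath: string, data: Buffer, expectedSha256: string): void {
  if (sha256Hex(data) !== expectedSha256) {
    throw new Error(`checksum mismatch for ${finalPath}`);
  }
  mkdirSync(dirname(finalPath), { recursive: true });
  const tmpPath = finalPath + ".tmp";
  writeFileSync(tmpPath, data);
  renameSync(tmpPath, finalPath);
}

// Demo: round-trip a small payload through the atomic path.
const dir = mkdtempSync(join(tmpdir(), "vski-sync-"));
const payload = Buffer.from("hello");
atomicWrite(join(dir, "col1", "rec1", "doc.pdf"), payload, sha256Hex(payload));
console.log(readFileSync(join(dir, "col1", "rec1", "doc.pdf")).toString()); // prints "hello"
```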

Storage Consistency

The replica's storage matches the master's structure:

Master Storage                 Replica Storage
./data/files/                  ./data/files/
├── col1/                      ├── col1/
│   └── rec1/                  │   └── rec1/
│       └── file.pdf           │       └── file.pdf
└── col2/                      └── col2/
    └── rec2/                      └── rec2/
        └── image.png                  └── image.png

Monitoring

Check File Journal Status

# On master, query the journal
sqlite3 data/db/default.db "SELECT COUNT(*) FROM _file_journal"

# View recent entries
sqlite3 data/db/default.db \
  "SELECT * FROM _file_journal ORDER BY id DESC LIMIT 10"

Sync Logs

Watch for file sync activity:

# Replica logs
INFO starting replication sync
INFO replication sync completed
INFO syncing files count=5
DEBUG downloaded file path=col1/rec1/doc.pdf
DEBUG deleted local file path=col2/rec2/old.png

Troubleshooting

Files Not Syncing to Replica

  1. Check S3 configuration - File sync is skipped with S3
  2. Verify STORAGE_PATH - Ensure replica has write access
  3. Check sync interval - Files sync after DB sync
  4. Review journal - Check for entries in _file_journal

File Journal Growing Large

  1. Check replica connectivity - Ack-based cleanup depends on replicas
  2. Verify retention setting - FILE_SYNC_RETENTION_DAYS
  3. Manual cleanup - Run CleanupOlderThan manually
-- Manual cleanup (entries older than 7 days)
DELETE FROM _file_journal 
WHERE created < datetime('now', '-7 days');

Checksum Mismatch

If file checksum fails:

  1. File is NOT saved to replica
  2. Journal entry remains unacknowledged
  3. Retry happens on next sync
  4. Check master file integrity

Replica Has Extra Files

Files deleted on master will be deleted on replica during sync. If replica has extra files:

  1. They were created before replication was set up
  2. They were created directly on replica (shouldn't happen - read-only)
  3. Manual cleanup may be needed

Best Practices

1. Configure Adequate Retention

# For frequent sync (every minute)
FILE_SYNC_RETENTION_DAYS=1   # 1 day is sufficient

# For infrequent sync (hourly or more)
FILE_SYNC_RETENTION_DAYS=7   # More buffer for issues

2. Monitor Disk Space

Both master and replica need space for:

  • Database files
  • Uploaded files
  • Temporary download files

3. Use Consistent Storage Paths

# Same path structure on both servers
# Master
STORAGE_PATH=/var/lib/vski/files

# Replica
STORAGE_PATH=/var/lib/vski/files

4. Test File Sync

# Upload a file on master
curl -X POST http://master:3001/api/collections/docs/records \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@test.pdf"

# Wait for sync interval
sleep 60

# Verify on replica
curl http://replica:3002/files/docs/record_id/test.pdf

5. Plan for Large Files

For systems with large files:

  • Increase SYNC_INTERVAL to reduce bandwidth
  • Monitor sync duration
  • Consider S3 for better scalability

API Endpoints

File sync uses these internal endpoints, served by the master:

Endpoint                         Description
GET /api/replica/files?since=    Get journal entries since ID
GET /api/replica/file/*path      Download file from master
POST /api/replica/files/ack      Acknowledge synced entries

These are internal endpoints used by the sync process and require replica authentication.
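
As a sketch, the requests a replica issues might be built like this. The URL shapes and the `?since=` query come from the endpoint table above; the `ids` field name in the acknowledgment body is an assumption:

```typescript
// Request builders for the three internal endpoints (shapes partly assumed).
function journalRequest(masterUrl: string, sinceId: number): string {
  return `${masterUrl}/api/replica/files?since=${sinceId}`;
}

function fileRequest(masterUrl: string, path: string): string {
  return `${masterUrl}/api/replica/file/${path}`;
}

function ackRequest(masterUrl: string, ids: number[]): { url: string; body: string } {
  // The JSON body field name is an assumption, not documented.
  return { url: `${masterUrl}/api/replica/files/ack`, body: JSON.stringify({ ids }) };
}
```

Each request would additionally carry the shared-secret replica authentication mentioned below.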

Architecture Decisions

Why Store Journal in Database?

  1. Atomic with DB sync - Journal replicates with the database
  2. No separate state - Single source of truth
  3. Automatic consistency - DB transaction guarantees
  4. Simple recovery - Restore DB, get journal state

Why Dual Cleanup Strategy?

  1. Ack-based - Immediate cleanup when replicas confirm
  2. Time-based - Safety net for:
    • Offline replicas
    • Failed acknowledgments
    • New replicas (don't need old entries)

Why Not Real-time File Sync?

  1. Batching - More efficient for multiple files
  2. Consistency - Files sync after DB is consistent
  3. Simplicity - No separate file sync connection
  4. Reliability - Retry on next cycle if failed