# s3stor

s3stor is a command-line tool for backing up files to S3-compatible storage (e.g., AWS S3, Wasabi) with block-based deduplication, point-in-time snapshots, and efficient file management. Designed for reliability and multi-writer safety, it supports syncing files, creating snapshots (using the Volume Shadow Copy Service on Windows), listing and restoring files, and cleaning up unused data. It is well suited to backup scenarios that require data integrity and storage efficiency.
```bash
# Install Go (https://go.dev/doc/install)
git clone <your-repo-url>
cd s3stor
go build -o s3stor

# Configure Wasabi (or another S3-compatible provider)
export S3_PROVIDER=wasabi
export S3_BUCKET=your-bucket-name
export S3_REGION=us-east-1
export S3_ENDPOINT=https://s3.us-east-1.wasabisys.com
export AWS_ACCESS_KEY_ID=your-wasabi-access-key
export AWS_SECRET_ACCESS_KEY=your-wasabi-secret-key

# Sync a file to S3
./s3stor sync test_out/file1.txt
# Output: Synced file1.txt (123 bytes)

# Create a snapshot
./s3stor snapshot test_out sn001 file1.txt
# Output: Snapshot sandow-sn001 created with 1 files

# List files in the snapshot
./s3stor ls sandow-sn001
# Output: Files in snapshot sandow-sn001 (created 2025-07-27T22:50:00Z by sandow):
# - file1.txt (123 bytes)

# Restore a file from the snapshot
./s3stor get sandow-sn001 file1.txt ./restore
# Output: File reconstructed to: ./restore/file1.txt

# Delete a file from the global catalog
./s3stor delete file1.txt
# Output: Deleted file: file1.txt
# Block cleanup completed: 0 blocks deleted
```
Jump to [Usage](#usage) for more examples, or [Architecture](#architecture) for how it works.
- [Features](#features)
- [Architecture](#architecture)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [Examples](#examples)
- [S3 Bucket Structure](#s3-bucket-structure)
- [Locking Mechanism](#locking-mechanism)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
- [License](#license)
## Features

- Block-Based Deduplication: Splits files into blocks, stores unique blocks by SHA-256 hash, and reuses them across files and snapshots to save storage.
- Point-in-Time Snapshots: Creates consistent backups using the Volume Shadow Copy Service (VSS) on Windows, with independent file maps for each snapshot.
- Multi-Writer Safety: Uses S3-based locking to prevent conflicts when multiple instances (e.g., on different machines) access the same bucket.
- File Management:
  - `sync`: Upload files to S3 with deduplication, creating the global catalog if missing.
  - `ls`: List files in the global catalog or in snapshots, creating the global catalog if missing.
  - `get`: Restore files from snapshots or the global catalog.
  - `map`: Display block mappings for a file in the global catalog or a snapshot.
  - `snapshot`: Create snapshots of specified files.
  - `delete-snapshot`: Remove snapshots and their metadata.
  - `delete`: Remove files from the global catalog with safe block cleanup.
  - `cleanup-blocks`: Remove unreferenced blocks to reclaim storage.
- S3 Compatibility: Works with AWS S3, Wasabi, and other S3-compatible providers.
- Efficient Cleanup: Safely deletes unreferenced blocks only after checking all file maps (global and snapshot).
## Architecture

`s3stor` organizes data in an S3 bucket using a structured layout, with separate catalogs for global files and snapshots, deduplicated block storage, and a locking mechanism for concurrency.
- Global Catalog (`catalog.json`):
  - Stores metadata for files synced via `sync`.
  - Automatically created as an empty catalog (`[]`) on the first `sync` or `ls` if not found.
  - Format: a JSON array of entries:

    ```json
    [
      { "file_name": "file1.txt", "file_size": 123, "map_key": "maps/file1.txt.json" },
      { "file_name": "d001/f005.txt", "file_size": 456, "map_key": "maps/d001/f005.txt.json" }
    ]
    ```

  - `map_key` points to a file map listing block hashes.
- File Maps (`maps/<file_name>.json`):
  - For each file in the global catalog, stores metadata and a list of SHA-256 block hashes (Go struct equivalents are sketched after this list):

    ```json
    {
      "file_name": "file1.txt",
      "file_size": 123,
      "block_size": 1048576,
      "blocks": ["a1b2c3d4...", "e5f6g7h8..."]
    }
    ```

  - Blocks are stored in `blocks/<hash>`.
- Snapshot Catalog (`<hostname>/snapshots/<snapshot_id>/catalog.json`):
  - Created by `snapshot`; stores metadata for files in a snapshot (e.g., `sandow/snapshots/sandow-sn001`).
  - Format: a JSON object:

    ```json
    {
      "snapshot_id": "sandow-sn001",
      "timestamp": "2025-07-27T22:50:00Z",
      "computer_id": "sandow",
      "files": [
        { "file_name": "file1.txt", "file_size": 123, "map_key": "sandow/snapshots/sandow-sn001/maps/file1.txt.json" }
      ]
    }
    ```

  - Independent of the global catalog, with separate file maps.
- Snapshot File Maps (`<hostname>/snapshots/<snapshot_id>/maps/<file_name>.json`):
  - Similar to global file maps; list block hashes for snapshot files.
  - Ensure snapshots are self-contained and unaffected by global catalog changes.
- Block Storage (`blocks/<hash>`):
  - Stores unique file blocks, identified by their SHA-256 hashes.
  - Deduplication ensures identical blocks are stored only once and referenced by multiple file maps.
- Locks (`locks/global/<resource>.lock`, `locks/<hostname>/snapshots/<snapshot_id>/<resource>.lock`):
  - S3 objects used for concurrency control (e.g., `locks/global/catalog.lock`, `locks/global/file1.txt.lock`).
  - Prevent race conditions in multi-writer scenarios (e.g., multiple `s3stor` instances).
  - Automatically expire via an S3 lifecycle policy (1-day retention).
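All of these catalogs and maps are small JSON documents, so they translate directly into Go types. A minimal sketch of struct equivalents for the global catalog entry and file map formats above (the type names are illustrative, not taken from the s3stor source; only the JSON tags follow the documented formats):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CatalogEntry mirrors one element of the global catalog.json array.
// The type name is hypothetical; the JSON tags match the format above.
type CatalogEntry struct {
	FileName string `json:"file_name"`
	FileSize int64  `json:"file_size"`
	MapKey   string `json:"map_key"`
}

// FileMap mirrors maps/<file_name>.json.
type FileMap struct {
	FileName  string   `json:"file_name"`
	FileSize  int64    `json:"file_size"`
	BlockSize int64    `json:"block_size"`
	Blocks    []string `json:"blocks"` // SHA-256 hashes, in file order
}

func main() {
	raw := `{"file_name":"file1.txt","file_size":123,"block_size":1048576,"blocks":["a1b2c3d4"]}`
	var fm FileMap
	if err := json.Unmarshal([]byte(raw), &fm); err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d bytes in %d block(s)\n", fm.FileName, fm.FileSize, len(fm.Blocks))
}
```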
### Command Flows

- Sync (see the sketch after this list):
  - Read the local file, split it into blocks, and compute SHA-256 hashes.
  - Upload new blocks to `blocks/<hash>` if they are not already present.
  - Create a file map (`maps/<file_name>.json`) listing the block hashes.
  - Create or update `catalog.json` with the file metadata.
- Snapshot:
  - Use VSS (on Windows) for consistent file access.
  - Create the snapshot catalog (`<hostname>/snapshots/<snapshot_id>/catalog.json`).
  - Copy or create file maps in `<hostname>/snapshots/<snapshot_id>/maps/`.
  - Reuse existing blocks in `blocks/<hash>`.
- Delete:
  - Remove the file from `catalog.json` and delete its file map.
  - Clean up unreferenced blocks by checking all file maps (global and snapshot).
- Get (also covered by the sketch below):
  - Read the file map (from the global catalog or a snapshot) to get the block hashes.
  - Download the blocks from `blocks/<hash>`.
  - Reconstruct the file locally.
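Sync's block upload and get's reconstruction are mirror images of each other. Below is a minimal sketch of both steps using the AWS SDK for Go v2; the function names and the `client`/`bucket` wiring are illustrative assumptions, not s3stor's actual code:

```go
package main

import (
	"bytes"
	"context"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"io"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// uploadBlock writes a block to blocks/<hash> unless it already exists,
// returning the hex-encoded SHA-256 hash that names it.
func uploadBlock(ctx context.Context, client *s3.Client, bucket string, block []byte) (string, error) {
	sum := sha256.Sum256(block)
	hash := hex.EncodeToString(sum[:])
	key := "blocks/" + hash

	_, err := client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err == nil {
		return hash, nil // block already stored; deduplicated
	}
	var nf *types.NotFound
	if !errors.As(err, &nf) {
		return "", err // a real error, not just "missing"
	}
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   bytes.NewReader(block),
	})
	return hash, err
}

// restoreFile downloads blocks in file-map order and concatenates them.
func restoreFile(ctx context.Context, client *s3.Client, bucket string, hashes []string, outPath string) error {
	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()
	for _, h := range hashes {
		obj, err := client.GetObject(ctx, &s3.GetObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String("blocks/" + h),
		})
		if err != nil {
			return err
		}
		if _, err := io.Copy(out, obj.Body); err != nil {
			obj.Body.Close()
			return err
		}
		obj.Body.Close()
	}
	return nil
}

func main() {} // placeholder so the sketch compiles standalone
```

Checking for an existing object before uploading is what makes re-syncing unchanged data cheap: for blocks already stored, only the `HeadObject` round trip is paid.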
### Deduplication

- Files are split into fixed-size blocks (default: 1 MB).
- Each block's SHA-256 hash is computed, and the block is stored at `blocks/<hash>`.
- File maps reference these blocks, enabling deduplication across files and snapshots.
- Example: if `file1.txt` and `file2.txt` share a block, it is stored once at `blocks/a1b2c3d4...` and referenced by both file maps.
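The chunking step needs nothing beyond the Go standard library. A self-contained sketch, assuming the 1 MB (1048576-byte) default block size described above:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

const blockSize = 1 << 20 // 1 MiB, the default block size

// hashBlocks reads r in blockSize chunks and returns the hex SHA-256
// hash of each chunk, in order. The final chunk may be shorter.
func hashBlocks(r io.Reader) ([]string, error) {
	var hashes []string
	buf := make([]byte, blockSize)
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			sum := sha256.Sum256(buf[:n])
			hashes = append(hashes, hex.EncodeToString(sum[:]))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return hashes, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

func main() {
	f, err := os.Open("test_out/file1.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	hashes, err := hashBlocks(f)
	if err != nil {
		panic(err)
	}
	for i, h := range hashes {
		fmt.Printf("%d: %s\n", i+1, h) // matches the numbering printed by `map`
	}
}
```

Because the blocks are fixed-size and content-addressed, identical 1 MB chunks always hash to the same key under `blocks/`, which is exactly what lets file maps share storage.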
## Installation

- Install Go (version 1.16+): https://go.dev/doc/install.
- Clone the repository:

  ```bash
  git clone <your-repo-url>
  cd s3stor
  ```

- Build:

  ```bash
  go build -o s3stor
  ```

- Verify:

  ```bash
  ./s3stor
  # Output: Usage: go run main.go <sync|ls|get|map|snapshot|delete-snapshot|cleanup-blocks|delete> [args...]
  ```
## Configuration

`s3stor` uses environment variables for S3 configuration. Example for Wasabi:

```bash
export S3_PROVIDER=wasabi
export S3_BUCKET=your-bucket-name
export S3_REGION=us-east-1
export S3_ENDPOINT=https://s3.us-east-1.wasabisys.com
export AWS_ACCESS_KEY_ID=your-wasabi-access-key
export AWS_SECRET_ACCESS_KEY=your-wasabi-secret-key
```
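For reference, here is one way a Go client could be built from those variables using the AWS SDK for Go v2. This is a hedged sketch of plausible wiring, not s3stor's actual configuration code (handling of `S3_PROVIDER`, in particular, is assumed away):

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func newClient(ctx context.Context) (*s3.Client, error) {
	// AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are picked up
	// automatically by the default credential chain.
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(os.Getenv("S3_REGION")))
	if err != nil {
		return nil, err
	}
	return s3.NewFromConfig(cfg, func(o *s3.Options) {
		// Point the client at Wasabi (or another S3-compatible endpoint).
		if ep := os.Getenv("S3_ENDPOINT"); ep != "" {
			o.BaseEndpoint = aws.String(ep)
		}
	}), nil
}

func main() {
	client, err := newClient(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	_ = client // ready for PutObject/GetObject calls against S3_BUCKET
}
```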
Ensure your S3 credentials allow:

```json
{
  "Effect": "Allow",
  "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
  "Resource": ["arn:aws:s3:::your-bucket-name/*", "arn:aws:s3:::your-bucket-name"]
}
```
Set an S3 lifecycle policy to expire locks after 1 day:

```bash
aws s3api put-bucket-lifecycle-configuration --bucket your-bucket-name --lifecycle-configuration '{
  "Rules": [{
    "ID": "CleanLocks",
    "Status": "Enabled",
    "Filter": {"Prefix": "locks/"},
    "Expiration": {"Days": 1}
  }]
}'
```
## Usage

```bash
s3stor <command> [args...]
```
- `sync <file_or_dir>`:
  - Uploads files to S3 with deduplication, creating the global catalog if missing.
  - Example: `./s3stor sync test_out/file1.txt`
- `ls [snapshot_id]`:
  - Lists files in the global catalog (creating an empty catalog if missing) or in a specific snapshot.
  - Example: `./s3stor ls` or `./s3stor ls sandow-sn001`
- `get [<snapshot_id>] <file_name> <output_dir>`:
  - Restores a file from a snapshot (if `snapshot_id` is provided) or from the global catalog.
  - Example: `./s3stor get sandow-sn001 file1.txt ./restore` or `./s3stor get file1.txt ./restore`
- `map [<snapshot_id>] <file_name>`:
  - Displays block mappings for a file in the global catalog or in a snapshot (if `snapshot_id` is provided).
  - Example: `./s3stor map file1.txt` or `./s3stor map sandow-sn001 file1.txt`
- `snapshot <source_dir> <snapshot_id> [file_names...]`:
  - Creates a snapshot of the specified files using VSS (Windows).
  - Example: `./s3stor snapshot test_out sn001 file1.txt`
- `delete-snapshot <snapshot_id>`:
  - Deletes a snapshot and its metadata, with block cleanup.
  - Example: `./s3stor delete-snapshot sandow-sn001`
- `cleanup-blocks`:
  - Removes unreferenced blocks after checking all file maps (a sketch of this mark-and-sweep follows the list).
  - Example: `./s3stor cleanup-blocks`
- `delete <file_name>`:
  - Removes a file from the global catalog, with block cleanup.
  - Example: `./s3stor delete file1.txt`
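The safety property behind `cleanup-blocks` (and the cleanup phase of `delete`) is simple set arithmetic: a block may be removed only if no file map anywhere, global or snapshot, still references its hash. A minimal in-memory sketch of that mark-and-sweep (listing and deleting the actual S3 objects is elided; the names here are illustrative):

```go
package main

import "fmt"

// unreferencedBlocks returns the block hashes present in storage that no
// file map (global or snapshot) references, i.e. the ones safe to delete.
func unreferencedBlocks(storedHashes []string, fileMaps map[string][]string) []string {
	referenced := make(map[string]bool)
	for _, blocks := range fileMaps {
		for _, h := range blocks {
			referenced[h] = true // mark phase: every hash any map mentions
		}
	}
	var garbage []string
	for _, h := range storedHashes {
		if !referenced[h] {
			garbage = append(garbage, h) // sweep phase: unmarked blocks
		}
	}
	return garbage
}

func main() {
	stored := []string{"a1b2c3d4", "e5f6g7h8", "deadbeef"}
	maps := map[string][]string{
		"maps/file1.txt.json":                               {"a1b2c3d4", "e5f6g7h8"},
		"sandow/snapshots/sandow-sn001/maps/file1.txt.json": {"a1b2c3d4"},
	}
	fmt.Println(unreferencedBlocks(stored, maps)) // [deadbeef]
}
```

In the real tool the inputs would come from listing `blocks/` and reading every file map in the bucket; the set logic is what guarantees a block still referenced by any snapshot survives.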
## Examples

Upload a file and a directory to S3:

```bash
./s3stor sync test_out/file1.txt
# Output: Synced file1.txt (123 bytes)
./s3stor sync test_out/d001
# Output: Synced d001/f005.txt (456 bytes)
```

Create a snapshot of specific files:

```bash
./s3stor snapshot test_out sn001 file1.txt d001/f005.txt
# Output: Snapshot sandow-sn001 created with 2 files
```

List files in the global catalog (creates an empty catalog if none exists):

```bash
./s3stor ls
# Output: Files in global catalog:
# - file1.txt (123 bytes)
# - d001/f005.txt (456 bytes)

# If no catalog exists:
# Output: Files in global catalog:
# (none)
```

List files in a snapshot:

```bash
./s3stor ls sandow-sn001
# Output: Files in snapshot sandow-sn001 (created 2025-07-27T22:50:00Z by sandow):
# - file1.txt (123 bytes)
# - d001/f005.txt (456 bytes)
```

View block mappings for a file in the global catalog:

```bash
./s3stor map file1.txt
# Output: File Map for file1.txt:
# File Name: file1.txt
# File Size: 123 bytes
# Block Size: 1048576 bytes
# Blocks:
# 1: a1b2c3d4...
# 2: e5f6g7h8...
```

View block mappings for a file in a snapshot:

```bash
./s3stor map sandow-sn001 file1.txt
# Output: File Map for file1.txt:
# File Name: file1.txt
# File Size: 123 bytes
# Block Size: 1048576 bytes
# Blocks:
# 1: a1b2c3d4...
# 2: e5f6g7h8...
```

Restore a file from a snapshot:

```bash
./s3stor get sandow-sn001 file1.txt ./restore
# Output: File reconstructed to: ./restore/file1.txt
```

Restore a file from the global catalog:

```bash
./s3stor get file1.txt ./restore
# Output: File reconstructed to: ./restore/file1.txt
```

Remove a file from the global catalog:

```bash
./s3stor delete file1.txt
# Output: Deleted file: file1.txt
# Block cleanup completed: 0 blocks deleted
```

Remove a snapshot:

```bash
./s3stor delete-snapshot sandow-sn001
# Output: Snapshot sandow-sn001 deleted
```

Manually clean unreferenced blocks:

```bash
./s3stor cleanup-blocks
# Output: Block cleanup completed: 2 blocks deleted
```
## S3 Bucket Structure

After running the commands above, your bucket (`your-bucket-name`) will contain:

```text
your-bucket-name/
├── catalog.json
├── maps/
│   ├── file1.txt.json
│   ├── d001/f005.txt.json
├── blocks/
│   ├── a1b2c3d4...
│   ├── e5f6g7h8...
├── <hostname>/
│   ├── snapshots/
│   │   ├── sandow-sn001/
│   │   │   ├── catalog.json
│   │   │   ├── maps/
│   │   │   │   ├── file1.txt.json
│   │   │   │   ├── d001/f005.txt.json
├── locks/
│   ├── global/
│   │   ├── catalog.lock
│   │   ├── file1.txt.lock
│   │   ├── cleanup.lock
│   ├── <hostname>/
│   │   ├── snapshots/
│   │   │   ├── sandow-sn001/
│   │   │   │   ├── file1.txt.lock
```
## Locking Mechanism

- Purpose: Ensures safety in multi-writer scenarios (e.g., multiple `s3stor` instances on `sandow` or other machines).
- Implementation: S3 objects (`locks/global/<resource>.lock`, `locks/<hostname>/snapshots/<snapshot_id>/<resource>.lock`) act as mutexes.
  - Example: `locks/global/catalog.lock` for global catalog updates.
  - Example: `locks/sandow/snapshots/sandow-sn001/file1.txt.lock` for snapshot file operations.
- Acquisition (sketched below):
  - Attempts to write the lock object with a unique owner (e.g., the hostname `sandow`).
  - Retries (default: 3 attempts) if the lock is held by another instance.
- Expiration: Locks expire after 1 day via the S3 lifecycle policy, preventing permanent deadlocks.
- Commands Using Locks: `sync`, `delete`, `snapshot`, `delete-snapshot`, `cleanup-blocks`.
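For illustration, here is one way such an S3-object mutex could be acquired: write the lock with a unique owner, read it back to confirm, and retry up to 3 times. This is a sketch of the behavior described above, not the actual s3stor implementation; the function names and the read-back verification step are assumptions:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

const maxLockRetries = 3 // matches the documented default

// acquireLock tries to claim lockKey for owner, retrying if another
// instance holds it. Stale locks expire via the 1-day lifecycle policy.
func acquireLock(ctx context.Context, client *s3.Client, bucket, lockKey, owner string) error {
	for attempt := 1; attempt <= maxLockRetries; attempt++ {
		// Is the lock already held?
		obj, err := client.GetObject(ctx, &s3.GetObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(lockKey),
		})
		var nsk *types.NoSuchKey
		if err == nil {
			holder, _ := io.ReadAll(obj.Body)
			obj.Body.Close()
			if string(holder) == owner {
				return nil // we already hold it
			}
			time.Sleep(time.Second) // held by someone else; back off, retry
			continue
		} else if !errors.As(err, &nsk) {
			return err
		}

		// Not held: write our claim, then read it back to detect a race
		// with a concurrent writer (this narrows the race window but,
		// unlike a true conditional write, does not eliminate it).
		if _, err := client.PutObject(ctx, &s3.PutObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(lockKey),
			Body:   strings.NewReader(owner),
		}); err != nil {
			return err
		}
		check, err := client.GetObject(ctx, &s3.GetObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(lockKey),
		})
		if err != nil {
			return err
		}
		holder, _ := io.ReadAll(check.Body)
		check.Body.Close()
		if string(holder) == owner {
			return nil
		}
	}
	return fmt.Errorf("could not acquire %s after %d attempts", lockKey, maxLockRetries)
}

func main() {}
```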
## Troubleshooting

- Snapshot Creates 0 Files:
  - Cause: Files not found in `source_dir`, VSS access denied, or lock conflicts.
  - Fix:
    - Verify the files exist: `ls test_out/file1.txt`.
    - Check VSS permissions (Windows): run as administrator.
    - List locks: `aws s3 ls s3://your-bucket-name/locks/`.
    - Remove stuck locks: `aws s3 rm s3://your-bucket-name/locks/global/file1.txt.lock`.
- File Not Found in Catalog:
  - Cause: File not synced, or deleted.
  - Fix: Run `./s3stor ls` to check the catalog, then `sync` the file.
- Global Catalog Not Found:
  - Cause: No prior `sync` or `ls` command has been executed.
  - Fix: Run `./s3stor ls` or `./s3stor sync <file>` to create an empty catalog.
- Lock Acquisition Fails:
  - Cause: Another instance holds the lock.
  - Fix: Wait and retry, or increase `maxLockRetries` in the code (default: 3).
- S3 Permission Errors:
  - Cause: Insufficient IAM permissions.
  - Fix: Update the policy with the required actions (`PutObject`, `GetObject`, `DeleteObject`, `ListBucket`).
- Blocks Not Cleaned Up:
  - Cause: Eventual consistency in S3, or recent snapshot creation.
  - Fix: Retry `cleanup-blocks`, or add a delay (e.g., `time.Sleep(1 * time.Second)` in `deleteFile`).
## Contributing

- Fork the repository and submit pull requests.
- Report issues or suggest features via GitHub Issues.
- Ideas for enhancements:
  - Add `--dry-run` for `delete` and `cleanup-blocks`.
  - Support deleting multiple files: `./s3stor delete file1.txt file2.txt`.
  - Parallelize block cleanup for large buckets.
  - Add a man page: `man s3stor`.
## License

MIT License. See LICENSE for details.