A distributed file-system optimized for write-once, read-many (WORM) use-cases such as archival media storage.

# WormFS

WormFS, short for write-once-read-many file system, is a user-space distributed file system that uses erasure coding to spread files across multiple storage devices, each running its own commodity filesystem. This allows great flexibility in configuring device failure tolerance at the file or directory level. I envision this being extremely useful for media storage and deep-archive use-cases.

Much of the architecture of this project is inspired by LizardFS' simplicity with a goal of offering greater control and visibility over how chunks are stored, replicated, and recovered.

## 📊 Project Status

**Current Phase:** 🚧 Phase 2 - Consensus Layer (In Development)

WormFS is a work in progress that's being developed iteratively following a phased implementation plan. Phase 1 is complete with all core components wired together and fully documented! Phase 2 is now underway, adding distributed consensus capabilities with automatic failure detection and recovery.

### What's Working Now (Phase 1 - Complete ✅)

- **MetadataStore**: SQLite-based metadata persistence with WAL mode
- **FileStore**: Local chunk storage with Reed-Solomon erasure coding (2+1 by default)
- **FileSystemService**: FUSE integration for filesystem operations
- **StorageNode**: Component orchestrator that wires everything together
- **Configuration**: TOML-based config with CLI overrides and dual file-mode formats
- **File Operations**: Create, read, write, and delete files with data-integrity verification
- **Directory Operations**: Create, list, and remove nested directories
- **CLI & Configuration**: Production-ready command-line interface
- **Integration Tests**: Comprehensive end-to-end test suite covering all features
- **Documentation**: Complete user guide, configuration reference, and troubleshooting guide
- **Demo Script**: Interactive demo with MD5 integrity verification
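The 2+1 Reed-Solomon layout mentioned above can be illustrated in miniature: with a single parity shard, Reed-Solomon parity reduces to a byte-wise XOR of the two data shards, so either lost data shard can be rebuilt from the survivor plus parity. This is a std-only Rust sketch of the idea, not WormFS's actual FileStore API (the project presumably uses a full Reed-Solomon library):

```rust
/// Split `data` into two equal-length shards plus one XOR parity shard.
/// (Reed-Solomon with a single parity shard degenerates to XOR parity.)
fn encode(data: &[u8]) -> (Vec<u8>, Vec<u8>, Vec<u8>) {
    let half = (data.len() + 1) / 2;
    let a = data[..half].to_vec();
    let mut b = data[half..].to_vec();
    b.resize(a.len(), 0); // zero-pad so both shards are equal length
    let parity = a.iter().zip(&b).map(|(x, y)| x ^ y).collect();
    (a, b, parity)
}

/// Rebuild a missing data shard from the surviving shard and the parity shard.
fn reconstruct(survivor: &[u8], parity: &[u8]) -> Vec<u8> {
    survivor.iter().zip(parity).map(|(x, y)| x ^ y).collect()
}

fn main() {
    let file = b"wormfs demo payload";
    let (a, b, parity) = encode(file);
    // Simulate losing shard `a`; recover it from `b` and the parity shard.
    let recovered = reconstruct(&b, &parity);
    assert_eq!(recovered, a);
    println!("recovered shard matches original");
}
```

Real Reed-Solomon generalizes this to any k data + m parity shards, tolerating the loss of any m shards.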

### Phase 2 Progress (In Development 🚧)

- **StorageRaftMember**: Core Raft consensus implementation for distributed coordination
- **TransactionLogStore**: Persistent Raft log storage with vote persistence
- **Cluster Manager**: Failure detection and monitoring, fully integrated into StorageRaftMember with automatic lifecycle management
  - Automatic node health monitoring with heartbeat tracking and replication-lag detection
  - Failed nodes remain voters; membership changes are operator-driven, following industry best practice
  - Automatic promotion of recovered learners back to voters
  - Configurable failure-detection thresholds (conservative/moderate/aggressive presets), enabled by default
  - Quorum-safe membership management prevents split-brain during network partitions
  - Structured event logging for observability and audit trails
- **TransactionManager**: Distributed transaction support with ACID guarantees
  - Two-phase commit protocol for atomic metadata operations
  - Transaction lifecycle management (begin/commit/abort)
  - Distributed locking with timeout-based deadlock prevention
  - Metadata change subscription system for real-time event notifications
  - Configurable transaction limits, timeouts, and subscription settings
  - Comprehensive tests covering ACID properties, isolation, consistency, and concurrency
- 🚧 Multi-node cluster formation and leader election (in progress)
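The heartbeat-based failure detection described above can be sketched with std-only Rust: a node is flagged as failed once its last heartbeat is older than the preset's timeout. The preset names come from this README; the timeout values, type names, and methods are hypothetical illustrations, not the actual Cluster Manager API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Failure-detection presets named in the README; the timeout values
/// here are made up for illustration.
#[derive(Clone, Copy)]
enum Preset {
    Conservative,
    Moderate,
    Aggressive,
}

impl Preset {
    fn heartbeat_timeout(self) -> Duration {
        match self {
            Preset::Conservative => Duration::from_secs(30),
            Preset::Moderate => Duration::from_secs(10),
            Preset::Aggressive => Duration::from_secs(3),
        }
    }
}

struct HealthTracker {
    timeout: Duration,
    last_seen: HashMap<String, Instant>,
}

impl HealthTracker {
    fn new(preset: Preset) -> Self {
        Self { timeout: preset.heartbeat_timeout(), last_seen: HashMap::new() }
    }

    fn record_heartbeat(&mut self, node: &str) {
        self.last_seen.insert(node.to_string(), Instant::now());
    }

    /// A node is "failed" once its last heartbeat is older than the timeout.
    fn failed_nodes(&self, now: Instant) -> Vec<&str> {
        self.last_seen
            .iter()
            .filter(|(_, &t)| now.duration_since(t) > self.timeout)
            .map(|(n, _)| n.as_str())
            .collect()
    }
}

fn main() {
    let mut tracker = HealthTracker::new(Preset::Aggressive);
    tracker.record_heartbeat("node-a");
    // Simulate a check well past the aggressive 3s timeout.
    let later = Instant::now() + Duration::from_secs(10);
    assert_eq!(tracker.failed_nodes(later), vec!["node-a"]);
    println!("node-a flagged as failed");
}
```

In the real system, flagging a node would feed the operator-driven membership workflow rather than removing the voter automatically.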
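Likewise, the two-phase commit protocol behind the TransactionManager can be sketched in miniature: phase one collects prepare votes from every participant, and phase two commits only if all voted yes, aborting otherwise. The types and methods here are hypothetical illustrations, not WormFS's real interfaces.

```rust
#[derive(Debug, PartialEq)]
enum Vote {
    Yes,
    No,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Committed,
    Aborted,
}

struct Participant {
    healthy: bool, // an unhealthy participant votes No during prepare
    committed: bool,
}

impl Participant {
    fn prepare(&self) -> Vote {
        if self.healthy { Vote::Yes } else { Vote::No }
    }
    fn commit(&mut self) {
        self.committed = true;
    }
    fn abort(&mut self) {
        self.committed = false;
    }
}

/// Phase 1: ask every participant to prepare.
/// Phase 2: commit everywhere only if all voted Yes; otherwise abort everywhere.
fn two_phase_commit(participants: &mut [Participant]) -> Outcome {
    let all_yes = participants.iter().all(|p| p.prepare() == Vote::Yes);
    if all_yes {
        for p in participants.iter_mut() {
            p.commit();
        }
        Outcome::Committed
    } else {
        for p in participants.iter_mut() {
            p.abort();
        }
        Outcome::Aborted
    }
}

fn main() {
    let mut group = vec![
        Participant { healthy: true, committed: false },
        Participant { healthy: true, committed: false },
    ];
    assert_eq!(two_phase_commit(&mut group), Outcome::Committed);

    let mut degraded = vec![
        Participant { healthy: true, committed: false },
        Participant { healthy: false, committed: false },
    ];
    assert_eq!(two_phase_commit(&mut degraded), Outcome::Aborted);
    println!("2pc sketch ok");
}
```

The real protocol also needs the durable logging and timeout-based lock recovery listed above, which this sketch omits.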

### What's Coming Next

- 🚧 Phase 2: Raft consensus and distributed coordination (in progress; cluster manager complete)
- 📋 Phase 3: Multi-node storage with distributed erasure coding
- 📋 Phase 4: Robustness and recovery features
- 📋 Phase 5: Observability and production testing

## 📚 Documentation

- Core documentation
- Implementation plans
- Component specifications
- Additional documentation

## 🚀 Getting Started

### Dependencies

Required:

- Rust 1.70+ with Cargo
- FUSE3 (Linux) or macFUSE (macOS)
- Build essentials (gcc, make)

Install FUSE:

```shell
# Ubuntu/Debian
sudo apt-get install fuse3 libfuse3-dev

# RHEL/CentOS/Fedora
sudo yum install fuse3 fuse3-devel

# macOS
brew install macfuse
```

### Running the Demo

The quickest way to experience WormFS:

```shell
./scripts/demo_wormfs.sh
```

**What the Demo Does:**

The interactive demo script automatically:

  1. Builds WormFS from source
  2. Creates a temporary filesystem with erasure coding (2+1 Reed-Solomon)
  3. Mounts the filesystem via FUSE
  4. Performs file operations (create, read, write, delete)
  5. Demonstrates directory operations (mkdir, readdir, rmdir)
  6. Displays real-time metrics (I/O amplification, cache hit rates, performance)
  7. Launches an admin web UI for monitoring
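The demo's MD5 integrity verification boils down to comparing checksums on both sides of the mount. A rough sketch, assuming a Linux host with coreutils' `md5sum` (macOS would use `md5 -q`); the mountpoint path is hypothetical, since the demo script creates its own temporary mount:

```shell
# Hypothetical mountpoint standing in for the demo's temporary FUSE mount.
MNT="${MNT:-/tmp/wormfs-demo-mnt}"
mkdir -p "$MNT"

# Write a source file and copy it through the (mounted) filesystem...
echo "wormfs integrity test" > /tmp/wormfs-src.txt
cp /tmp/wormfs-src.txt "$MNT/hello.txt"

# ...then verify the checksums match end to end.
if [ "$(md5sum < /tmp/wormfs-src.txt)" = "$(md5sum < "$MNT/hello.txt")" ]; then
  echo "integrity OK"
else
  echo "integrity MISMATCH" >&2
fi
```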

**Accessing the Web UI:**

Open your browser to http://127.0.0.1:9090/ to view:

- Real-time system metrics and performance graphs
- Current configuration settings
- System health status and component diagnostics
- File and chunk statistics

## 🔧 Troubleshooting

### State Machine Resync

**Symptom:** Node logs show "STATE MACHINE APPLY FAILURE - Triggering automatic resync"

**Cause:** The Raft state machine failed to apply a committed operation, indicating potential state corruption or a serious bug.

**What Happens:**

- The node automatically enters read-only mode (rejects new writes)
- Creates a diagnostic file: `data/snapshots/NEEDS_RESYNC`
- Logs detailed failure information
- Waits for operator intervention

**Recovery Steps:**

1. Check the `NEEDS_RESYNC` file for failure details
2. Stop the affected node
3. Back up or clear the corrupted state (optional)
4. Restart the node
5. OpenRaft automatically installs a snapshot from the leader
6. The node resumes normal operation once the snapshot is applied

**Prevention:**

- This should be extremely rare in production
- Frequent resyncs indicate a bug that should be reported
- Check logs for patterns preceding the failure

## 🤝 Contributing

This project is currently in active development as a learning exercise. While it's not yet ready for external contributions, you're welcome to:

- Open issues for bugs or suggestions
- Star the repository if you find it interesting
- Follow along with the development progress

## 📄 License

Apache-2.0


## 🙏 Acknowledgments

**Note:** This is a work-in-progress learning project. I'm using Claude (an AI assistant) to help me learn how best to integrate GenAI tools into my SDLC workflow. Expect rough edges and an evolving architecture as the project matures!
