Spark Job Server offers a RESTful interface for submitting, managing, and running jobs and contexts on Apache Spark. Rather than requiring every application to embed Spark or manage Spark contexts manually, the server runs as a long-lived service where clients can upload JARs, start and stop contexts, submit jobs synchronously or asynchronously, and share named objects (RDDs / DataFrames) across job executions. It supports multiple modes of operation (transient jobs, persistent contexts reused across jobs, streaming, SQL/Hive, etc.) and can be integrated with authentication/authorization systems (e.g. via Apache Shiro). The architecture isolates Spark contexts (optionally in separate JVMs) along with their job dependencies, and persists job and JAR metadata via pluggable DAOs. It can be deployed on the usual cluster managers (YARN, Mesos, etc.) and aims to simplify Spark-as-a-service scenarios.
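To make the workflow concrete, the sketch below shows roughly what a job class packaged into an uploaded JAR looks like, following the classic `spark.jobserver.SparkJob` trait, where `validate` checks the request before `runJob` executes it. Names such as `WordCountJob` and the `input.string` config key are illustrative, and the exact trait and signatures vary between job-server versions (newer releases also offer a typed API).

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

// Sketch of a job class that would be compiled into a JAR and uploaded via the REST API.
object WordCountJob extends SparkJob {

  // Reject the request early if the expected configuration key is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.string")) SparkJobValid
    else SparkJobInvalid("Missing config key: input.string")

  // The return value is serialized by the server and handed back to the REST client.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split(" ").toSeq).countByValue()
}
```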
Features
- REST API for submitting jobs, uploading JARs or binaries, and managing job contexts (see the client sketch after this list)
- Support for persistent and transient contexts: a Spark session or context can be reused across many jobs, or a standalone context can be created per job
- Job result serialization and status tracking through the API
- HTTPS / SSL configuration, authentication and authorization support (e.g. via Apache Shiro)
- Deployment on various cluster managers (Standalone, YARN, Mesos) as well as Docker-based deployments
- Persistence of job and JAR metadata, plus named objects (RDDs / DataFrames cached and retrieved by name across jobs; see the named-RDD sketch at the end of this section)
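As a rough illustration of the REST workflow referenced in the first bullet, the sketch below uploads a JAR, creates a persistent context, and runs a job synchronously in it, using Java 11's `HttpClient` from Scala. The base URL, application name, class path, and JAR path are assumptions, and routes such as `/jars`, `/contexts`, and `/jobs` follow the legacy API; parameters and paths may differ across server versions (newer releases use `/binaries`, for example).

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.file.Path

// Sketch of a REST client: the base URL, names, and JAR path are assumptions,
// and the exact routes/parameters depend on the job-server version.
object JobServerClientSketch {
  private val base   = "http://localhost:8090" // conventional default port
  private val client = HttpClient.newHttpClient()

  private def post(url: String, body: HttpRequest.BodyPublisher,
                   headers: (String, String)*): String = {
    val builder = HttpRequest.newBuilder(URI.create(url)).POST(body)
    headers.foreach { case (k, v) => builder.header(k, v) }
    client.send(builder.build(), HttpResponse.BodyHandlers.ofString()).body()
  }

  def main(args: Array[String]): Unit = {
    // 1. Upload an application JAR under the name "wordcount" (legacy /jars route).
    println(post(s"$base/jars/wordcount",
      HttpRequest.BodyPublishers.ofFile(Path.of("target/wordcount-assembly.jar")),
      "Content-Type" -> "application/java-archive"))

    // 2. Create a persistent context that later jobs can share.
    println(post(s"$base/contexts/shared-ctx?num-cpu-cores=2&memory-per-node=512m",
      HttpRequest.BodyPublishers.noBody()))

    // 3. Run a job synchronously in that context; the request body is the job's config.
    println(post(s"$base/jobs?appName=wordcount&classPath=example.WordCountJob&context=shared-ctx&sync=true",
      HttpRequest.BodyPublishers.ofString("input.string = a b a b c")))
  }
}
```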
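The named-objects feature in the last bullet is used from inside jobs running in a persistent context. Below is a minimal sketch assuming the classic `NamedRddSupport` mixin, whose `namedRdds` manager caches RDDs by name; newer versions generalize this to named objects covering DataFrames as well, and the name `numbers` is illustrative.

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

// Sketch: caches an RDD under a name so later jobs in the same persistent
// context can reuse it without recomputation.
object CachedNumbersJob extends SparkJob with NamedRddSupport {

  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Look up the named RDD, creating and caching it on first use.
    val numbers = namedRdds.getOrElseCreate("numbers", sc.parallelize(1 to 1000000))
    numbers.count()
  }
}
```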