Kapôt is a distributed SQL query engine powered by the Rust implementation of [Apache Arrow][arrow] and [Apache Arrow DataFusion][datafusion]. It is based on [Apache DataFusion Ballista](https://datafusion.apache.org/ballista/) and is being actively updated.
Kapôt is designed to run primarily on a Kubernetes cluster, but it can also run as a simple cluster of instances that join one another. It is likewise designed to work primarily with distributed storage such as S3-compatible object stores.
If you are looking for documentation for a released version of Kapôt, please refer to the
[Kapôt User Guide][user-guide].
Kapôt implements a similar design to Apache Spark (particularly Spark SQL), but there are some key differences:
- The choice of Rust as the main execution language avoids the overhead of garbage-collection (GC) pauses, which makes processing times more predictable.
- Kapôt is designed from the ground up to use columnar data, enabling a number of efficiencies such as vectorized processing (SIMD) and efficient compression. Although Spark does have some columnar support, it is still largely row-based today.
- The combination of Rust and Arrow provides excellent memory efficiency: memory usage can be 5x to 10x lower than Apache Spark in some cases, so more processing can fit on a single node, reducing the overhead of distributed compute.
- The use of Apache Arrow as the memory model and network protocol means that data can be exchanged efficiently between executors using the [Flight Protocol][flight], and between clients and schedulers/executors using the [Flight SQL Protocol][flight-sql].
- Supports HDFS as well as cloud object stores. S3 is supported today and GCS and Azure support is planned.
- DataFrame and SQL APIs available from Python and Rust.
- Clients can connect to a Kapôt cluster using [Flight SQL][flight-sql].
- JDBC support via the Arrow Flight SQL JDBC Driver.
- Scheduler web interface and REST UI for monitoring query progress and viewing query plans and metrics.
- Support for Docker, Docker Compose, and Kubernetes deployment, as well as manual deployment on bare metal.
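
The columnar-data point in the list above can be made concrete with a small, standard-library-only sketch. It does not use Kapôt, DataFusion, or Arrow APIs; the `TripRow` and `TripColumns` types and the sample values are hypothetical and exist only to contrast a row-oriented layout with a column-oriented one. In the columnar form, an aggregate over one field scans a single contiguous vector, which is the kind of access pattern that lends itself to vectorized (SIMD) execution and compression.

```rust
// Row-oriented layout: each record keeps all of its fields together.
// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone)]
struct TripRow {
    id: u64,
    distance_km: f64,
    fare: f64,
}

// Column-oriented layout: each field lives in its own contiguous vector.
// This is similar in spirit to an Arrow record batch, but it omits
// validity bitmaps, schemas, and everything else a real batch carries.
#[derive(Debug, Default)]
struct TripColumns {
    id: Vec<u64>,
    distance_km: Vec<f64>,
    fare: Vec<f64>,
}

impl TripColumns {
    // Transpose a slice of rows into per-field columns.
    fn from_rows(rows: &[TripRow]) -> Self {
        let mut cols = TripColumns::default();
        for r in rows {
            cols.id.push(r.id);
            cols.distance_km.push(r.distance_km);
            cols.fare.push(r.fare);
        }
        cols
    }
}

// Summing over rows touches every record, including fields we do not need.
fn total_fare_rows(rows: &[TripRow]) -> f64 {
    rows.iter().map(|r| r.fare).sum()
}

// Summing over a column reads one dense array of f64 values.
fn total_fare_columns(cols: &TripColumns) -> f64 {
    cols.fare.iter().sum()
}

// Made-up sample data.
fn sample_rows() -> Vec<TripRow> {
    vec![
        TripRow { id: 1, distance_km: 3.2, fare: 10.0 },
        TripRow { id: 2, distance_km: 8.7, fare: 20.5 },
        TripRow { id: 3, distance_km: 1.1, fare: 4.5 },
    ]
}

fn main() {
    let rows = sample_rows();
    let cols = TripColumns::from_rows(&rows);
    println!("rows: {}", cols.id.len());
    println!("total fare (row layout):    {}", total_fare_rows(&rows));
    println!("total fare (column layout): {}", total_fare_columns(&cols));
}
```

Both functions return the same total; the difference is only in how the data is laid out in memory and therefore how much of it an aggregate has to read.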