Commit 19cf35e

Alphabetic order
1 parent 6e181b9 commit 19cf35e

1 file changed (+8 -9 lines changed)

README.adoc

Lines changed: 8 additions & 9 deletions
@@ -11,20 +11,20 @@ toc::[]
 == Analytics
 
 * https://spark.apache.org/[Apache Spark] - A unified analytics engine for large-scale data processing. Includes APIs in Scala, Java, Python (known as PySpark), and R (SparkR).
-* https://flink.apache.org/[Apache Flink] - Stateful computations over data streams.
 * https://beam.apache.org/[Apache Beam] - An open-source implementation of Google DataFlow. Provides capabilites of batch and streaming data processing jobs that run on any execution engine, including Spark, Flink, or its own DirectRunner. Supports multiple APIs in Java, Python, and Go.
+* https://flink.apache.org/[Apache Flink] - Stateful computations over data streams.
 
 == Business Intelligence
 
 * https://superset.incubator.apache.org/[Apache Superset] - A modern, enterprise-ready business intelligence web application.
+* https://gethue.com/[HUE] - The Hadoop User Interface. Similar to Superset, but interfaces between RDBMS, Hive, Impala, HBase, Spark, HDFS & S3, Oozie, Pig, YARN Job Explorer, and more. Offers an extensible Django environment for custom app integration.
 * https://www.metabase.com/[Metabase] - An easy way for everyone in your company to ask questions and learn from data.
 * https://redash.io/[Redash] - All the tools to unlock your data.
-* https://gethue.com/[HUE] - The Hadoop User Interface. Similar to Superset, but interfaces between RDBMS, Hive, Impala, HBase, Spark, HDFS & S3, Oozie, Pig, YARN Job Explorer, and more. Offers an extensible Django environment for custom app integration.
 
 == Change Data Capture
 
 * https://debezium.io/[Debezium] - Change data capture for MySQL, Postgres, MongoDB, SQL Server and others.
-* https://github.com/zendesk/maxwell[Maxwell] - Maxwell's daemon, a MySQL-to-JSON Kafka producer
+* https://github.com/zendesk/maxwell[Maxwell] - Maxwell's daemon, a MySQL-to-JSON Kafka producer.
 
 == Datastores
 
@@ -35,8 +35,8 @@ toc::[]
 * https://pinot.apache.org/[Apache Pinot] - A realtime distributed OLAP datastore.
 * https://clickhouse.tech/[ClickHouse] - Open Source distributed column-oriented DBMS.
 * https://www.influxdata.com/[InfluxDB] - Purpose-Built Open Source Time Series Database.
-* https://www.postgresql.org/[Postgres] - The World's Most Advanced Open Source Relational Database.
 * https://min.io/[MinIO] - MinIO is a high performance, distributed object storage system and AWS S3 compatible.
+* https://www.postgresql.org/[Postgres] - The World's Most Advanced Open Source Relational Database.
 
 == Data Governance and Registries
 
@@ -47,10 +47,10 @@ toc::[]
 
 == Data Virtualization
 
+* https://drill.apache.org/[Apache Drill] - Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage.
 * https://github.com/dremio/dremio-oss[Dremio] - A data lake engine. Provides an Apache Arrow-based query and acceleration engine together with the ability to create an IT-governed self-service layer for data scientists and analysts.
 * http://teiid.io/[Teiid] - A relational abstraction of different information sources.
 * https://prestodb.io/[Presto] - Distributed SQL Query Engine for Big Data.
-* https://drill.apache.org/[Apache Drill] - Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage.
 
 == Data Orchestration
 * https://github.com/Alluxio/alluxio[Alluxio] - Scalable, multi-tiered distributed caching for HDFS, S3, Ceph, NFS, and related filestores. Provides integrations for SQL queries into a Catalog from Spark, Hive, and Presto.
@@ -64,9 +64,8 @@ toc::[]
 * https://arrow.apache.org/[Apache Arrow] - A cross-language development platform for in-memory data. It specifies a standardized, language-independent, columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy IPC and streaming messaging.
 * https://capnproto.org/[Cap’n Proto] - A data interchange format and capability-based RPC system.
 * https://google.github.io/flatbuffers/[FlatBuffers] - An efficient cross platform serialization library for C++, C#, C, Go, Java, JavaScript, Lobster, Lua, TypeScript, PHP, Python, and Rust.
-* https://developers.google.com/protocol-buffers[Protocol Buffers] - Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
 * https://msgpack.org/index.html[MessagePack] - An efficient binary serialization format. It lets you exchange data among multiple languages like JSON.
-
+* https://developers.google.com/protocol-buffers[Protocol Buffers] - Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
 
 == Integration
 
@@ -93,11 +92,11 @@ toc::[]
 
 == Stream Processing
 
+* https://heron.incubator.apache.org/[Apache Heron] - The "direct successor of Apache Storm", built to be backwards compatible with Storm's topology API but with a wide array of architectural improvements.
 * https://kafka.apache.org/documentation/streams/[Apache Kafka Streams] - A client library for building applications and microservices, where the input and output data are stored in Kafka.
 * http://samza.apache.org/[Apache Samza] - A distributed stream processing framework.
 * https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html[Apache Spark Structured Streaming] - A scalable and fault-tolerant stream processing engine built on the Spark SQL engine.
 * http://storm.apache.org/[Apache Storm] - A distributed realtime computation system.
-* https://heron.incubator.apache.org/[Apache Heron] - The "direct successor of Apache Storm", built to be backwards compatible with Storm's topology API but with a wide array of architectural improvements.
 
 == Testing
 
@@ -107,9 +106,9 @@ toc::[]
 
 * https://github.com/meirwah/awesome-workflow-engines[Awesome Workflow Engines] - A curated list of awesome open source workflow engines.
 * https://airflow.apache.org/[Apache Airflow] - A platform created by community to programmatically author, schedule and monitor workflows.
-* https://github.com/PrefectHQ/prefect/[Prefect] - A workflow management system designed for modern infrastructure.
 * https://nifi.apache.org/[Apache NiFi] - Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic
 * https://github.com/knime/[KNIME] - KNIME Analytics Platform offers a WYSIWYG Editor for Spark-based workflows, with over 2000+ integrations. Offers visualization and flow analytics in-place. KNIME Server is a commercially licensed component that adds additional features.
+* https://github.com/PrefectHQ/prefect/[Prefect] - A workflow management system designed for modern infrastructure.
 
 == Related Resources
 
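The commit message only says "Alphabetic order". As a hedged illustration of how that convention could be checked mechanically, here is a minimal Python sketch. It assumes every entry follows the `* URL[Name] - description` bullet pattern visible in the diff, and that "alphabetic" means a case-insensitive comparison of the full link text within each `==` section. The function name `out_of_order` and the `ENTRY` pattern are hypothetical and are not part of the repository.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical checker: captures the link text of an AsciiDoc bullet such as
# "* https://redash.io/[Redash] - All the tools to unlock your data."
ENTRY = re.compile(r"^\*\s+\S+?\[([^\]]+)\]")


def out_of_order(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (section, previous, current) for every entry whose link text
    sorts before the entry directly above it in the same "== " section."""
    problems: List[Tuple[str, str, str]] = []
    section = ""
    previous: Optional[str] = None
    for line in lines:
        if line.startswith("== "):
            # A new section heading restarts the comparison.
            section = line[3:].strip()
            previous = None
            continue
        match = ENTRY.match(line)
        if match is None:
            continue  # blank lines, prose, and bullets without [link text] are ignored
        name = match.group(1)
        # Case-insensitive comparison: "HUE" is compared as "hue".
        if previous is not None and name.casefold() < previous.casefold():
            problems.append((section, previous, name))
        previous = name
    return problems


if __name__ == "__main__":
    # Business Intelligence entries before and after this commit (descriptions shortened).
    before = [
        "== Business Intelligence",
        "* https://superset.incubator.apache.org/[Apache Superset] - ...",
        "* https://www.metabase.com/[Metabase] - ...",
        "* https://redash.io/[Redash] - ...",
        "* https://gethue.com/[HUE] - ...",
    ]
    after = [before[0], before[1], before[4], before[2], before[3]]
    print(out_of_order(before))  # [('Business Intelligence', 'Redash', 'HUE')]
    print(out_of_order(after))   # []
```

The commit does not say whether a prefix such as "Apache" should count toward the sort key; this sketch simply compares the full link text, so a project using a different rule would need to adjust the comparison.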