Simple modern data stack for processing and distributed SQL querying using Spark/Delta, Trino, and Minio(S3)


Experient Data Platform

A modern data stack for ingestion, processing, and data analytics using Minio, Trino, Spark, and Jupyter

Architecture Layers

[Architecture layers diagram]

Run Locally with Docker Compose

Build the entire data platform

docker-compose up -d

Query Engine Shell

docker container exec -it query-engine trino
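Once inside the Trino shell, you can confirm what the stack exposes before running any queries. A minimal check (the `minio` and `operational` catalog names are taken from the examples below; yours must match the catalog files in your configuration):

```sql
-- List the catalogs configured for this stack
SHOW CATALOGS;

-- List the schemas available in the MinIO-backed catalog
SHOW SCHEMAS FROM minio;
```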

Creating schemas through the query engine

CREATE SCHEMA minio.data_lake
WITH (location = 's3a://warehouse/');

CREATE TABLE minio.data_lake.companies
WITH (
    format = 'PARQUET',
    external_location = 's3a://warehouse/companies/'
) 
AS SELECT * FROM operational.business.organizations;
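As a quick sanity check, the new table can be read back through the same catalog to confirm the CTAS statement wrote Parquet files under `s3a://warehouse/companies/`:

```sql
-- Row count should match the source table
SELECT count(*) FROM minio.data_lake.companies;

-- Preview a few rows
SELECT * FROM minio.data_lake.companies LIMIT 10;
```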

Inspecting Metadata

Log into the postgres container

docker exec -it "postgres" psql -U admin -d "hive_db"

To inspect the metadata catalog

SELECT * from "DBS";
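The metastore also records each table and where its data lives. A sketch assuming the standard Hive metastore schema, where "TBLS" holds table entries and "SDS" holds storage descriptors:

```sql
-- List registered tables with their database and storage location
SELECT d."NAME" AS db_name, t."TBL_NAME", s."LOCATION"
FROM "TBLS" t
JOIN "DBS" d ON t."DB_ID" = d."DB_ID"
JOIN "SDS" s ON t."SD_ID" = s."SD_ID";
```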

Shutdown

docker-compose down

Data Processing with Spark

Interactive Scala Shell

docker run -it spark /opt/spark/bin/spark-shell

Interactive Python Shell

docker run -it spark:python3 /opt/spark/bin/pyspark
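From either shell, Spark can read the Parquet data that Trino wrote to MinIO. A minimal sketch for the Scala shell; the endpoint and credentials below are placeholder assumptions and must match whatever your docker-compose file defines for the MinIO service:

```scala
// Point the s3a connector at the local MinIO instance
// (endpoint and credentials are assumptions; use your docker-compose values)
sc.hadoopConfiguration.set("fs.s3a.endpoint", "http://minio:9000")
sc.hadoopConfiguration.set("fs.s3a.access.key", "admin")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "password")
sc.hadoopConfiguration.set("fs.s3a.path.style.access", "true")

// Read the Parquet files created by the CTAS statement above
val companies = spark.read.parquet("s3a://warehouse/companies/")
companies.printSchema()
companies.show(5)
```

Path-style access is needed because MinIO serves buckets from a single host rather than per-bucket DNS names.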
