A modern data stack for ingestion, processing, and data analytics using MinIO, Trino, Spark, and Jupyter
Build the entire data platform
docker-compose up -d
Query Engine Shell
docker container exec -it query-engine trino
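Once inside the Trino shell, a quick way to verify connectivity is to list what the engine can see; a minimal sketch, assuming the `minio` and `operational` catalogs used below are defined in the compose setup's Trino configuration:

```sql
-- List the catalogs Trino knows about; `minio` and `operational`
-- should appear if the compose configuration wired them up.
SHOW CATALOGS;

-- List the schemas inside the MinIO-backed catalog.
SHOW SCHEMAS FROM minio;
```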
Creating schemas through the query engine
CREATE SCHEMA minio.data_lake
WITH (location = 's3a://warehouse/');
CREATE TABLE minio.data_lake.companies
WITH (
format = 'PARQUET',
external_location = 's3a://warehouse/companies/'
)
AS SELECT * FROM operational.business.organizations;
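After the CTAS completes, the new Parquet-backed table can be queried straight away; a small sanity check, run from the same Trino shell:

```sql
-- Confirm the CTAS landed rows in the external location.
SELECT count(*) FROM minio.data_lake.companies;

-- Inspect the table definition Trino registered in the metastore,
-- including the format and external_location set above.
SHOW CREATE TABLE minio.data_lake.companies;
```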
Log into the postgres container
docker exec -it "postgres" psql -U admin -d "hive_db"
To inspect the metadata catalog
SELECT * FROM "DBS";
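Beyond "DBS", the Hive metastore schema also tracks registered tables and their storage descriptors; a sketch of two follow-up queries against the standard metastore tables "TBLS" and "SDS":

```sql
-- Tables registered in the metastore, with the schema each belongs to.
SELECT t."TBL_NAME", d."NAME" AS schema_name
FROM "TBLS" t
JOIN "DBS" d ON t."DB_ID" = d."DB_ID";

-- Physical storage locations backing each table (e.g. s3a:// paths).
SELECT "LOCATION" FROM "SDS";
```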
Shutdown
docker-compose down
Interactive Scala Shell
docker run -it spark /opt/spark/bin/spark-shell
Interactive Python Shell
docker run -it spark:python3 /opt/spark/bin/pyspark