Data Engineering Bootcamp Detailed Syllabus
Overview
In our endeavour to build data culture and democratize Data Science learning, we are
DataTalks.Club. The online bootcamp will have a series of week-wise learning modules
This is a community initiative, driven by experts and mentors, and you have the
Prerequisites
● None: anyone with a passion for learning can make it to the finish line :)
Format
Tutors will provide learners with guided learning paths, resources and exercises to solve. The
full schedule, practical details and registration details will be published soon. A brief summary
of the format is given below:
● Week-wise modules: learning modules will be released week by week, giving you a
structured learning path
● Real-time communication: we will use Discord, where learners can get help in real time
when they are stuck and can also interact with the mentors and fellow learners
Schedule
● Course overview
● Introduction to GCP
● Docker and docker-compose
● Running Postgres locally with Docker
● Setting up infrastructure on GCP with Terraform
● Preparing the environment for the course
● Homework
● Data Lake
● Workflow orchestration
● Setting up Airflow locally
● Ingesting data to GCP with Airflow
● Ingesting data to local Postgres with Airflow
● Moving data from AWS to GCP (Transfer service)
● Homework
Week #5 - Batch processing
● What is Spark
● Spark Dataframes
● Spark SQL
● Internals: GroupBy and joins
● More details
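To give a feel for the "Internals: GroupBy and joins" topic, here is a minimal pure-Python sketch (not Spark code) of the shuffle idea. Records are routed to partitions by hashing their key, so that all records for one key can be aggregated or joined locally. The two-stage sum loosely mirrors Spark's partial/final aggregation. The function names, the CRC32 partitioner and the sample data are hypothetical illustrations; Spark uses its own hash function and execution engine.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Deterministic hash so the same key always maps to the same partition.
    # CRC32 is only a stand-in for the hash function Spark actually uses.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle(records, num_partitions):
    """Send each (key, value) record to the partition that owns its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def group_by_sum(input_partitions, num_partitions=2):
    """Sum values per key: partial sums inside each input partition,
    then a shuffle, then final sums inside each output partition."""
    # Stage 1: local (map-side) partial aggregation reduces shuffled data.
    partials = []
    for part in input_partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        partials.extend(local.items())
    # Stage 2: after the shuffle each key lives in exactly one partition.
    result = {}
    for part in shuffle(partials, num_partitions):
        final = defaultdict(int)
        for key, value in part:
            final[key] += value
        result.update(final)
    return result


def shuffle_join(left, right, num_partitions=2):
    """Inner join on key: shuffle both sides with the same partitioner,
    then join matching partitions using an in-memory hash table."""
    left_parts = shuffle(left, num_partitions)
    right_parts = shuffle(right, num_partitions)
    joined = []
    for lpart, rpart in zip(left_parts, right_parts):
        table = defaultdict(list)
        for key, value in lpart:
            table[key].append(value)
        for key, rvalue in rpart:
            for lvalue in table.get(key, []):
                joined.append((key, lvalue, rvalue))
    return joined


# Hypothetical data: trip fares per zone, already split across two partitions.
trips = [[("A", 10), ("B", 5)], [("A", 7), ("C", 3), ("B", 1)]]
rides = [kv for part in trips for kv in part]
zones = [("A", "Airport"), ("B", "Bronx")]

print(sorted(group_by_sum(trips).items()))  # [('A', 17), ('B', 6), ('C', 3)]
print(sorted(shuffle_join(zones, rides)))
```

The join drops zone "C" because it has no match on the left side, which is the expected behaviour of an inner join.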
Week #6 - Streaming
● Introduction to Kafka
● Schemas (Avro)
● Kafka Streams
● Kafka Connect and KSQL
● More details
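The streaming topics build on Kafka's central abstraction: a topic is split into append-only partitions, each record gets an offset within its partition, and records with the same key land in the same partition, which keeps them in order. The class below is a hypothetical in-memory toy model of that idea, not the Kafka client API. Kafka's default partitioner hashes keys with murmur2; CRC32 is only a stand-in here.

```python
import zlib


class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic (not a real client)."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list; a record's index is its offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Same key -> same partition, so per-key ordering is preserved.
        # CRC32 stands in for the murmur2 hash used by Kafka's default partitioner.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def consume(self, partition, offset):
        """Return records from `offset` onward; the consumer tracks its own position."""
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
for i in range(3):
    topic.produce("ride-42", {"event": i})
p, offset = topic.produce("ride-42", {"event": 3})

print(offset)                                         # 3
print([v["event"] for _, v in topic.consume(p, 0)])  # [0, 1, 2, 3]
print([v["event"] for _, v in topic.consume(p, 2)])  # [2, 3]
```

Because the log is never modified in place, a consumer can re-read from any earlier offset; this is what lets several independent consumers process the same topic at their own pace.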