Skip to content

marcelatanaka-datascience/Apache-Airflow

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Apache Airflow

This repo was built to share some of the code I used during a course I took of Data Engineering. The main goal is to extract data from Twitter API, then store it in a datalake through Spark RDD and then extracting insights from the data. The chosen ELT method was medallion storage, starting from bronze files, thus raw data, then silve with minimium treatment, then gold, with analysis read for other teams to consume.

Original code sources are Twitter Development and Rafael Boittega.

About

This repo os destined to store the codes I used during a Data Engineering course

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages