Apache Airflow

This repo shares some of the code I used during a Data Engineering course I took. The main goal is to extract data from the Twitter API, store it in a data lake through Spark RDDs, and then extract insights from the data. The ELT approach chosen was medallion storage: bronze files hold the raw data, silver files hold the data with minimal treatment, and gold files hold analyses ready for other teams to consume.
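The bronze, silver and gold layering described above can be sketched as follows. This is a minimal illustration, not the code in this repo: it uses plain Python and JSON files in place of Spark so that it runs without a cluster. The function names, the folder layout, the sample payload and the "tweets per day" aggregate are all assumptions made for the example. The payload shape (a `data` list with `id`, `created_at` and `text`) assumes a Twitter API v2 recent search response requested with the `created_at` field.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_bronze(raw_response: dict, lake: Path) -> Path:
    """Bronze: persist the API response exactly as received."""
    path = lake / "bronze" / "tweets.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(raw_response))
    return path


def build_silver(bronze_path: Path, lake: Path) -> Path:
    """Silver: minimal treatment, one flat record per tweet."""
    raw = json.loads(bronze_path.read_text())
    rows = [
        {
            "id": tweet["id"],
            "created_date": tweet["created_at"][:10],  # keep YYYY-MM-DD only
            "text": tweet["text"].strip(),
        }
        for tweet in raw.get("data", [])
    ]
    path = lake / "silver" / "tweets.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(json.dumps(row) for row in rows))
    return path


def build_gold(silver_path: Path, lake: Path) -> Path:
    """Gold: an aggregate ready for consumers (hypothetical: tweets per day)."""
    lines = silver_path.read_text().splitlines()
    counts = Counter(json.loads(line)["created_date"] for line in lines if line)
    path = lake / "gold" / "tweets_per_day.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(dict(sorted(counts.items()))))
    return path


def run_pipeline(raw_response: dict, lake: Path) -> dict:
    """Run bronze -> silver -> gold and return the gold aggregate."""
    bronze = write_bronze(raw_response, lake)
    silver = build_silver(bronze, lake)
    gold = build_gold(silver, lake)
    return json.loads(gold.read_text())


# Hypothetical sample payload, standing in for a real API response.
SAMPLE_RESPONSE = {
    "data": [
        {"id": "1", "created_at": "2022-01-01T10:00:00.000Z", "text": " hello "},
        {"id": "2", "created_at": "2022-01-01T12:30:00.000Z", "text": "airflow"},
        {"id": "3", "created_at": "2022-01-02T08:15:00.000Z", "text": "spark"},
    ]
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(SAMPLE_RESPONSE, Path(tmp)))
```

Each layer reads only from the layer before it, so the raw bronze files stay untouched and the silver and gold files can be rebuilt at any time. In a setup like the one this repo describes, each step would typically be a Spark job scheduled by an Airflow task.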

Original code sources are Twitter Development and Rafael Boittega.

Folders and files:

airflow
spark
README.md
recent_search.py


marcelatanaka-datascience/Apache-Airflow
