This project showcases a complete end-to-end data engineering + DevOps + BI visualization pipeline based on the 2021 Formula 1 season 🏁.
We combined data transformation with Python, Azure Blob Storage for cloud hosting, GitHub Actions for CI/CD, and Power BI for interactive insights into which driver conquered each Grand Prix.
Our goal?
Build a project that not only delivers clean insights but does it like a pit crew — fast, precise, and with zero engine failures 🧑🔧⚙️
| Phase | Description |
|-------|-------------|
| 📦 Data Ingestion | Downloaded the Formula 1 Fantasy 2021 dataset from Kaggle |
| ⚙️ Data Processing | Python script (pandas) to clean and reshape the dataset |
| 🔁 CI/CD | GitHub Actions pipeline auto-triggers transformation & upload to Azure on every push |
| ☁️ Cloud Storage | Output CSV stored in Azure Blob (Gen2) using the Azure SDK |
| 📊 Visualization | Power BI dashboard showing race-by-race Grand Prix winners, driver trends, and race breakdowns |
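The Data Processing phase might look roughly like this. This is a minimal sketch of `scripts/transform.py` — the column names (`Driver`, `Points`) are illustrative, since the actual Kaggle CSV schema may differ:

```python
import pandas as pd

def transform(raw_path: str, out_path: str) -> pd.DataFrame:
    """Clean and reshape the raw Kaggle CSV (sketch of scripts/transform.py)."""
    df = pd.read_csv(raw_path)
    df.columns = [c.strip() for c in df.columns]  # tidy stray header whitespace
    df = df.dropna(subset=["Driver"])             # drop rows without a driver
    df = df.drop_duplicates()                     # remove exact duplicate rows
    df.to_csv(out_path, index=False)
    return df
```

The cleaned file lands in `data/processed/`, ready for the upload step.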
Built with Power BI Desktop, our dashboard includes:
🥇 Bar charts showing which driver won each Grand Prix
📈 Line charts of driver performance trends across the season
📌 Slicers to filter by driver or GP
🏆 Cards showing total races, unique winners, and top performers
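The dashboard's card and bar-chart measures can be approximated in pandas. The `winners` table below is a hypothetical stand-in for the processed dataset (one row per Grand Prix with its winning driver), seeded with the first four races of 2021:

```python
import pandas as pd

# Hypothetical winners table: one row per Grand Prix with its winning driver.
winners = pd.DataFrame({
    "GP":     ["Bahrain", "Emilia Romagna", "Portugal", "Spain"],
    "Winner": ["Hamilton", "Verstappen", "Hamilton", "Hamilton"],
})

total_races    = len(winners)                      # card: total races
unique_winners = winners["Winner"].nunique()       # card: unique winners
top_performer  = winners["Winner"].mode()[0]       # card: driver with most wins
wins_per_gp    = winners.groupby("Winner").size()  # bar chart input: wins per driver
```

In Power BI the same numbers come from DAX measures and the slicer-filtered visuals.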
🏎️ Note: Despite Max Verstappen and Lewis Hamilton battling it out on the track, in our dashboard they battle in bar charts and slicers 😄 It’s like F1, but with fewer carbon emissions and more pandas 🐼
Data source - https://www.kaggle.com/datasets/prathamsharma123/formula-1-fantasy-2021?resource=download
The workflow is defined in .github/workflows/data-pipeline.yml and includes:
Trigger on push
Set up Python & dependencies
Run scripts/transform.py
Upload to Azure Blob via SDK using GitHub Secrets
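Those steps translate into a workflow roughly like the one below. Treat it as a sketch, not the exact file — the action versions and the `scripts/upload.py` helper are assumptions; only the secret name `AZURE_STORAGE_CONNECTION_STRING` comes from the setup instructions:

```yaml
name: data-pipeline
on: [push]

jobs:
  transform-and-upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pandas azure-storage-blob
      - run: python scripts/transform.py
      - name: Upload to Azure Blob
        env:
          AZURE_STORAGE_CONNECTION_STRING: ${{ secrets.AZURE_STORAGE_CONNECTION_STRING }}
        run: python scripts/upload.py
```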
Clone the repo & set up your Python environment
Add your Azure Blob credentials as GitHub Secrets:
AZURE_STORAGE_CONNECTION_STRING
Push changes to trigger CI/CD
View the file in Azure Blob → Load into Power BI via SAS URL
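The SDK upload step can be sketched as follows. The container name `processed` matches the pipeline description; the helper names and blob path layout are illustrative:

```python
import os

def blob_name_for(csv_path: str, prefix: str = "processed") -> str:
    """Build the destination blob path, e.g. 'processed/f1_2021.csv'."""
    return f"{prefix}/{os.path.basename(csv_path)}"

def upload_processed_csv(csv_path: str) -> None:
    """Upload the transformed CSV to the 'processed' container.

    Reads AZURE_STORAGE_CONNECTION_STRING from the environment
    (populated from GitHub Secrets in CI).
    """
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    blob = service.get_blob_client(container="processed",
                                   blob=blob_name_for(csv_path))
    with open(csv_path, "rb") as fh:
        blob.upload_blob(fh, overwrite=True)
```

Power BI then reads the blob via a SAS URL, so no credentials are baked into the report.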
## 🧪 CI/CD Pipeline Details
File: `.github/workflows/data-pipeline.yml`
✅ Triggered on every push
✅ Installs dependencies (pandas, azure-storage-blob)
✅ Runs transform.py to clean data
✅ Uploads processed data to Azure Blob (processed container)
✅ Secrets managed via GitHub Secrets
Add live race data integration from F1 APIs 🛰️
Build an automatic refresh dashboard connected to Azure Blob
Integrate driver comparison analytics across seasons
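For the live-data idea, the public Ergast API is one possible source. A minimal sketch (the function names are ours; the JSON structure follows Ergast's documented response format):

```python
import json
from urllib.request import urlopen

ERGAST = "https://ergast.com/api/f1"  # public historical F1 results API

def results_url(season: int, round_: int) -> str:
    """URL for one race's results, e.g. .../2021/1/results.json."""
    return f"{ERGAST}/{season}/{round_}/results.json"

def race_winner(season: int, round_: int) -> str:
    """Fetch one race's results and return the winning driver's surname."""
    with urlopen(results_url(season, round_)) as resp:
        data = json.load(resp)
    race = data["MRData"]["RaceTable"]["Races"][0]
    return race["Results"][0]["Driver"]["familyName"]
```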
This project was a full-team effort, developed with the passion of an F1 pit crew and the speed of a GitHub Action 🛠️⚡
Shivali — Data pipeline commander 👨‍💻 & project hype manager ☁️⚔️
Vaishali — Power BI dashboard & Python
Adharsh — DevOps engineer & Azure storage
Together, we turned laps of raw CSVs into beautiful dashboards, and managed not to crash into NullTypeError() on the last turn 🏎️💥
This isn't just a project — it's a digital racetrack where:
Pandas do the pit stops
YAML handles the strategy
Azure stores the trophies
And Power BI waves the checkered flag 🏁
Thanks for joining us on this ride through the world of F1 data & cloud tech 🚀
Team F1-Pipeline
For collaborations or questions, reach out via GitHub or LinkedIn.
```
f1-data-pipeline/
│
├── .github/workflows/        # GitHub Actions CI/CD pipeline
│   └── data-pipeline.yml
├── data/
│   ├── raw/                  # Raw Kaggle dataset (CSV)
│   └── processed/            # Transformed & cleaned dataset
├── scripts/
│   └── transform.py          # Python script using pandas
├── README.md
```