Keeping systems reliable, resilient, and observable — at scale.
☁️ Cloud & Infra: AWS, GCP, Terraform, OpenShift, Kubernetes, Docker

🧰 CI/CD & Automation: GitHub Actions, Jenkins, Ansible

📈 Observability: Prometheus, Grafana, Dynatrace, OpenTelemetry

📦 Platforms & Languages: Linux (Debian/Ubuntu, RHEL), Bash, Python, Go

- Design and maintain highly available, fault-tolerant systems
- Create scalable CI/CD pipelines that ship code faster and safer
- Build infrastructure-as-code to ensure consistency and repeatability
- Implement monitoring, logging, and alerting to sleep better at night
- Guide engineering teams in reliability best practices and incident response
“Hope is not a strategy. Automate everything, observe everything, and break it before it breaks.”
- 🔍 Observability > Monitoring
- 🔄 Immutable > Mutable
- 🧪 Chaos > Complacency
- 📚 Postmortems > Blame
| Project | Description | Stack |
|---|---|---|
| 🔄 OpenShift Upgrade CUJ | | |
| 📈 Alert Architect | Dynamic alert tuning system to reduce noise & burnout | Prometheus, Dynatrace, OpenTelemetry |


