AI/ML Engineer • Independent Researcher • Entrepreneur • AI Safety & Trust

🌐 Portfolio · 📧 Email · 💼 LinkedIn · 🐙 GitHub
Hi, I’m Afridi — an AI/ML engineer and independent researcher obsessed with building verifiable, trustworthy, safe AI systems.
- 🧠 Founder & CTO at XCL3NT, an AI-first commerce brand
- 🤖 Researcher behind Dynamic Chain-of-Thought Reward Models (D-CoT) — Read D-CoT
- 🛡️ Focus: AI Safety & Multilingual Familiarity — designing systems with evidence, evaluation, and safeguards by default
- 🌍 Remote-ready and open to relocation (US/EU/NZ/SEA)
Mission: Make AI safer, more transparent, and actually helpful to humanity. 🌱
- AI Safety & Trust: principled safeguards, red-teaming, abstain/route policies, and post-hoc verification
- Verifiable QA systems (answers backed by evidence)
- Model behavior analysis & safety alignment
- Evaluation frameworks (metrics, gates, nightly reports)
- Human-data pipelines (collection → curation → evals)
- Cost/freshness routing and retrieval-generation hybrids
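
Here's a tiny, illustrative sketch of the freshness/cost-aware routing idea above. The callables `answer_parametric` and `answer_with_retrieval`, the `Query` fields, and the budget threshold are all made up for the example, not taken from any of my repos:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Query:
    text: str
    needs_fresh_data: bool      # e.g. flagged by a date/entity heuristic
    est_retrieval_cost: float   # estimated cost (or latency budget) of retrieval

def route(query: Query,
          answer_parametric: Callable[[str], str],
          answer_with_retrieval: Callable[[str], str],
          cost_budget: float = 0.05) -> str:
    """Freshness/cost-aware gate: retrieve only when the query likely depends
    on fresh facts and the retrieval cost fits the budget; otherwise fall back
    to the parametric model, or abstain rather than guess."""
    if query.needs_fresh_data:
        if query.est_retrieval_cost <= cost_budget:
            return answer_with_retrieval(query.text)
        return "I can't verify this cheaply enough right now."  # abstain > guess
    return answer_parametric(query.text)
```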
| Project | Description | Stack |
|---|---|---|
| Evidence‑Bound Answering System | Evidence‑bound answer engine (retriever → answer → verifier) | FastAPI, Next.js, Docker, Helm |
| Prompt Contracts + Fuzzing CI for Answer Engines | Prompt contracts + stress packs + CI gates | Python, YAML DSL |
| Proof‑Answers | Proof‑carrying answers with minimal evidence graphs | Python, Graph APIs |
| UIRE | Universal Intent Resolution Engine (handles ambiguity) | FastAPI, Docker, Helm |
| Human‑Guided Parametric‑vs‑Retrieval Gating | Freshness/cost‑aware routing and gating | Python, Policy Engine |
| TruthLens | Claim → Evidence fact‑checking engine | HF Spaces, Transformers |
Plus: DataLoaderSpeedrun, BreezeMind‑Pro, Career Vision AI, Human‑Feedback‑Safety‑Simulator, and more on my GitHub.
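
If you're curious what the retriever → answer → verifier pattern behind Evidence‑Bound Answering System and TruthLens looks like in spirit, here's a stripped-down, hypothetical sketch; the function signatures and the support threshold are illustrative, not the real project APIs:

```python
from typing import Callable, List, Optional

def evidence_bound_answer(
    question: str,
    retrieve: Callable[[str], List[str]],       # returns candidate evidence passages
    generate: Callable[[str, List[str]], str],  # drafts an answer from the evidence
    verify: Callable[[str, List[str]], float],  # scores evidence support in [0, 1]
    min_support: float = 0.8,
) -> Optional[str]:
    """Retriever → answer → verifier: every claim must be backed by retrieved
    evidence, otherwise the system abstains instead of guessing."""
    evidence = retrieve(question)
    if not evidence:
        return None  # nothing to cite, so abstain
    draft = generate(question, evidence)
    if verify(draft, evidence) < min_support:
        return None  # post-hoc verification failed; abstain
    # ship the answer together with its citations
    citations = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(evidence))
    return f"{draft}\n\nEvidence:\n{citations}"
```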
- 📊 Reduced hallucinations by 38%, latency by 23%, and cost by 44% across 5+ pipelines
- 🏆 Boosted factual F1 by 7–12 pp and alignment quality on ArenaHard by 3.4 pp with D‑CoT RMs
- 📚 Published research such as Grok‑3 and Grok‑3+
- 🧩 Designed prompt contracts, nightly eval dashboards, and safety gates that scale
- Languages: Python, C++, TypeScript, JS
- Frameworks: PyTorch, TensorFlow, JAX, FastAPI, Next.js
- Infra: Docker, Kubernetes, Helm, Prometheus, Grafana
- Concepts: MoE, FP8, RLHF, KV caching, LoRA, DQN
- Other: Z3, Lean4, CI/CD, Retrieval, Eval pipelines
If you’re building frontier models, eval frameworks, or safety tooling — I’d love to collaborate. Let’s make AI safer, smarter, and actually trustworthy. 🛡️
“AI safety isn’t a checkbox — it’s a responsibility.” – Me, probably during a caffeine high ☕😄
- Build evidence‑bound systems (claims must cite sources)
- Add prompt contracts + CI gates for regressions
- Use nightly evals with safety and calibration metrics
- Prefer abstain/route over confident nonsense
- Ship receipts: versions, seeds, costs, and checks for replayability
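
As a toy illustration of the contract + receipts items above (the contract fields, thresholds, and file paths are invented for the example, not from my actual CI setup):

```python
import json
import os
import time

# A "prompt contract": invariants every answer must satisfy before shipping.
CONTRACT = {
    "must_cite_evidence": True,
    "max_latency_s": 2.0,
    "min_support_score": 0.8,
}

def check_contract(answer: dict) -> list[str]:
    """Return the list of contract violations for one eval example."""
    violations = []
    if CONTRACT["must_cite_evidence"] and not answer.get("citations"):
        violations.append("missing citations")
    if answer.get("latency_s", 0.0) > CONTRACT["max_latency_s"]:
        violations.append("latency over budget")
    if answer.get("support_score", 0.0) < CONTRACT["min_support_score"]:
        violations.append("weak evidence support")
    return violations

def write_receipt(run_id: str, model_version: str, seed: int,
                  cost_usd: float, violations: list[str]) -> None:
    """Ship receipts: everything needed to replay and audit the run."""
    os.makedirs("receipts", exist_ok=True)
    receipt = {
        "run_id": run_id,
        "model_version": model_version,
        "seed": seed,
        "cost_usd": cost_usd,
        "violations": violations,
        "timestamp": time.time(),
    }
    with open(f"receipts/{run_id}.json", "w") as f:
        json.dump(receipt, f, indent=2)

# In CI: the nightly gate fails if any example returns a non-empty violation list.
```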
I treat debugging like detective work… except the culprit is me from 3 AM last night. 🕵️‍♂️