AI/ML Engineer • Independent Researcher • Entrepreneur • AI Safety & Trust

🌐 Portfolio · 📧 Email · 💼 LinkedIn · 🐙 GitHub
Hi, I’m Afridi — an AI/ML engineer and independent researcher obsessed with building verifiable, trustworthy, safe AI systems.
- 🧠 Founder & CTO at XCL3NT, an AI-first commerce brand
- 🤖 Researcher behind Dynamic Chain-of-Thought Reward Models (D-CoT) — Read D-CoT
- 🛡️ Focus: AI Safety & Multilingual Familiarity — designing systems with evidence, evaluation, and safeguards by default
- 🌍 Remote-ready and open to relocation (US/EU/NZ/SEA)
Mission: Make AI safer, more transparent, and actually helpful to humanity. 🌱
- AI Safety & Trust: principled safeguards, red-teaming, abstain/route policies, and post-hoc verification
- Verifiable QA systems (answers backed by evidence)
- Model behavior analysis & safety alignment
- Evaluation frameworks (metrics, gates, nightly reports)
- Human-data pipelines (collection → curation → evals)
- Cost/freshness routing and retrieval-generation hybrids
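
Here's a tiny, illustrative sketch of the freshness/cost-aware routing idea above. The callables `answer_parametric` and `answer_with_retrieval`, the `Query` fields, and the budget threshold are all made up for the example, not taken from any of my repos:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Query:
    text: str
    needs_fresh_data: bool      # e.g. flagged by a date/entity heuristic
    est_retrieval_cost: float   # estimated cost (or latency budget) of retrieval

def route(query: Query,
          answer_parametric: Callable[[str], str],
          answer_with_retrieval: Callable[[str], str],
          cost_budget: float = 0.05) -> str:
    """Freshness/cost-aware gate: retrieve only when the query likely depends
    on fresh facts and the retrieval cost fits the budget; otherwise fall back
    to the parametric model, or abstain rather than guess."""
    if query.needs_fresh_data:
        if query.est_retrieval_cost <= cost_budget:
            return answer_with_retrieval(query.text)
        return "I can't verify this cheaply enough right now."  # abstain > guess
    return answer_parametric(query.text)
```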
| Project | Description | Stack |
|---|---|---|
| Evidence‑Bound Answering System | Evidence‑bound answer engine (retriever → answer → verifier) | FastAPI, Next.js, Docker, Helm |
| Prompt Contracts + Fuzzing CI for Answer Engines | Prompt contracts + stress packs + CI gates | Python, YAML DSL |
| Proof‑Answers | Proof‑carrying answers with minimal evidence graphs | Python, Graph APIs |
| UIRE | Universal Intent Resolution Engine (handles ambiguity) | FastAPI, Docker, Helm |
| Human‑Guided Parametric‑vs‑Retrieval Gating | Freshness/cost‑aware routing and gating | Python, Policy Engine |
| TruthLens | Claim → Evidence fact‑checking engine | HF Spaces, Transformers |
Plus: DataLoaderSpeedrun, BreezeMind‑Pro, Career Vision AI, Human‑Feedback‑Safety‑Simulator, and more on my GitHub.
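
If you're curious what the retriever → answer → verifier pattern behind Evidence‑Bound Answering System and TruthLens looks like in spirit, here's a stripped-down, hypothetical sketch; the function signatures and the support threshold are illustrative, not the real project APIs:

```python
from typing import Callable, List, Optional

def evidence_bound_answer(
    question: str,
    retrieve: Callable[[str], List[str]],       # returns candidate evidence passages
    generate: Callable[[str, List[str]], str],  # drafts an answer from the evidence
    verify: Callable[[str, List[str]], float],  # scores evidence support in [0, 1]
    min_support: float = 0.8,
) -> Optional[str]:
    """Retriever → answer → verifier: every claim must be backed by retrieved
    evidence, otherwise the system abstains instead of guessing."""
    evidence = retrieve(question)
    if not evidence:
        return None  # nothing to cite, so abstain
    draft = generate(question, evidence)
    if verify(draft, evidence) < min_support:
        return None  # post-hoc verification failed; abstain
    # ship the answer together with its citations
    citations = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(evidence))
    return f"{draft}\n\nEvidence:\n{citations}"
```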
- 📊 Reduced hallucinations by 38%, latency by 23%, and cost by 44% across 5+ pipelines
- 🏆 Boosted factual F1 by 7–12 pp and alignment quality on ArenaHard by 3.4 pp with D‑CoT RMs
- 📚 Published research such as Grok‑3 and Grok‑3+
- 🧩 Designed prompt contracts, nightly eval dashboards, and safety gates that scale
- Languages: Python, C++, TypeScript, JS
- Frameworks: PyTorch, TensorFlow, JAX, FastAPI, Next.js
- Infra: Docker, Kubernetes, Helm, Prometheus, Grafana
- Concepts: MoE, FP8, RLHF, KV caching, LoRA, DQN
- Other: Z3, Lean4, CI/CD, Retrieval, Eval pipelines
If you’re building frontier models, eval frameworks, or safety tooling — I’d love to collaborate. Let’s make AI safer, smarter, and actually trustworthy. 🛡️
“AI safety isn’t a checkbox — it’s a responsibility.” – Me, probably during a caffeine high ☕😄
- Build evidence‑bound systems (claims must cite sources)
- Add prompt contracts + CI gates for regressions
- Use nightly evals with safety and calibration metrics
- Prefer abstain/route over confident nonsense
- Ship receipts: versions, seeds, costs, and checks for replayability
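
As a toy illustration of the contract + receipts items above (the contract fields, thresholds, and file paths are invented for the example, not from my actual CI setup):

```python
import json
import os
import time

# A "prompt contract": invariants every answer must satisfy before shipping.
CONTRACT = {
    "must_cite_evidence": True,
    "max_latency_s": 2.0,
    "min_support_score": 0.8,
}

def check_contract(answer: dict) -> list[str]:
    """Return the list of contract violations for one eval example."""
    violations = []
    if CONTRACT["must_cite_evidence"] and not answer.get("citations"):
        violations.append("missing citations")
    if answer.get("latency_s", 0.0) > CONTRACT["max_latency_s"]:
        violations.append("latency over budget")
    if answer.get("support_score", 0.0) < CONTRACT["min_support_score"]:
        violations.append("weak evidence support")
    return violations

def write_receipt(run_id: str, model_version: str, seed: int,
                  cost_usd: float, violations: list[str]) -> None:
    """Ship receipts: everything needed to replay and audit the run."""
    os.makedirs("receipts", exist_ok=True)
    receipt = {
        "run_id": run_id,
        "model_version": model_version,
        "seed": seed,
        "cost_usd": cost_usd,
        "violations": violations,
        "timestamp": time.time(),
    }
    with open(f"receipts/{run_id}.json", "w") as f:
        json.dump(receipt, f, indent=2)

# In CI: the nightly gate fails if any example returns a non-empty violation list.
```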
I treat debugging like detective work… except the culprit is me from 3 AM last night. 🕵️‍♂️