Python · FastAPI · ML · LLM · CUDA · Rust
Systems-first ML engineer who ships production inference, data pipelines, and fast backend services.
- Production-grade backend services and APIs with Python and FastAPI for model serving and tooling (a minimal serving sketch follows this list).
- Reliable data ingestion and ETL for analytics and retrieval-augmented workflows.
- LLM infra: prompt iteration, eval loops, lightweight retrievers and RAG orchestration.
- Performance work: CUDA-aware inference optimizations and low-latency Rust sidecars.
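A minimal sketch of the kind of FastAPI model-serving endpoint described in the first bullet. The endpoint name, request schema, and `fake_model` stand-in are hypothetical placeholders for illustration, not code from any of the projects listed below.

```python
# Minimal FastAPI model-serving sketch (hypothetical names; not from a specific project).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="model-serving-sketch")

class PredictRequest(BaseModel):
    text: str

class PredictResponse(BaseModel):
    label: str
    score: float

def fake_model(text: str) -> tuple[str, float]:
    # Stand-in for a real model call (e.g. a PyTorch or vLLM inference client).
    return ("positive", 0.92) if "good" in text.lower() else ("negative", 0.41)

@app.post("/predict", response_model=PredictResponse)
async def predict(req: PredictRequest) -> PredictResponse:
    label, score = fake_model(req.text)
    return PredictResponse(label=label, score=score)
```

Pydantic models on both request and response give the validation and clear API contract mentioned above; swapping `fake_model` for a real inference client is the only change a production version would need at this layer.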
- Built FastAPI backends that expose model outputs, instrumentation, and real-time endpoints used by product teams.
- Designed and ran ingestion pipelines and ETL that feed OLAP and retrieval systems.
- Implemented evaluation and ranking loops to measure and improve LLM output quality (see the sketch after this list).
- Wrote Rust watcher/updater services for low-latency node monitoring and state propagation.
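A hedged sketch of what an evaluation loop like the one above can look like. The exact-match metric, toy cases, and `echo_model` are illustrative assumptions, not the actual evaluation criteria or models used in those projects.

```python
# Illustrative LLM evaluation-loop sketch (placeholder data and scoring; not the real eval harness).
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str

def exact_match(output: str, expected: str) -> float:
    # Toy metric: 1.0 if the normalized strings match, else 0.0.
    return float(output.strip().lower() == expected.strip().lower())

def run_eval(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    scores = [exact_match(generate(c.prompt), c.expected) for c in cases]
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    cases = [EvalCase("2 + 2 =", "4"), EvalCase("Capital of France?", "Paris")]
    # Swap in a real model client here (e.g. a vLLM or OpenAI-compatible endpoint).
    echo_model = lambda prompt: "4" if "2 + 2" in prompt else "Paris"
    print(f"accuracy: {run_eval(echo_model, cases):.2f}")
```

The point of the loop is repeatability: the same cases and metric run on every prompt or model change, so quality regressions show up as a number rather than an anecdote.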
- FamAgent — modular agent orchestration with a FastAPI backend. https://github.com/kaushal07wick/FamAgent
- Finsight / sqlagent — financial analytics engine and Airflow + FastAPI tooling. https://github.com/kaushal07wick/sqlagent
- BreeBoost — real-time fraud detection pipeline with drift monitoring and alerting. https://github.com/kaushal07wick/breeboost
- LanceDB contributions — ingestion and vector search integration work.
Python · FastAPI · AsyncIO · SQL · Postgres · ClickHouse · Docker · Airflow · CUDA · PyTorch · vLLM · Rust · Git · CI/CD · Logging · Metrics
- Make model outputs production-ready with clear APIs, validation, and observability.
- Turn prototyped LLM flows into repeatable RAG pipelines (a toy retrieval sketch follows this list).
- Reduce latency and cost through targeted inference optimizations.
- Ship small, high-impact features fast in early-stage teams.
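A minimal sketch of the RAG shape referenced above: retrieve the top-k relevant chunks, then assemble a grounded prompt. The corpus and word-overlap scoring are toy placeholders; a real pipeline would use embeddings and a vector index (for example LanceDB, mentioned in the projects above).

```python
# Toy RAG sketch: bag-of-words retrieval plus prompt assembly (placeholder corpus and scoring).
from collections import Counter

CORPUS = [
    "FastAPI is a Python web framework for building APIs.",
    "ClickHouse is a columnar database for analytics workloads.",
    "vLLM serves large language models with high throughput.",
]

def score(query: str, doc: str) -> int:
    # Word-count overlap; a real retriever would use embeddings and a vector index.
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(min(q[w], d[w]) for w in q)

def retrieve(query: str, k: int = 2) -> list[str]:
    return sorted(CORPUS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("What does vLLM do?"))
```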
Email: [email protected]
Twitter: https://twitter.com/ofcboogeyman
LinkedIn: https://www.linkedin.com/in/11kaushalkumar/