MSc Data Science (Kingston University) · 92%+ accuracy on transformer NLP models · SHAP & LIME explainability · 6+ years of data-informed professional experience across analytics, operations, and business development.
Over 6 years of professional experience delivering measurable outcomes: 35% revenue growth, 40% repeat business uplift, and 12% cost reduction. MSc in Data Science from Kingston University, with independent portfolio projects spanning fintech, healthcare, legal NLP, HR tech, and crime analytics.
Quantified outcomes across machine learning, explainable AI, and commercial performance, combining production-grade technical delivery with measurable business impact.
Real-world applications of data science, machine learning, and explainable AI.
Modular 12-app Django REST API with JWT auth and role-based permissions, generating 384-dim SBERT embeddings to cosine-rank 3,000+ candidates per job posting. SHAP and LIME explanations are fused 60/40 into a unified explainability report per match, with the disparate impact ratio and four-fifths (4/5) rule computed across gender, ethnicity, and age groups. An async Celery pipeline orchestrates CV parsing, embedding generation, and batch matching, while a trainable GradientBoosting scorer ships with a zero-downtime fallback to fixed weights.
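The fairness check mentioned above can be illustrated with a minimal sketch. It assumes the disparate impact ratio is taken as the lowest group selection rate divided by the highest, with 0.8 as the four-fifths threshold; the function names, the record format, and the toy data are hypothetical and not taken from the platform itself.

```python
from collections import defaultdict


def selection_rates(records):
    """Map each group to its selection rate, given (group, selected) pairs."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(bool(was_selected))
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact(records, threshold=0.8):
    """Return (ratio, passes) comparing the lowest and highest group rates.

    The ratio is None when no group has any selections, because the
    comparison is undefined in that case.
    """
    rates = selection_rates(records)
    if not rates:
        raise ValueError("records must not be empty")
    highest = max(rates.values())
    if highest == 0:
        return None, True
    ratio = min(rates.values()) / highest
    return ratio, ratio >= threshold


# Hypothetical toy data: group A is shortlisted 6/10 times, group B 3/10.
sample = [("A", 1)] * 6 + [("A", 0)] * 4 + [("B", 1)] * 3 + [("B", 0)] * 7
ratio, passes = disparate_impact(sample)
```

In this toy case the ratio falls below the threshold, so the match set would be flagged for review. The same function could be run once per protected attribute (gender, ethnicity, age band).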
Django SaaS platform predicting clinical trial dropout up to 60 days in advance. An XGBoost classifier with per-visit SHAP waterfall plots identifies at-risk patients at the individual level. A Cox Proportional Hazards model delivers hazard ratios and 30/60/90-day retention probabilities, with stratified Kaplan-Meier curves segmented by risk tier (Low to Critical) via Lifelines. Cohort forecasts with 95% confidence intervals are served through a DRF API; branded ReportLab PDFs are generated per patient, including SHAP feature drivers, survival curves, and clinical action logs.
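For readers unfamiliar with Kaplan-Meier retention curves, the estimator can be sketched in plain Python. This is an illustration of the technique only, not the Lifelines-based code used in the project; the function name and the input format (durations in days, with 1 for an observed dropout and 0 for a censored patient) are assumptions.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with a dropout.

    durations: time until dropout or censoring for each patient.
    events: 1 if dropout was observed at that time, 0 if censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        dropouts = 0
        leaving = 0
        # Group all patients who share the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            dropouts += pairs[i][1]
            leaving += 1
            i += 1
        if dropouts:
            # Multiply by the fraction of at-risk patients retained at time t.
            survival *= 1 - dropouts / at_risk
            curve.append((t, survival))
        # Dropouts and censored patients both leave the risk set.
        at_risk -= leaving
    return curve
```

Running this separately on each risk tier would give the stratified curves described above; censored patients reduce the risk set without lowering the survival estimate.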
Fine-tuned LegalBERT and Legal-RoBERTa-large ensemble classifying every clause across a 100-type unified taxonomy (CUAD, LEDGAR, and MAUD; 132k clauses; macro F1 0.606). Anomaly detection fuses Isolation Forest with a 1024-dim autoencoder; bilateral power imbalance is scored −100 to +100 via sentiment, modal verb, and obligation NLP. Every prediction is explained at token level with SHAP, served through a FastAPI REST API and dark-theme dashboard with downloadable PDF reports.
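Macro F1 weights every clause type equally, which matters for a 100-type taxonomy where rare clauses would otherwise be drowned out by common ones. A minimal sketch of the metric follows, assuming the label set is the union of true and predicted labels; the function name and the example labels in the usage note are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); the denominator is non-zero because
        # every label appears at least once in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)
```

For example, calling `macro_f1(["indemnity", "indemnity", "term", "notice"], ["indemnity", "term", "term", "notice"])` averages per-class scores of 2/3, 2/3, and 1.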
Production contract intelligence platform with XGBoost/Logistic Regression ensemble (70/30) scoring invoice leakage probability, Isolation Forest anomaly detection per record, and Prophet forecasting with 90% CI bands over a 24-month revenue horizon. SHAP attribution pipeline exposes top risk drivers per invoice; a 7-rule leakage engine generates 13k+ alerts covering missing payments, underbilling, and duplicates. Predictions served via DRF API with a Chart.js dashboard and per-invoice drill-down modals.
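The 70/30 ensemble can be read as a weighted average of the two models' leakage probabilities. The sketch below assumes the 70% weight belongs to XGBoost and the 30% to Logistic Regression, which is an interpretation of the order in which the models are listed; the function and parameter names are hypothetical.

```python
def blend_probabilities(p_xgb, p_logreg, w_xgb=0.7):
    """Weighted average of two aligned lists of predicted probabilities."""
    if not 0.0 <= w_xgb <= 1.0:
        raise ValueError("w_xgb must be between 0 and 1")
    if len(p_xgb) != len(p_logreg):
        raise ValueError("both models must score the same invoices")
    w_logreg = 1.0 - w_xgb
    return [w_xgb * a + w_logreg * b for a, b in zip(p_xgb, p_logreg)]
```

Because both inputs are probabilities and the weights sum to 1, the blended score stays in the 0 to 1 range and can be thresholded like a single model's output.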
Production risk platform with GMM regime classification, Ledoit-Wolf shrinkage, and XGBoost/ElasticNet trained on 50+ macro features, delivering Sharpe 4.42, VaR -12.07%, and CVaR -17.48%. Stress-tested against 20+ historical crises with SHAP attribution and plain-English narratives, plus reverse stress-testing for threshold-based shock back-solving. FastAPI dashboard refreshes 20 live instruments hourly.
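VaR and CVaR figures like those above can be computed in several ways. The sketch below uses historical simulation, which is an assumption: the method and confidence level behind the reported numbers are not stated here. Both values are returned as negative returns to match the sign convention used above, and the function name is hypothetical.

```python
import math


def historical_var_cvar(returns, confidence=0.95):
    """Historical-simulation VaR and CVaR, expressed as returns.

    VaR is the best return inside the worst (1 - confidence) tail;
    CVaR is the mean return across that tail.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    ordered = sorted(returns)  # worst returns first
    # The small epsilon guards against floating-point error in 1 - confidence.
    tail_size = max(1, math.floor(len(ordered) * (1 - confidence) + 1e-9))
    tail = ordered[:tail_size]
    var = tail[-1]
    cvar = sum(tail) / len(tail)
    return var, cvar
```

CVaR is always at least as severe as VaR because it averages the losses beyond the VaR cut-off, which is consistent with the CVaR figure above being the larger loss.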
Deployed a framework-agnostic LLM agent firewall with a YAML policy DSL (6 operators, hot-reload on file mtime), dual-layer safety enforcement via system prompt and mechanical gate, and per-response confidence scoring (High/Medium/Low) via a constrained secondary LLM call. Supports LangChain, CrewAI, and AutoGen via POST /gate. All gate decisions are logged to SQLite in WAL mode, with threading.Lock() guarding concurrent writes.
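The logging pattern described above can be sketched with the standard library alone. The class name, table name, and column layout are hypothetical; only the use of SQLite, WAL mode, and threading.Lock() comes from the description.

```python
import sqlite3
import threading
import time


class GateLog:
    """Append-only decision log shared by several request threads."""

    def __init__(self, path):
        self._lock = threading.Lock()
        # check_same_thread=False lets one connection be used across threads;
        # the lock below serialises every access to it.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS gate_decisions ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "ts REAL NOT NULL, agent TEXT NOT NULL, "
            "action TEXT NOT NULL, decision TEXT NOT NULL)"
        )
        self._conn.commit()

    def record(self, agent, action, decision):
        """Write one gate decision; safe to call from multiple threads."""
        with self._lock:
            self._conn.execute(
                "INSERT INTO gate_decisions (ts, agent, action, decision) "
                "VALUES (?, ?, ?, ?)",
                (time.time(), agent, action, decision),
            )
            self._conn.commit()

    def count(self):
        """Return the number of logged decisions."""
        with self._lock:
            row = self._conn.execute(
                "SELECT COUNT(*) FROM gate_decisions"
            ).fetchone()
            return row[0]

    def close(self):
        with self._lock:
            self._conn.close()
```

WAL mode lets readers continue while a write is in progress, and the lock keeps writes through the shared connection from interleaving. A call such as `GateLog("gate.db").record("crew-agent", "send_email", "deny")` would append one row.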
Fine-tuned BERT (99.12%) and RoBERTa (99.41%) for spam detection, paired with XGBoost scoring at R²=0.9996 across 4,137 engineered features from UK Companies House, firmographics, and DNS/SMTP signals. Deployed via FastAPI with live SHAP explanations, a Tailwind SPA, and an automated nightly retraining pipeline.
Done in collaboration with Prince Kumar Nath.
Server-authoritative D&D 5E engine powered by NVIDIA NIM, with an LLM locked to 10 registered server-side tools; inventory, HP, and room contents are verified before narration. Structured output enforced via game_response() schema across narrative, HP delta, and action list. Procedurally generates a 4-floor, 20-room dungeon with 7 room types and 4 unique floor bosses. Multiplayer for up to 4 players via Socket.IO with shared AI history per room, full conversation history persisted to disk across restarts, and deployed to HuggingFace Spaces via Docker with Git LFS for 100 MB+ assets.
MSc dissertation on fake news detection using large language models and explainable AI. Fine-tuned BERT and RoBERTa, achieving 92%+ accuracy on the LIAR dataset. Applied LIME and SHAP for token-level prediction transparency and interpretability.
Engineered and analysed 280k+ crime records across geospatial, temporal, and behavioural dimensions. Built TF-IDF classification, crime severity prediction, KMeans clustering, and 74-month SARIMA forecasting pipelines, then surfaced insights through an interactive Plotly Dash dashboard.
Land Registry assessment on geospatial data handling and boundary extraction. Processed HM Land Registry boundary datasets, converting spatial data into structured GeoPackage format with automated cleaning, projection, and export workflows for GIS compatibility.
A recruiter-focused technical stack spanning machine learning, NLP, explainable AI, analytics, visualisation, and geospatial data processing.
Uvicorn
Pydantic
Over 6 years of professional experience across analytics, operations, and business development.
A balanced mix of analytical thinking, operational excellence, and stakeholder communication developed through data science projects and commercial leadership roles.
Dissertation: Enhancing Fake News Detection with Explainable AI
Project: New Start-up Proposal
Dissertation: Highlighting Important Data in Visual Analytics
Minor in Management. Certificates of Excellence for Academic Achievement.
Certificate No: TC-032026-2PN03Y7-23388
Certificate No: TC-042026-07HV84M-25960
Certificate No: TC-042026-PI5117K-26914
MSc Data Science graduate with 6+ years of professional experience delivering measurable business outcomes. Seeking junior data scientist, analytics engineer, or graduate scheme roles where I can apply transformer models, explainable AI, and end-to-end ML project experience across fintech, healthcare, legal, HR tech, or any data-driven organisation.