MSc Data Science (Kingston University) · 92%+ accuracy on transformer NLP models · SHAP & LIME explainability · 6+ years of data-informed professional experience across analytics, operations, and business development.
Over 6 years of professional experience delivering measurable outcomes: 35% revenue growth, 40% repeat business uplift, and 12% cost reduction. MSc in Data Science from Kingston University, with independent portfolio projects spanning fintech, healthcare, legal NLP, HR tech, and crime analytics.
Quantified outcomes across machine learning, explainable AI, and commercial performance — combining production-grade technical delivery with measurable business impact.
Real-world applications of data science, machine learning, and explainable AI.
End-to-end NLP platform quantifying critic-vs-audience sentiment divergence across 19,836 films and 1M+ reviews via a custom divergence scoring algorithm built on Rotten Tomatoes tomatometer and audience score deltas. Fine-tuned DistilBERT on 50K IMDB reviews for binary sentiment classification, with a TF-IDF + SGD fallback for low-latency inference — both models serving live predictions cached in MySQL. Automated multi-source ingestion pipelines pull from TMDB, OMDb, The Guardian, and NYT Article Search APIs via APScheduler, with a 69-month historical backfill yielding 2,600+ critic reviews. Schema migrated from MySQL to TiDB Cloud Serverless with SSL-secured connections and smart startup sequencing. Fully containerised with Docker; FastAPI backend serves a vanilla JS frontend with film search, divergence leaderboard, sentiment timeline, and aspect-level critic breakdown.
Modular 12-app Django REST API with JWT auth and role-based permissions, generating 384-dim SBERT embeddings to cosine-rank 3,000+ candidates per job posting. SHAP and LIME explanations are fused 60/40 into a unified explainability report per match, with disparate impact ratio and 4/5 rule computed across gender, ethnicity, and age groups. An async Celery pipeline orchestrates CV parsing, embedding generation, and batch matching, while a trainable GradientBoosting scorer ships with zero-downtime fallback to fixed weights.
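For readers unfamiliar with the fairness check mentioned here, the following is a minimal sketch of a disparate impact ratio with the 4/5 rule, not the platform's actual code. It assumes each group's shortlisting rate is compared against the highest-rate group; the function names, the record format, and the sample data are all hypothetical.

```python
from collections import Counter


def selection_rates(records):
    """Return the share of shortlisted candidates per group.

    `records` is an iterable of (group, shortlisted) pairs (assumed format).
    """
    totals, selected = Counter(), Counter()
    for group, shortlisted in records:
        totals[group] += 1
        if shortlisted:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact(records, threshold=0.8):
    """Compare every group's selection rate with the highest-rate group.

    A ratio below `threshold` (0.8, i.e. the 4/5 rule) is flagged as failing.
    """
    rates = selection_rates(records)
    reference = max(rates.values())
    report = {}
    for group, rate in rates.items():
        ratio = rate / reference if reference > 0 else 1.0
        report[group] = {"rate": rate, "ratio": ratio, "passes": ratio >= threshold}
    return report


# Hypothetical sample: 10 candidates in each of two groups.
records = (
    [("A", True)] * 5 + [("A", False)] * 5
    + [("B", True)] * 3 + [("B", False)] * 7
)
report = disparate_impact(records)
```

In this sample, group B is shortlisted at 30% against 50% for group A, so its ratio falls below the 0.8 threshold and the check flags it.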
Django SaaS platform predicting clinical trial dropout up to 60 days in advance. XGBoost classifier with per-visit SHAP waterfall plots identifies at-risk patients at the individual level. Cox Proportional Hazards model delivers hazard ratios and 30/60/90-day retention probabilities, with stratified Kaplan-Meier curves segmented by risk tier (Low to Critical) via Lifelines. Cohort forecasts with 95% confidence intervals served through a DRF REST API; branded ReportLab PDFs generated per patient including SHAP feature drivers, survival curves, and clinical action logs.
Fine-tuned LegalBERT and Legal-RoBERTa-large ensemble classifying every clause across a 100-type unified taxonomy (CUAD, LEDGAR, and MAUD; 132k clauses; macro F1 0.606). Anomaly detection fuses Isolation Forest with a 1024-dim autoencoder; bilateral power imbalance is scored −100 to +100 via sentiment, modal verb, and obligation NLP. Every prediction is explained at token level with SHAP, served through a FastAPI REST API and dark-theme dashboard with downloadable PDF reports.
Production contract intelligence platform with XGBoost/Logistic Regression ensemble (70/30) scoring invoice leakage probability, Isolation Forest anomaly detection per record, and Prophet forecasting with 90% CI bands over a 24-month revenue horizon. SHAP attribution pipeline exposes top risk drivers per invoice; a 7-rule leakage engine generates 13k+ alerts covering missing payments, underbilling, and duplicates. Predictions served via DRF API with a Chart.js dashboard and per-invoice drill-down modals.
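A 70/30 ensemble of this kind can be read as a weighted average of the two models' predicted probabilities. The sketch below illustrates that idea only; it assumes the 70% weight belongs to XGBoost (the order in which the models are listed), and the function name and input values are hypothetical.

```python
def blend_probabilities(p_xgb, p_logreg, w_xgb=0.7):
    """Weighted average of two models' leakage probabilities.

    `w_xgb` is the weight given to the first model; the second model
    receives the remainder. The 0.7 default is an assumed reading of
    the 70/30 split.
    """
    if not 0.0 <= w_xgb <= 1.0:
        raise ValueError("w_xgb must lie in [0, 1]")
    if len(p_xgb) != len(p_logreg):
        raise ValueError("both models must score the same invoices")
    return [w_xgb * a + (1.0 - w_xgb) * b for a, b in zip(p_xgb, p_logreg)]


# Hypothetical per-invoice probabilities from each model.
blended = blend_probabilities([0.9, 0.2], [0.5, 0.4])
```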
Production AWS cost intelligence platform ingesting 90-day per-service spend via Cost Explorer API into DynamoDB. Dual anomaly detection using Isolation Forest (contamination=0.05) and Z-score with consensus scoring and plain-English explanations. Prophet forecasting with 30-day horizon, 95% CI, MAPE validation, and budget breach day prediction. EC2 rightsizing via 14-day CloudWatch CPU/memory analysis. EventBridge-scheduled daily Lambda pipeline with SNS alerts, containerised via Docker and deployed to Hugging Face Spaces.
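The consensus idea behind the dual detector can be sketched as follows. This is an illustration under stated assumptions rather than the deployed pipeline: the Z-score threshold of 3.0, the label names, and the sample spend series are hypothetical, and the Isolation Forest flags are passed in as plain booleans so the example needs only the standard library.

```python
from statistics import mean, pstdev


def zscore_flags(costs, threshold=3.0):
    """Flag days whose spend lies more than `threshold` std devs from the mean."""
    mu = mean(costs)
    sigma = pstdev(costs)
    if sigma == 0:
        return [False] * len(costs)
    return [abs(cost - mu) / sigma > threshold for cost in costs]


def consensus_labels(z_flags, iforest_flags):
    """Combine both detectors: two votes = consensus, one = single, none = none."""
    names = ("none", "single", "consensus")
    return [names[int(z) + int(f)] for z, f in zip(z_flags, iforest_flags)]


# Hypothetical 30-day spend series with one spike on the final day.
costs = [100.0 + (day % 5) for day in range(29)] + [500.0]
z_flags = zscore_flags(costs)

# Hypothetical Isolation Forest output: flags the last two days.
iforest_flags = [False] * 28 + [True, True]
labels = consensus_labels(z_flags, iforest_flags)
```

Days flagged by only one detector can be shown as lower-confidence alerts, while consensus days are the natural candidates for SNS notifications.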
Production risk platform with GMM regime classification, Ledoit-Wolf shrinkage, and XGBoost/ElasticNet trained on 50+ macro features, delivering Sharpe 4.42, VaR -12.07%, and CVaR -17.48%. Stress-tested 20+ historical crises with SHAP attribution and plain-English narratives, plus reverse stress-testing for threshold-based shock back-solving. FastAPI dashboard refreshes 20 live instruments hourly.
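As a reminder of what the VaR and CVaR figures measure, here is a minimal historical-simulation sketch. Whether the platform uses historical or parametric estimation, and at which confidence level, is not stated above, so the method, the 0.8 level used in the example, and the return series are all assumptions.

```python
def historical_var_cvar(returns, confidence=0.95):
    """Estimate VaR and CVaR from a sample of period returns.

    VaR is the least severe return inside the worst (1 - confidence) tail;
    CVaR is the mean of that tail. Both are reported as signed returns.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    ordered = sorted(returns)
    # Number of observations in the tail, at least one.
    cutoff = max(1, round(len(ordered) * (1.0 - confidence)))
    tail = ordered[:cutoff]
    var = tail[-1]
    cvar = sum(tail) / len(tail)
    return var, cvar


# Hypothetical returns; at 80% confidence the tail holds the two worst values.
sample = [-0.20, -0.10, -0.05, -0.02, -0.01, 0.00, 0.01, 0.01, 0.02, 0.03]
var, cvar = historical_var_cvar(sample, confidence=0.8)
```

CVaR is always at least as severe as VaR because it averages the losses beyond the VaR cut-off.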
Deployed a framework-agnostic LLM agent firewall with a YAML policy DSL (6 operators, hot-reload on file mtime), dual-layer safety enforcement via system prompt and mechanical gate, and per-response confidence scoring (High/Medium/Low) via a constrained secondary LLM call. Supports LangChain, CrewAI, and AutoGen via POST /gate. All gate decisions are logged to SQLite in WAL mode, with threading.Lock() guarding concurrent writes.
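The logging pattern described here, SQLite in WAL mode with a lock around writes, can be sketched with the standard library alone. The class name, table schema, and sample decisions below are hypothetical; only the use of WAL mode and threading.Lock() comes from the description above.

```python
import os
import sqlite3
import tempfile
import threading


class GateLog:
    """Thread-safe decision log backed by one shared SQLite connection."""

    def __init__(self, path):
        self._lock = threading.Lock()
        # check_same_thread=False lets worker threads reuse this connection;
        # the lock below serialises every access to it.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        # The PRAGMA returns the journal mode that is now active.
        self.journal_mode = self._conn.execute(
            "PRAGMA journal_mode=WAL"
        ).fetchone()[0]
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS decisions ("
            "id INTEGER PRIMARY KEY, agent TEXT NOT NULL, "
            "action TEXT NOT NULL, allowed INTEGER NOT NULL)"
        )
        self._conn.commit()

    def record(self, agent, action, allowed):
        """Insert one gate decision under the write lock."""
        with self._lock:
            self._conn.execute(
                "INSERT INTO decisions (agent, action, allowed) VALUES (?, ?, ?)",
                (agent, action, int(allowed)),
            )
            self._conn.commit()

    def count(self):
        """Return the number of logged decisions."""
        with self._lock:
            return self._conn.execute(
                "SELECT COUNT(*) FROM decisions"
            ).fetchone()[0]


# WAL needs a file-backed database, so the demo writes to a temp directory.
db_path = os.path.join(tempfile.mkdtemp(), "gate.db")
log = GateLog(db_path)
workers = [
    threading.Thread(target=log.record, args=(f"agent-{i}", "send_email", i % 2 == 0))
    for i in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
total = log.count()
```

WAL lets readers proceed while a write is in flight, and the lock keeps the shared connection from being used by two threads at once.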
Fine-tuned BERT (99.12%) and RoBERTa (99.41%) for spam detection, paired with XGBoost scoring at R²=0.9996 across 4,137 engineered features from UK Companies House, firmographics, and DNS/SMTP signals. Deployed via FastAPI with live SHAP explanations, a Tailwind SPA, and an automated nightly retraining pipeline.
Done in collaboration with Prince Kumar Nath.
Dual-stream BiLSTM with a custom Attention layer blends token embeddings with 5 linguistic signals (URL, phone, currency, urgency ratio, length norm); Attention, LIME, and SHAP are fused via per-method normalisation into a single word-importance ranking, plain-English verdict, and live per-token colour heatmap. Deployed on FastAPI with concurrent XAI thread pools, SQLite history, and a live analytics dashboard — retraining triggers automatically on error rate, staleness, or volume thresholds, with accuracy-gated rollback.
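One way to fuse several attribution methods through per-method normalisation is sketched below. It assumes min-max scaling of absolute scores and an equal-weight average across methods; the actual scaling and weighting are not specified above, and the function names, tokens, and scores are hypothetical.

```python
def normalise(scores):
    """Min-max scale the absolute values of one method's scores to [0, 1]."""
    magnitudes = [abs(score) for score in scores]
    low, high = min(magnitudes), max(magnitudes)
    if high == low:
        return [0.0] * len(magnitudes)
    return [(m - low) / (high - low) for m in magnitudes]


def fuse_importance(tokens, attention, lime, shap_values):
    """Average the normalised scores per token and rank tokens, highest first."""
    methods = [normalise(attention), normalise(lime), normalise(shap_values)]
    fused = [sum(column) / len(methods) for column in zip(*methods)]
    return sorted(zip(tokens, fused), key=lambda pair: pair[1], reverse=True)


# Hypothetical per-token scores from the three explainers.
tokens = ["win", "free", "prize", "now"]
ranking = fuse_importance(
    tokens,
    attention=[0.1, 0.5, 0.3, 0.1],
    lime=[0.05, 0.40, 0.30, -0.02],
    shap_values=[0.2, 0.9, 0.6, 0.1],
)
```

Normalising each method separately keeps one explainer's larger raw scale from dominating the combined ranking.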
Server-authoritative D&D 5E engine powered by NVIDIA NIM, with an LLM locked to 10 registered server-side tools, so inventory, HP, and room contents are verified before narration. Structured output is enforced via a game_response() schema covering narrative, HP delta, and action list. Procedurally generates a 4-floor, 20-room dungeon with 7 room types and 4 unique floor bosses. Multiplayer for up to 4 players via Socket.IO with shared AI history per room and full conversation history persisted to disk across restarts. Deployed to Hugging Face Spaces via Docker with Git LFS for 100 MB+ assets.
MSc dissertation on fake news detection using large language models and explainable AI. Fine-tuned BERT and RoBERTa, achieving 92%+ accuracy on the LIAR dataset. Applied LIME and SHAP for token-level prediction transparency and interpretability.
Engineered and analysed 280k+ crime records across geospatial, temporal, and behavioural dimensions. Built TF-IDF classification, crime severity prediction, KMeans clustering, and 74-month SARIMA forecasting pipelines, then surfaced insights through an interactive Plotly Dash dashboard.
Land Registry assessment on geospatial data handling and boundary extraction. Processed HM Land Registry boundary datasets, converting spatial data into structured GeoPackage format with automated cleaning, projection, and export workflows for GIS compatibility.
A recruiter-focused technical stack spanning machine learning, NLP, explainable AI, analytics, visualisation, and geospatial data processing.
Uvicorn
Pydantic
Over 6 years of professional experience across analytics, operations, and business development.
A balanced mix of analytical thinking, operational excellence, and stakeholder communication developed through data science projects and commercial leadership roles.
Dissertation: Enhancing Fake News Detection with Explainable AI
Project: New Start-up Proposal
Dissertation: Highlighting Important Data in Visual Analytics
Minor in Management. Certificates of Excellence for Academic Achievement.
Certificate No: TC-032026-2PN03Y7-23388
Certificate No: TC-042026-07HV84M-25960
Certificate No: TC-042026-PI5117K-26914
Freelance
Alongside data science, I build production websites for NGOs, businesses, and community organisations — handling design, development, and deployment end to end.
MSc Data Science graduate with 6+ years of professional experience delivering measurable business outcomes. Seeking junior data scientist, analytics engineer, or graduate scheme roles where I can apply transformer models, explainable AI, and end-to-end ML project experience across fintech, healthcare, legal, HR tech, or any data-driven organisation.