Open to opportunities

Sheikh Khairul Momin Mohammad Tahmid

MSc Data Science (Kingston University) · 92%+ accuracy on transformer NLP models · SHAP & LIME explainability · 6+ years of data-informed professional experience across analytics, operations, and business development.

LinkedIn GitHub

📍 London, UK

✉ sheikh.k.m.m.tahmid@gmail.com

📞 +44 7944 496177

Over 6 years of professional experience delivering measurable outcomes - 35% revenue growth, 40% repeat business uplift, and 12% cost reduction. MSc in Data Science from Kingston University, with independent portfolio projects spanning fintech, healthcare, legal NLP, HR tech, and crime analytics.

Python · SQL · MySQL · PostgreSQL · Machine Learning · NLP · BERT · RoBERTa · XGBoost · SHAP · LIME · FastAPI · Docker · AWS

Highlights

Key Achievements

Quantified outcomes across machine learning, explainable AI, and commercial performance — combining production-grade technical delivery with measurable business impact.

92%+

Transformer Accuracy

BERT and RoBERTa fine-tuned on real-world NLP classification tasks

99.8%

Classifier Accuracy

TF-IDF model across 5 crime categories on 280k+ records

R²=0.9996

Regression Performance

XGBoost trained on 4,137 features for lead quality scoring

35%

Revenue Growth

Delivered through data-informed partnership and business strategy

40%

Repeat Business Uplift

Driven by 50+ analytical client presentations and insight delivery

12%

Cost Reduction

Achieved by leading offshore teams with data-driven performance metrics

Portfolio

Featured Projects

Real-world applications of data science, machine learning, and explainable AI.

NLP / Sentiment Analysis · Full-Stack Intelligence Platform for Film Sentiment Divergence

CriticLens: Critic vs Audience Sentiment Divergence Engine

End-to-end NLP platform quantifying critic-vs-audience sentiment divergence across 19,836 films and 1M+ reviews via a custom divergence scoring algorithm built on Rotten Tomatoes tomatometer and audience score deltas. Fine-tuned DistilBERT on 50K IMDB reviews for binary sentiment classification, with a TF-IDF + SGD fallback for low-latency inference — both models serving live predictions cached in MySQL. Automated multi-source ingestion pipelines pull from TMDB, OMDb, The Guardian, and NYT Article Search APIs via APScheduler, with a 69-month historical backfill yielding 2,600+ critic reviews. Schema migrated from MySQL to TiDB Cloud Serverless with SSL-secured connections and smart startup sequencing. Fully containerised with Docker; FastAPI backend serves a vanilla JS frontend with film search, divergence leaderboard, sentiment timeline, and aspect-level critic breakdown.
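
A minimal sketch of the divergence idea. The description above says only that the score is built on tomatometer and audience score deltas, so the signed difference, the function names, and the sample films below are assumptions for illustration, not the project's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilmScores:
    title: str
    tomatometer: float     # critic score, 0-100
    audience_score: float  # audience score, 0-100


def divergence(film: FilmScores) -> float:
    """Signed delta: positive when critics rate the film higher than audiences."""
    return film.tomatometer - film.audience_score


def leaderboard(films: list[FilmScores], top_n: int = 10) -> list[tuple[str, float]]:
    """Rank films by absolute divergence, largest gap first."""
    ranked = sorted(films, key=lambda f: abs(divergence(f)), reverse=True)
    return [(f.title, divergence(f)) for f in ranked[:top_n]]


# Hypothetical sample data, not taken from the project.
films = [
    FilmScores("Film A", 92.0, 48.0),
    FilmScores("Film B", 35.0, 81.0),
    FilmScores("Film C", 70.0, 72.0),
]
board = leaderboard(films, top_n=2)
```

Ranking on the absolute value keeps both directions of disagreement (critics higher, audiences higher) on the same leaderboard while the sign still shows which side liked the film more.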

Python · FastAPI · DistilBERT · TiDB Cloud · Docker · MySQL · APScheduler · TF-IDF · REST APIs

19.8K

Films Analysed

1M+

Reviews Indexed

69

Month Backfill

Live Data Sources

View on GitHub View Demo

HR Tech / NLP · Explainable AI Recruitment Intelligence System

Explainable AI Hiring Intelligence Platform

Modular 12-app Django REST API with JWT auth and role-based permissions, generating 384-dim SBERT embeddings to cosine-rank 3,000+ candidates per job posting. SHAP and LIME explanations are fused 60/40 into a unified explainability report per match, with disparate impact ratio and 4/5 rule computed across gender, ethnicity, and age groups. An async Celery pipeline orchestrates CV parsing, embedding generation, and batch matching, while a trainable GradientBoosting scorer ships with zero-downtime fallback to fixed weights.
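
The 60/40 fusion and the fairness check can be sketched in a few lines. This is an illustration under assumptions: the function names, the per-feature dictionaries, and the group counts are hypothetical, and the platform's real report format is not shown in the description.

```python
def fuse_attributions(shap_vals: dict[str, float], lime_vals: dict[str, float],
                      shap_weight: float = 0.6) -> dict[str, float]:
    """Weighted blend of two per-feature attribution maps (60/40 by default)."""
    lime_weight = 1.0 - shap_weight
    features = set(shap_vals) | set(lime_vals)
    return {
        f: shap_weight * shap_vals.get(f, 0.0) + lime_weight * lime_vals.get(f, 0.0)
        for f in features
    }


def disparate_impact_ratio(selected: dict[str, int], total: dict[str, int]) -> float:
    """Lowest group selection rate divided by the highest group selection rate."""
    rates = [selected[g] / total[g] for g in total if total[g] > 0]
    return min(rates) / max(rates)


def passes_four_fifths(ratio: float) -> bool:
    """The 4/5 rule flags possible adverse impact when the ratio is below 0.8."""
    return ratio >= 0.8


# Hypothetical numbers for illustration only.
fused = fuse_attributions({"skills": 0.5, "experience": 0.2},
                          {"skills": 0.3, "experience": 0.4})
ratio = disparate_impact_ratio({"group_a": 30, "group_b": 18},
                               {"group_a": 100, "group_b": 100})
```

With these sample counts the selection rates are 0.30 and 0.18, so the ratio is about 0.6 and the 4/5 check fails.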

Python · Django · DRF · SBERT · spaCy · SHAP · LIME · Celery · Redis · PostgreSQL · React · Docker

384

Embedding Dims

3K+

Candidates Ranked

60/40

SHAP/LIME Fusion

12

Django Apps

View on GitHub View Demo

Healthcare AI / Survival Analysis · Clinical SaaS for Patient Retention Intelligence

TrialGuard: Clinical Trial Patient Dropout Prediction Platform

Django SaaS platform predicting clinical trial dropout up to 60 days in advance. XGBoost classifier with per-visit SHAP waterfall plots identifies at-risk patients at the individual level. Cox Proportional Hazards model delivers hazard ratios and 30/60/90-day retention probabilities, with stratified Kaplan-Meier curves segmented by risk tier (Low to Critical) via Lifelines. Cohort forecasts with 95% confidence intervals served through a DRF REST API; branded ReportLab PDFs generated per patient including SHAP feature drivers, survival curves, and clinical action logs.
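
For readers unfamiliar with Kaplan-Meier curves, here is a hand-rolled sketch of the product-limit estimator and a 30/60/90-day retention lookup. The platform itself uses Lifelines; this standard-library version, its function names, and the eight sample patients are assumptions made purely for illustration.

```python
def kaplan_meier(durations: list[float], observed: list[bool]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one dropout.

    durations: days until dropout or censoring for each patient.
    observed:  True if the patient dropped out, False if censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def retention_at(curve: list[tuple[float, float]], horizon: float) -> float:
    """Step-function lookup: retention probability at `horizon` days."""
    prob = 1.0
    for t, s in curve:
        if t > horizon:
            break
        prob = s
    return prob


# Hypothetical cohort of eight patients.
durations = [10, 20, 20, 35, 50, 70, 90, 90]
observed = [True, True, False, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
horizons = {h: retention_at(curve, h) for h in (30, 60, 90)}
```

Censored patients still count in the at-risk denominator until their last visit, which is what separates this estimate from a plain dropout percentage.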

Python · Django · DRF · XGBoost · SHAP · Lifelines · Cox PH · ReportLab · MySQL

60

Days Ahead

Risk Tiers

95%

CI Forecasts

3

Time Horizons

View on GitHub View Demo

Legal AI / NLP · Multi-Model Legal Intelligence System

Contract Intelligence & Power Imbalance Platform

Fine-tuned LegalBERT and Legal-RoBERTa-large ensemble classifying every clause across a 100-type unified taxonomy (CUAD, LEDGAR, MAUD - 132k clauses, macro F1 0.606). Anomaly detection fuses Isolation Forest with a 1024-dim autoencoder; bilateral power imbalance is scored −100 to +100 via sentiment, modal verb, and obligation NLP. Every prediction is explained at token level with SHAP, served through a FastAPI REST API and dark-theme dashboard with downloadable PDF reports.
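
Macro F1 averages the per-class F1 scores without weighting by class size, which is why it is a demanding metric on a 100-type taxonomy with rare clause types. A small sketch of the calculation follows; the clause labels and predictions are made up for illustration and are not drawn from CUAD, LEDGAR, or MAUD.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical clause labels.
y_true = ["indemnity", "indemnity", "termination",
          "termination", "governing_law", "governing_law"]
y_pred = ["indemnity", "termination", "termination",
          "termination", "governing_law", "indemnity"]
score = macro_f1(y_true, y_pred)
```

Here the per-class scores are 0.5, 0.8, and roughly 0.667, so a single weak class pulls the average down even though most predictions are correct.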

Python · PyTorch · LegalBERT · Legal-RoBERTa · SHAP · FastAPI · UMAP · SQLite

0.606

Macro F1

100

Clause Types

132K

Training Clauses

±100

Imbalance Scale

View on GitHub View Demo

FINTECH / AI · Production AI Risk Platform

AI Revenue Leakage Detection Platform

Production contract intelligence platform with XGBoost/Logistic Regression ensemble (70/30) scoring invoice leakage probability, Isolation Forest anomaly detection per record, and Prophet forecasting with 90% CI bands over a 24-month revenue horizon. SHAP attribution pipeline exposes top risk drivers per invoice; a 7-rule leakage engine generates 13k+ alerts covering missing payments, underbilling, and duplicates. Predictions served via DRF API with a Chart.js dashboard and per-invoice drill-down modals.
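
A sketch of two pieces described above: the 70/30 probability blend and one rule of the kind a leakage engine might contain. The field names (`id`, `customer`, `amount`, `date`), the duplicate-matching key, and the sample invoices are assumptions; the platform's seven actual rules are not listed in the description.

```python
def blended_leakage_probability(p_xgb: float, p_logreg: float,
                                xgb_weight: float = 0.7) -> float:
    """Weighted average of the two model probabilities (70/30 by default)."""
    return xgb_weight * p_xgb + (1.0 - xgb_weight) * p_logreg


def find_duplicate_invoices(invoices: list[dict]) -> list[tuple[str, str]]:
    """Hypothetical rule: same customer, amount, and date means a duplicate.

    Returns (first_invoice_id, duplicate_invoice_id) pairs.
    """
    seen: dict[tuple, str] = {}
    duplicates = []
    for inv in invoices:
        key = (inv["customer"], inv["amount"], inv["date"])
        if key in seen:
            duplicates.append((seen[key], inv["id"]))
        else:
            seen[key] = inv["id"]
    return duplicates


# Hypothetical invoices.
invoices = [
    {"id": "INV-1", "customer": "acme", "amount": 1200.0, "date": "2024-03-01"},
    {"id": "INV-2", "customer": "acme", "amount": 1200.0, "date": "2024-03-01"},
    {"id": "INV-3", "customer": "beta", "amount": 450.0, "date": "2024-03-02"},
]
alerts = find_duplicate_invoices(invoices)
risk = blended_leakage_probability(0.9, 0.5)
```

Deterministic rules like this complement the probabilistic score: the rule gives an auditable reason for an alert, while the blended probability ranks invoices that break no explicit rule.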

Python · Django · XGBoost · Prophet · SHAP · Isolation Forest · MySQL · Chart.js

70/30

Ensemble Split

13k+

Leakage Alerts

24mo

Forecast Horizon

7

Leakage Rules

View on GitHub View Demo

CLOUD / AI · Production AWS Intelligence Platform

CostRadar: AWS Spend Anomaly Detection and Rightsizing Platform

Production AWS cost intelligence platform ingesting 90-day per-service spend via Cost Explorer API into DynamoDB. Dual anomaly detection using Isolation Forest (contamination=0.05) and Z-score with consensus scoring and plain-English explanations. Prophet forecasting with 30-day horizon, 95% CI, MAPE validation, and budget breach day prediction. EC2 rightsizing via 14-day CloudWatch CPU/memory analysis. EventBridge-scheduled daily Lambda pipeline with SNS alerts, containerised via Docker and deployed to Hugging Face Spaces.
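
The Z-score half of the dual detector, the consensus step, and MAPE validation can be sketched with the standard library. The 2.5 threshold, the "both detectors must agree" consensus rule, and the sample spend series are assumptions; the Isolation Forest flags are passed in as plain booleans here rather than computed.

```python
from statistics import mean, pstdev


def zscore_flags(daily_spend: list[float], threshold: float = 2.5) -> list[bool]:
    """Flag days whose spend is more than `threshold` std devs from the mean."""
    mu = mean(daily_spend)
    sigma = pstdev(daily_spend)
    if sigma == 0:
        return [False] * len(daily_spend)
    return [abs(x - mu) / sigma > threshold for x in daily_spend]


def consensus(flags_a: list[bool], flags_b: list[bool]) -> list[bool]:
    """Assumed consensus rule: a day is anomalous only if both detectors agree."""
    return [a and b for a, b in zip(flags_a, flags_b)]


def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * mean(errors)


# Hypothetical ten days of spend with one spike on the last day.
spend = [100.0] * 9 + [400.0]
z_flags = zscore_flags(spend)
iso_flags = [False] * 8 + [True, True]   # stand-in for Isolation Forest output
anomalies = consensus(z_flags, iso_flags)
error_pct = mape([100.0, 200.0], [110.0, 180.0])
```

Requiring agreement trades recall for precision: day nine is flagged by only one detector and is dropped, while the spike on day ten survives.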

Python · FastAPI · Prophet · scikit-learn · boto3 · DynamoDB · Lambda · SNS · React · Recharts · Docker

90d

Cost History

Dual

Anomaly Models

30d

Forecast Horizon

Live

AWS Data

View on GitHub View Demo

FINTECH / AI · Production AI Risk Platform

AI Portfolio Stress Testing Platform

Production risk platform with GMM regime classification, Ledoit-Wolf shrinkage, and XGBoost/ElasticNet trained on 50+ macro features, delivering Sharpe 4.42, VaR -12.07%, and CVaR -17.48%. Stress-tested 20+ historical crises with SHAP attribution and plain-English narratives, plus reverse stress-testing for threshold-based shock back-solving. FastAPI dashboard refreshes 20 live instruments hourly.
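
VaR and CVaR can be read directly off a sorted return series. The sketch below uses the simple historical method with a 90% confidence level on made-up returns; the platform's actual estimation method, confidence level, and data are not stated in the description, so treat every number here as hypothetical.

```python
def historical_var_cvar(returns: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Historical VaR and CVaR, both expressed as returns (negative = loss).

    VaR is the best return inside the worst (1 - confidence) tail;
    CVaR is the mean return across that tail.
    """
    ordered = sorted(returns)
    tail_count = max(1, round(len(ordered) * (1.0 - confidence)))
    tail = ordered[:tail_count]
    var = tail[-1]
    cvar = sum(tail) / len(tail)
    return var, cvar


# Hypothetical period returns (20 observations).
returns = [-0.12, -0.08, -0.05, -0.03, -0.01, 0.00, 0.01, 0.02, 0.02, 0.03,
           0.03, 0.04, 0.04, 0.05, 0.05, 0.06, 0.06, 0.07, 0.08, 0.09]
var_90, cvar_90 = historical_var_cvar(returns, confidence=0.90)
```

CVaR is always at least as severe as VaR because it averages everything beyond the VaR cutoff, which matches the ordering of the two figures quoted above.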

Python · FastAPI · XGBoost · ElasticNet · SHAP · GMM · SciPy · yfinance · Chart.js

4.42

Sharpe Ratio

-12.07%

VaR

20+

Crises Stress-Tested

50+

Macro Features

View on GitHub View Demo

AI / LLMOps · Declarative LLM Agent Action Firewall

PolicyGate: AI Agent Action Firewall

Deployed a framework-agnostic LLM agent firewall with a YAML policy DSL (6 operators, hot-reload on file mtime), dual-layer safety enforcement via system prompt and mechanical gate, and per-response confidence scoring (High/Medium/Low) via constrained secondary LLM call. Supports LangChain, CrewAI, and AutoGen via POST /gate. All gate decisions logged to SQLite with WAL mode and threading.Lock() for concurrent write safety.
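
The logging pattern mentioned above (SQLite in WAL mode with a `threading.Lock()` around writes) looks roughly like this. The `GateLog` class, the table layout, and the temporary database path are assumptions for illustration; only the WAL pragma and the lock come from the description.

```python
import os
import sqlite3
import tempfile
import threading


class GateLog:
    """Decision log: one shared SQLite connection, writes serialised by a lock."""

    def __init__(self, path: str) -> None:
        # check_same_thread=False lets several threads share this connection;
        # the lock below is what makes that sharing safe.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS decisions (action TEXT, verdict TEXT)"
        )
        self._conn.commit()
        self._lock = threading.Lock()

    def record(self, action: str, verdict: str) -> None:
        with self._lock:
            self._conn.execute(
                "INSERT INTO decisions (action, verdict) VALUES (?, ?)",
                (action, verdict),
            )
            self._conn.commit()

    def count(self) -> int:
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM decisions").fetchone()[0]


# Eight threads log one hypothetical decision each.
db_path = os.path.join(tempfile.mkdtemp(), "gate.db")
log = GateLog(db_path)
threads = [
    threading.Thread(target=log.record, args=(f"tool_{i}", "allow"))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

WAL mode lets readers proceed while a write is in progress; the lock covers the separate problem of several Python threads interleaving statements on one shared connection.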

Python · FastAPI · Pydantic · OpenAI SDK · OpenRouter · PyYAML · SQLite · Chart.js · Docker · HuggingFace Spaces

3

Frameworks Supported

6

Policy Operators

Pytest Tests

Live

HF Spaces Demo

View on GitHub View Demo

AI / ML · NLP-Powered Lead Scoring & Spam Classification

AI Lead Quality Scoring & Spam Detection

Fine-tuned BERT (99.12%) and RoBERTa (99.41%) for spam detection, paired with XGBoost scoring at R²=0.9996 across 4,137 engineered features from UK Companies House, firmographics, and DNS/SMTP signals. Deployed via FastAPI with live SHAP explanations, a Tailwind SPA, and an automated nightly retraining pipeline.
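
For reference, R² compares the model's squared error with that of always predicting the mean. A short sketch of the formula with made-up lead scores (the values below are hypothetical and unrelated to the project's data):

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical lead scores and predictions.
r2 = r_squared([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

Here the residual sum of squares is 18 against a total of 500, giving an R² of 0.964.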

Done in collaboration with Prince Kumar Nath.

Python · PyTorch · Hugging Face · BERT · RoBERTa · XGBoost · FastAPI · SHAP · SQLite

99.41%

Spam Detection

0.9996

Lead Score R²

30K

Leads Engineered

4,137

Features Built

View on GitHub

NLP / XAI · NLP-Powered Spam Detection & XAI

Spam Lens: Explainable Spam Classifier with Multi-Method XAI

Dual-stream BiLSTM with a custom Attention layer blends token embeddings with 5 linguistic signals (URL, phone, currency, urgency ratio, length norm); Attention, LIME, and SHAP are fused via per-method normalisation into a single word-importance ranking, plain-English verdict, and live per-token colour heatmap. Deployed on FastAPI with concurrent XAI thread pools, SQLite history, and a live analytics dashboard — retraining triggers automatically on error rate, staleness, or volume thresholds, with accuracy-gated rollback.
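
One plausible reading of "fused via per-method normalisation" is sketched below: each method's scores are scaled by their own largest magnitude, then averaged per word. The scaling choice, the equal weighting, and the sample scores are assumptions; the project's exact fusion formula is not given in the description.

```python
def normalise(scores: dict[str, float]) -> dict[str, float]:
    """Scale absolute scores into [0, 1] by dividing by the largest magnitude."""
    peak = max((abs(v) for v in scores.values()), default=0.0)
    if peak == 0.0:
        return {w: 0.0 for w in scores}
    return {w: abs(v) / peak for w, v in scores.items()}


def fuse(*methods: dict[str, float]) -> list[tuple[str, float]]:
    """Average the normalised scores per word and rank from most to least important."""
    normalised = [normalise(m) for m in methods]
    words = set().union(*normalised)
    fused = {
        w: sum(n.get(w, 0.0) for n in normalised) / len(normalised)
        for w in words
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-word scores from the three explainers.
attention = {"free": 0.6, "prize": 0.3, "hello": 0.1}
lime = {"free": 0.4, "prize": 0.5, "hello": -0.05}
shap = {"free": 2.0, "prize": 1.0, "hello": 0.2}
ranking = fuse(attention, lime, shap)
```

Normalising per method first matters because raw SHAP values, LIME weights, and attention weights live on different scales; without it the method with the largest numbers would dominate the ranking.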

Python · TensorFlow · Keras · BiLSTM · LIME · SHAP · FastAPI · SQLAlchemy · Vanilla JS · Docker

BiLSTM

Dual-Stream Model

3-Way

XAI Fusion

5

Linguistic Features

Live

Attention Heatmap

View on GitHub View Demo

Game AI / Backend · Grounded AI Game Master Engine

Ironwood Dungeon: AI Game Master

Server-authoritative D&D 5E engine powered by NVIDIA NIM, with an LLM locked to 10 registered server-side tools — inventory, HP, and room contents verified before narration. Structured output enforced via game_response() schema across narrative, HP delta, and action list. Procedurally generates a 4-floor, 20-room dungeon with 7 room types and 4 unique floor bosses. Multiplayer for up to 4 players via Socket.IO with shared AI history per room, full conversation history persisted to disk across restarts, and deployed to HuggingFace Spaces via Docker with Git LFS for 100 MB+ assets.
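
The "LLM locked to registered server-side tools" pattern can be sketched as a registry plus a dispatcher that refuses anything not registered. The tool names, the `STATE` dictionary, and the return shapes below are hypothetical; the engine's ten real tools and its `game_response()` schema are not reproduced here.

```python
from typing import Callable

TOOLS: dict[str, Callable[..., dict]] = {}


def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register a server-side tool the model is allowed to call."""
    TOOLS[fn.__name__] = fn
    return fn


# Hypothetical authoritative state, held by the server and never by the LLM.
STATE = {"hp": 12, "inventory": ["torch", "rope"]}


@tool
def check_inventory(item: str) -> dict:
    return {"item": item, "owned": item in STATE["inventory"]}


@tool
def apply_damage(amount: int) -> dict:
    STATE["hp"] = max(0, STATE["hp"] - amount)
    return {"hp": STATE["hp"]}


def dispatch(name: str, **kwargs) -> dict:
    """Run a tool call requested by the model; refuse anything unregistered."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**kwargs)
```

Because narration is generated only after these calls return, the model cannot claim the player owns a sword or has full HP unless the server state says so.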

Python · Flask · Flask-SocketIO · NVIDIA NIM · OpenAI SDK · JavaScript · Docker · HuggingFace Spaces

10

Server Tools

20

Rooms / Run

4

Floor Bosses

Multiplayer

View on GitHub View Demo

FEATURED · MSc Dissertation

Fake News Detection

MSc dissertation on fake news detection using large language models and explainable AI. Fine-tuned BERT and RoBERTa, achieving 92%+ accuracy on the LIAR dataset. Applied LIME and SHAP for token-level prediction transparency and interpretability.

Python · PyTorch · BERT · RoBERTa · LIME · SHAP · NLP

92%

Model Accuracy

Datasets Integrated

2

XAI Techniques

View on GitHub Read Thesis Paper

ANALYTICS · End-to-End Analytics Project

Montgomery County Crime Analytics

Engineered and analysed 280k+ crime records across geospatial, temporal, and behavioural dimensions. Built TF-IDF classification, crime severity prediction, KMeans clustering, and 74-month SARIMA forecasting pipelines, then surfaced insights through an interactive Plotly Dash dashboard.
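
TF-IDF weights a term by how often it appears in a record and how rare it is across all records. The sketch below uses the textbook formula with whitespace tokenisation; the project used scikit-learn, whose vectoriser applies smoothing and normalisation by default, so the numbers would differ. The three sample descriptions are invented.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Textbook TF-IDF: (term count / doc length) * ln(N / document frequency)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical crime descriptions.
docs = ["theft from vehicle", "assault in vehicle", "theft from building"]
vectors = tfidf(docs)
```

In the second description "assault" appears in only one record and so outweighs "vehicle", which appears in two; that is the signal a downstream classifier picks up.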

Python · Pandas · Scikit-learn · Plotly Dash · NLP · SARIMA

280k+

Records Cleaned

99.8%

Classifier Accuracy

0.998

Prediction Accuracy

Clusters Identified

View on GitHub

ASSESSMENT · Geospatial Data Project

HMLR Boundary Extraction

Land Registry assessment on geospatial data handling and boundary extraction. Processed HM Land Registry boundary datasets, converting spatial data into structured GeoPackage format with automated cleaning, projection, and export workflows for GIS compatibility.

Python · GeoPandas · Shapely · Fiona · GIS · GeoPackage

View on GitHub

Technical Expertise

Skills & Technologies

A recruiter-focused technical stack spanning machine learning, NLP, explainable AI, analytics, visualisation, and geospatial data processing.

⌨️Programming Languages

Python · SQL · MATLAB · Java · C++ · HTML · R Programming · JavaScript · React · Tailwind CSS

🤖Data Science & AI

Machine Learning · Deep Learning · NLP · BERT · LegalBERT · RoBERTa · Legal-RoBERTa · SHAP · LIME · Time Series · SARIMA · TF-IDF · KMeans · Pandas · NumPy · scikit-learn · PyTorch · XGBoost · Hugging Face Transformers · Matplotlib · Seaborn · SciPy · SBERT · Isolation Forest · Prophet · Gradient Boosting · Survival Analysis · Lifelines · spaCy

📊Data Analytics & Visualisation

Tableau · Excel · Jupyter Notebook · Plotly Dash · Dashboarding · Quantitative Research

🗺️Geospatial Analysis

GeoPandas · Shapely · Fiona · GDAL/OGR · GIS · GeoPackage · Mapbox

🔧Tools & Cloud

Git · GitHub · AWS · FastAPI · SQLite · Uvicorn · Pydantic · joblib · Chart.js · Jinja · Django · DRF · PostgreSQL · MySQL · Redis · Celery · Docker · JWT · Vite · ReportLab

Career

Professional Experience

Over 6 years of professional experience across analytics, operations, and business development.

Business Development Manager

Ideal PCO Licence, London, UK

Mar 2023 – Jan 2025

◆Defined and executed business goals using data-driven strategies to meet key performance targets.
◆Analysed industry patterns with Python, SQL, and Excel, supporting stronger forecasting and commercial decision-making and contributing to 15% revenue growth.
◆Built and secured high-value partnerships through data-informed strategy, contributing to 35% annual revenue growth.
◆Delivered 50+ client presentations supported by analytical insights, driving 40% more repeat business.

Senior Associate

Quantanite (formerly Taskeater), Dhaka, Bangladesh

Jul 2021 – Aug 2022

◆Tracked team tasks with performance data, achieving 99.8% on-time delivery and 98%+ quality.
◆Analysed team KPIs to identify process inefficiencies and implement data-informed improvements.
◆Used performance metrics to design and deliver training, reducing new hire ramp-up time by 20%.
◆Managed daily, weekly, and monthly client reporting cycles using performance data, achieving 100% on-time delivery and 98%+ quality.
◆Evaluated outputs to identify quality gaps, reducing error rates by 15% through data-based performance improvements.
◆Recognised as Best Employee of the Year for data-driven leadership and consistent operational excellence.

Associate

Quantanite (formerly Taskeater), Dhaka, Bangladesh

Jun 2017 – Jul 2021

◆Processed and validated 5K+ weekly fashion industry datasets with high accuracy and timeliness.
◆Collaborated with lead generation teams, analysing data trends to boost qualified leads by 15%.
◆Monitored data categorisation and team efficiency, partnering with QA to keep error rates under 4% through data quality checks and compliance tracking.
◆Applied data analysis skills to identify trends, improve reporting accuracy, and strengthen client satisfaction.

Analyst

Quantanite (formerly Taskeater), Dhaka, Bangladesh

Dec 2016 – Jun 2017

◆Career progression built on strong data accuracy, quality control discipline, and dependable delivery in fast-paced operations.
◆Recognised internally for consistent quality standards, operational reliability, and readiness for promotion into broader analytical responsibilities.
◆Established the performance foundation that later supported KPI reporting, training design, and cross-functional process improvement.

Professional Strengths

Core Capabilities

A balanced mix of analytical thinking, operational excellence, and stakeholder communication developed through data science projects and commercial leadership roles.

🎯Analytical & Strategic

◆Data-Driven Decision Making

◆Analytical Thinking

◆Critical Thinking

◆Portfolio Optimisation

◆Continuous Improvement Mindset

◆Change Management

⚙️Operational & Performance

◆Performance Optimisation

◆Performance Management

◆Process Improvement

◆Operational Efficiency

◆KPI Tracking

◆Time Management

🤝Communication & Collaboration

◆Cross-Functional Collaboration

◆Client Relationship Management

◆Mentoring & Training

◆Data Visualisation & Reporting

Academic Background

Education

🎓

MSc Data Science

Kingston University, London

Jan 2025 – Mar 2026

Dissertation: Enhancing Fake News Detection with Explainable AI

🎓

MBA Business Administration

Cardiff Metropolitan University

Jun 2015 – Jul 2016

Project: New Start-up Proposal

🎓

MSc Electronics & Computer Engineering

University of Birmingham

Sep 2013 – Oct 2014

Dissertation: Highlighting Important Data in Visual Analytics

🎓

BSc Computer Engineering

Abu Dhabi University

Jan 2009 – Jun 2013

Minor in Management. Certificates of Excellence for Academic Achievement.

Professional Development

Certifications

🏆

AWS Cloud Bootcamp

ThinkCloudly

Mar 2026 • 5 CPE Hours

Certificate No: TC-032026-2PN03Y7-23388

🏆

Security Operation Center Bootcamp

ThinkCloudly

Apr 2026 • 5 CPE Hours

Certificate No: TC-042026-07HV84M-25960

🏆

IT Auditing & GRC Bootcamp

ThinkCloudly

Apr 2026 • 5 CPE Hours

Certificate No: TC-042026-PI5117K-26914

Freelance

Web Work

Alongside data science, I build production websites for NGOs, businesses, and community organisations — handling design, development, and deployment end to end.

Freelance ● Live NGO Website — Patiya, Chattogram, Bangladesh

Unique Foundation

Full multi-page website with a custom admin dashboard — the client can edit all page content, add & remove gallery images, manage social media links, and publish new pages with no code. Includes transparent logo processing, category-filtered gallery with lightbox, contact form, and Google Maps embed. Fully responsive and live.

HTML / CSS / JS · Admin Dashboard · CMS · Responsive Design · NGO · Freelance

View Live Site

Get In Touch

Let's Connect

MSc Data Science graduate with 6+ years of professional experience delivering measurable business outcomes. Seeking junior data scientist, analytics engineer, or graduate scheme roles where I can apply transformer models, explainable AI, and end-to-end ML project experience across fintech, healthcare, legal, HR tech, or any data-driven organisation.

✉ Get in Touch 📞 Let's Talk

📍 London, UK ✉ sheikh.k.m.m.tahmid@gmail.com 📞 +44 7944 496177

Open to: Junior Data Scientist · Data Analyst · Analytics Engineer · ML Engineer (Graduate) · Data Science Graduate Scheme
