Predictive Modeling at Scale: 500 bps Accuracy Gains and a 20x Speed-Up

2022–2024 · ML Engineering

Legacy ML pipelines accumulate debt quietly. Models that were state-of-the-art two years ago become bottlenecks — slow to retrain, hard to audit, resistant to change. This work was about diagnosing what was broken, fixing it systematically, and leaving something the next team could maintain.

Several client pipelines were running GLM-era models that had grown brittle — slow to retrain, poorly documented, and underperforming against more recent benchmarks. The business impact was real: suboptimal predictions in credit risk scoring and customer segmentation were costing margin.

I led a structured revamp across multiple client engagements, moving from legacy GLMs to gradient-boosted models (XGBoost and LightGBM) while rebuilding the underlying pipeline infrastructure in parallel. The model redesign reworked feature engineering and model selection, improving accuracy by more than 500 basis points on held-out test sets. Refactoring the data processing and inference layers made the pipeline more than 20x faster end-to-end. New NLP components extracted signal from unstructured data that the prior models couldn't consume. Throughout, I directed offshore and cross-functional teams across analytics and engineering.
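For readers less familiar with the units: one basis point is 0.01 percentage points, so a 500 bps gain is 5 percentage points on whatever metric is being tracked. The sketch below shows that arithmetic and the speed-up ratio. The function names, scores, and timings are hypothetical and for illustration only; they are not figures from these engagements, and the source does not say which accuracy metric was used.

```python
def lift_in_bps(baseline: float, candidate: float) -> float:
    """Absolute metric lift in basis points (1 bp = 0.0001 on a 0-1 scale)."""
    return (candidate - baseline) * 10_000


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the old one."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return old_seconds / new_seconds


# Hypothetical held-out scores and run times, for illustration only.
baseline_score = 0.71   # e.g. a legacy GLM
candidate_score = 0.77  # e.g. a gradient-boosted replacement
print(f"lift: {lift_in_bps(baseline_score, candidate_score):.0f} bps")
print(f"speed-up: {speedup(3600.0, 150.0):.1f}x")
```

Reporting the lift in absolute basis points, rather than as a relative percentage, keeps the figure comparable across models with different baselines.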

The constraint was continuity — clients couldn't afford a big-bang migration. Changes had to be incremental, each step demonstrably better than the last, with rollback options intact. That meant a lot of parallel running, careful A/B validation, and stakeholder communication to keep confidence high while the transition was ongoing.
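One way to picture the parallel-running step is as a promotion gate: the incumbent ("champion") and the candidate ("challenger") are scored on the same data over several windows, and the candidate is promoted only if it wins every window by an agreed margin. This is a minimal sketch under those assumptions, not the process actually used; `ParallelRunResult`, `should_promote`, and the threshold are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class ParallelRunResult:
    champion_score: float    # metric for the model currently in production
    challenger_score: float  # metric for the candidate on the same data


def should_promote(runs: list[ParallelRunResult], min_lift_bps: float = 0.0) -> bool:
    """Promote only if the challenger beats the champion by the margin in every window."""
    if not runs:
        return False  # no evidence yet: keep the champion in place
    return all(
        (r.challenger_score - r.champion_score) * 10_000 >= min_lift_bps
        for r in runs
    )


# Hypothetical parallel-run windows, for illustration only.
windows = [ParallelRunResult(0.70, 0.76), ParallelRunResult(0.71, 0.78)]
print(should_promote(windows, min_lift_bps=500))
```

Because the champion keeps serving until the gate passes, rollback amounts to simply not promoting; nothing has to be undone.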

The outcome was an accuracy improvement of more than 500 bps and a pipeline speed-up of more than 20x, both validated on production data. More importantly, the pipelines were left in a state where client teams could own and extend them independently.

Skills: XGBoost · LightGBM · GLM · NLP · Python · Pipeline Engineering · Offshore Team Leadership