ML-Powered Customer Churn Prediction System

Built a churn prediction system that improved AUC from 0.71 to 0.84 and let the retention team target high-risk users instead of running blanket campaigns.

Type: Classification · ML System
Stack: Python · XGBoost · SMOTE · Looker Studio
Impact: AUC 0.84 · Top decile captures 22% of churners (2.2× lift)

Product Context

The company was experiencing rising customer churn with no system to identify at-risk users before they left. Retention campaigns were applied broadly — wasting budget on low-risk users while missing the highest-risk segments entirely.

The business needed a predictive system that could identify churn-prone users early and enable targeted, cost-effective interventions rather than blanket campaigns.

Data & Pipeline Design

Built an end-to-end pipeline from raw transactional and behavioral data sources to a prediction layer that feeds directly into the retention workflow.

CRM Data (user profiles) → Transaction Logs (behavioral signals) → Feature Engineering (Python / pandas) → XGBoost Model (prediction layer) → Risk Scores (per-user output) → Looker Studio (dashboard)
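As an illustration of the feature engineering stage, the sketch below derives recency, frequency, and monetary (RFM) values from a transaction log. It is a minimal standard-library version written for readability; the project itself used pandas, and the function name `rfm_features`, the tuple layout, and the sample rows are all assumptions made for this example.

```python
from datetime import date


def rfm_features(transactions, as_of):
    """Compute recency, frequency, and monetary value per user.

    transactions: iterable of (user_id, purchase_date, amount) tuples
                  (a hypothetical layout, not the project's real schema).
    as_of: the date the features are computed for.
    """
    features = {}
    for user_id, purchase_date, amount in transactions:
        if purchase_date > as_of:
            continue  # skip future events so they cannot leak into features
        f = features.setdefault(
            user_id,
            {"last_purchase": purchase_date, "frequency": 0, "monetary": 0.0},
        )
        f["last_purchase"] = max(f["last_purchase"], purchase_date)
        f["frequency"] += 1
        f["monetary"] += amount
    return {
        user_id: {
            "recency_days": (as_of - f["last_purchase"]).days,
            "frequency": f["frequency"],
            "monetary": round(f["monetary"], 2),
        }
        for user_id, f in features.items()
    }


# Hypothetical transaction log rows
transactions = [
    ("u1", date(2024, 1, 5), 40.0),
    ("u1", date(2024, 2, 20), 25.5),
    ("u2", date(2023, 11, 2), 99.0),
]
features = rfm_features(transactions, as_of=date(2024, 3, 1))
print(features)
```

Computing every feature relative to an explicit `as_of` date keeps training and scoring consistent and makes it harder to include information from after the prediction point by accident.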

Modeling Approach

The core challenge was severe class imbalance: churned users made up less than 12% of the dataset. A naive model could reach over 88% accuracy simply by predicting "no churn" for everyone, while catching no churners at all.
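A small sketch makes the point concrete. The sample below is hypothetical (100 users, 12 of them churners, roughly the upper bound of the churn share described above), and the helper names are chosen for this example only.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of true churners (label 1) that the model flagged."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)


# Hypothetical sample: 12 churners (1) among 100 users
y_true = [1] * 12 + [0] * 88
always_no_churn = [0] * 100  # the naive "nobody churns" model

baseline_accuracy = accuracy(y_true, always_no_churn)
baseline_recall = recall(y_true, always_no_churn)
print(baseline_accuracy, baseline_recall)
```

The naive model looks strong on accuracy and useless on recall, which is why accuracy alone is a poor yardstick for this problem.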

Algorithm: XGBoost classifier, with hyperparameters tuned by cross-validated grid search
Imbalance Fix: SMOTE (Synthetic Minority Over-sampling Technique), applied during training to balance class representation
Features: 30+ engineered behavioral features, including recency, frequency, and monetary value (RFM), session patterns, support ticket frequency, and feature adoption velocity
Optimization: Tuned for recall in high-risk segments, on the premise that flagging a false positive costs less than missing a true churner
Evaluation: AUC-ROC, precision-recall curves, decile analysis, and feature importance via SHAP values
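To show what SMOTE does conceptually, here is a simplified standard-library sketch of its core step: each synthetic churner is placed at a random point on the line between a real minority sample and one of its nearest minority neighbors. This is not the implementation the project used (which the write-up does not specify; a library such as imbalanced-learn provides a production version). The function name, the two-feature points, and the parameter values are assumptions for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    sampled point and one of its k nearest minority neighbors."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical training fold: two features per user, churners are the minority
train_majority = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
                  (0.2, 0.4), (0.4, 0.2), (0.1, 0.5)]
train_minority = [(0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]

# Oversample the training fold only; evaluation data keeps its real balance
n_needed = len(train_majority) - len(train_minority)
synthetic = smote_like_oversample(train_minority, n_needed, k=2)
balanced_minority = train_minority + synthetic
print(len(train_majority), len(balanced_minority))
```

Restricting the oversampling to training data matters: synthetic points in a validation or test set would make the evaluation metrics describe a population that does not exist.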

Dashboard & Visualization

The model outputs feed into a Looker Studio dashboard providing the retention team with actionable views: risk distribution heatmaps, segment breakdowns, and per-user churn probability scores with contributing factors.

Total Users Scored: 12,847
High Risk (>0.7): 1,284
Retention Savings: $124K
Churn Risk Distribution by Decile (bar chart, D1 to D10)
Top decile (D10) captures 22% of all churners and is the primary target for retention campaigns.
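A decile analysis of this kind can be sketched as follows. Users are sorted by predicted score, split into ten equal groups, and each group's share of all churners is reported. Capturing 22% of churners in the top 10% of users corresponds to a lift of 2.2 over random targeting (0.22 / 0.10). The function name and the 20-user sample are hypothetical, so the numbers the code prints are not the project's results.

```python
def decile_capture(scores, labels, n_bins=10):
    """Return each score decile's share of all churners.

    Deciles are ordered D1 (lowest scores) to D10 (highest scores).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    total_churners = sum(labels)
    n = len(ranked)
    shares = []
    for i in range(n_bins):
        chunk = ranked[i * n // n_bins:(i + 1) * n // n_bins]
        shares.append(sum(label for _, label in chunk) / total_churners)
    return shares


# Hypothetical scored users: 20 users, 5 of whom churned (label 1)
scores = [(i + 1) / 20 for i in range(20)]
labels = [0] * 20
for churner_index in (7, 13, 16, 18, 19):
    labels[churner_index] = 1

shares = decile_capture(scores, labels)
# Lift = share of churners captured / share of users targeted (10% per decile)
top_decile_lift = shares[-1] * len(shares)
print(shares[-1], top_decile_lift)
```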

Interactive Dashboard

This dashboard tracks marketing performance across campaigns, including impressions, clicks, conversions, and ROAS. It helps identify high-performing channels and optimize spend based on actual conversion impact.


Key Findings

  • High-risk users were concentrated among those with declining login frequency over 30 days combined with reduced feature adoption
  • Users who submitted 2+ support tickets in their first month had 3.2× higher churn probability
  • The top 3 predictive features were: days since last login, session-to-purchase ratio, and support ticket count
  • Segment analysis revealed that mid-tier subscription users (not the cheapest or most expensive) had the highest churn rate
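A multiplier such as the 3.2× figure for early support tickets can be read as a ratio of churn rates between two groups. The sketch below shows that arithmetic, assuming the comparison is between observed churn rates; the write-up does not say how the figure was derived. The counts are invented and chosen only so the ratio is easy to follow; they are not the project's data.

```python
def churn_rate(users):
    """Share of users in the group who churned."""
    return sum(u["churned"] for u in users) / len(users)


# Hypothetical records: first-month ticket count and churn outcome
users = (
    [{"tickets_first_month": 3, "churned": True}] * 4
    + [{"tickets_first_month": 2, "churned": False}] * 6
    + [{"tickets_first_month": 0, "churned": True}] * 5
    + [{"tickets_first_month": 1, "churned": False}] * 35
)

heavy = [u for u in users if u["tickets_first_month"] >= 2]
light = [u for u in users if u["tickets_first_month"] < 2]

# 4 of 10 heavy-ticket users churned versus 5 of 40 others
relative_risk = churn_rate(heavy) / churn_rate(light)
print(relative_risk)
```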

How insights drove action

The model's risk scores were integrated into the retention team's workflow. Instead of applying blanket campaigns, the team focused retention efforts on the top 10% highest-risk users, delivering personalized outreach based on the top contributing factors identified by SHAP analysis.
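The hand-off to the retention team might look roughly like the sketch below: rank users by score, keep the top 10%, and attach the features that push each user's score most strongly toward churn. The per-user contribution values stand in for SHAP output and are made up, as are the function name, the data layout, and the user IDs. The feature names echo the top predictors listed under Key Findings.

```python
def build_outreach_list(scores, contributions, top_fraction=0.10, n_reasons=3):
    """Pick the highest-risk users and attach their strongest churn drivers.

    scores: {user_id: churn probability}
    contributions: {user_id: {feature_name: signed contribution}}
    """
    n_selected = max(1, int(len(scores) * top_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_selected]
    outreach = []
    for user_id in ranked:
        # Keep only features that push the score toward churn (positive values)
        drivers = sorted(
            ((f, v) for f, v in contributions.get(user_id, {}).items() if v > 0),
            key=lambda item: item[1],
            reverse=True,
        )[:n_reasons]
        outreach.append({
            "user_id": user_id,
            "score": scores[user_id],
            "reasons": [feature for feature, _ in drivers],
        })
    return outreach


# Hypothetical scores for 20 users and contributions for the riskiest two
scores = {f"u{i}": i / 20 for i in range(1, 21)}
contributions = {
    "u20": {"days_since_last_login": 0.30, "support_ticket_count": 0.12,
            "session_to_purchase_ratio": 0.05, "tenure_months": -0.08},
    "u19": {"support_ticket_count": 0.25, "days_since_last_login": 0.10,
            "tenure_months": -0.02},
}
outreach = build_outreach_list(scores, contributions)
for row in outreach:
    print(row)
```

Filtering to positive contributions keeps the listed reasons actionable: a feature that lowers a user's risk is not something the retention team needs to address.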

This shifted the approach from reactive (post-churn surveys) to proactive, prediction-driven retention. High-risk users identified in decile 10 received priority onboarding support and personalized feature guidance.

Business Impact

AUC Improvement: 0.71 → 0.84 (+0.13, about 18% relative)
Top-Decile Capture: 22% of all churners (2.2× lift over random targeting)
Intervention Cost: ↓ 40% (targeted vs. blanket campaigns)
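For readers less familiar with the headline metric, AUC can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it that way on a tiny hypothetical sample, then reproduces the relative improvement implied by the reported 0.71 and 0.84. The function name and sample values are assumptions for illustration; in practice a library routine would normally be used.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly.

    A tie in score counts as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical sample: 3 churners and 5 non-churners
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))

# Relative improvement implied by the reported AUC values
relative_gain = (0.84 - 0.71) / 0.71
print(round(relative_gain, 3))
```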

Reflections

"The biggest lesson wasn't about model accuracy — it was about making predictions actionable. A model that scores every user means nothing if the retention team can't act on it. The real work was designing the output layer: clear risk tiers, interpretable reasons, and integration into existing workflows. The best ML system is one the business actually uses."