End-to-End Project Lifecycle
Business Case
DS/ML Solution
Production
My Role: Driving the Initiative End-to-End
I Drove...
The initial data discovery and EDA to validate project feasibility.
The entire feature engineering process, from aggregations to network features.
I Owned...
The business case and definition of success metrics (KPIs).
The final model selection and the A/B test analysis and recommendation.
I Led...
The overall DS/ML project strategy from conception to post-launch.
The critical data mapping process and the design of production monitoring.
The Business Case: A Three-Fold Opportunity
Cost Savings
Replace a 3rd-party vendor contract costing €XM annually.
Performance Uplift
Beat the vendor's black-box model to reduce chargebacks & customer friction.
Agility & Speed
Gain the ability to rapidly retrain and deploy, adapting to new fraud patterns in days, not months.
Measuring Success: KPIs and Financial Impact
ML Performance KPIs
Precision
Recall
F1-Score
Financial Impact Framework
The Bottom Line
The Total Impact provides the full business picture by combining the model's operational gains with the €XM annual contract savings.
*The 11% 3DS drop rate, used to calculate friction cost, was determined from prior, isolated A/B tests on challenged transactions.
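The framework reduces to a simple calculation. A minimal sketch, with all constants as hypothetical placeholders (only the 11% drop rate comes from the slide above):

```python
# Hypothetical sketch of the financial impact framework.
# All constants are illustrative placeholders, not real figures.
AVG_CHB_COST = 100.0          # avg loss per chargeback, EUR
AVG_ORDER_VALUE = 80.0        # avg GMV per order, EUR
DROP_RATE_3DS = 0.11          # share of challenged orders abandoned (prior A/B tests)
CONTRACT_SAVINGS = 1_000_000  # annual vendor contract cost, EUR (the "EUR XM")

def total_impact(extra_tp: int, extra_fp: int) -> float:
    """Annual financial impact vs. the vendor baseline.

    extra_tp: additional frauds caught (chargebacks prevented)
    extra_fp: additional legitimate orders challenged (3DS friction)
    """
    chb_saved = extra_tp * AVG_CHB_COST
    friction_cost = extra_fp * AVG_ORDER_VALUE * DROP_RATE_3DS
    operational_impact = chb_saved - friction_cost
    return operational_impact + CONTRACT_SAVINGS
```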
Project Acceptance Criteria
- Positive operational impact alone → Clear Go.
- Marginal operational impact → Still a Go.
- Positive only after including contract savings → Go, but needs monitoring.
- Negative total financial impact → No-Go.
EDA: Key Insights That Shaped Our Strategy
Pareto Principle in Fraud & GMV
75% of chargebacks came from just 3 countries, while ~92% of GMV came from the top 8 countries.
Implication: We needed to pay close attention to model performance in these key segments, not just the global average.
Fraudulent Order Value
The average order value for transactions that resulted in a chargeback was significantly higher than for legitimate orders.
Implication: This confirmed that `amount` would be a powerful feature and that focusing on high-value transactions was critical.
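A minimal sketch of these two EDA checks on an illustrative toy frame; column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical matured-transactions frame; values are illustrative only.
orders = pd.DataFrame({
    "country": ["DE", "DE", "FR", "NL", "FR", "DE", "BR", "BR"],
    "gmv":     [120,   80,   60,   40,   90,  150,  200,  30],
    "is_chb":  [1,      0,    1,    0,    0,    1,    1,   0],
})

# Country concentration (Pareto check): cumulative share of chargebacks and GMV.
chb = orders.groupby("country")["is_chb"].sum().sort_values(ascending=False)
print(chb.cumsum() / chb.sum())   # real data: top 3 countries held ~75% of CHBs

gmv = orders.groupby("country")["gmv"].sum().sort_values(ascending=False)
print(gmv.cumsum() / gmv.sum())   # real data: top 8 countries held ~92% of GMV

# Order value by outcome: chargeback orders skew much larger.
print(orders.groupby("is_chb")["gmv"].mean())
```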
Chargeback Maturity Time
Waiting 90 days for chargeback data to mature captured 91% of all final chargebacks.
Implication: I decided this was the optimal trade-off between data completeness and speed, allowing us to retrain models quarterly without waiting the full 180 days.
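A sketch of the maturity analysis behind that trade-off, with illustrative values for days-to-chargeback:

```python
import pandas as pd

# Hypothetical sketch: days between order date and chargeback report,
# used to pick the maturation window. Values are illustrative.
days_to_chb = pd.Series([12, 25, 40, 55, 70, 85, 88, 95, 130, 170])

for window in (60, 90, 120, 180):
    captured = (days_to_chb <= window).mean()
    print(f"{window}d window captures {captured:.0%} of chargebacks")
```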
Performance Segmentation
Based on these insights, I established a process to monitor performance across crucial segments.
Implication: We tracked F1, Precision, and Recall for top countries and for new vs. existing users to ensure the model was fair and effective for everyone.
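A minimal sketch of the per-segment tracking, assuming a hypothetical scored frame with `segment`, `y_true`, and `y_pred` columns:

```python
import pandas as pd
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical scored frame; segments could be top countries or new/existing users.
scored = pd.DataFrame({
    "segment": ["DE", "DE", "FR", "FR", "new", "new"],
    "y_true":  [1, 0, 1, 1, 0, 1],
    "y_pred":  [1, 0, 0, 1, 0, 1],
})

# Report the three KPIs per segment, not just the global average.
for seg, grp in scored.groupby("segment"):
    p = precision_score(grp.y_true, grp.y_pred, zero_division=0)
    r = recall_score(grp.y_true, grp.y_pred, zero_division=0)
    f = f1_score(grp.y_true, grp.y_pred, zero_division=0)
    print(f"{seg}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```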
Feature Engineering at Scale
We engineered a rich set of features from ~100 raw data points per order; a sketch of two representative feature families follows the lists below.
Activity & User
- Order/Card Counts
- Amount (Sum/Avg)
- Account/Card Age
- Payment Type
- Domestic/Foreign Card
Time Horizon Aggs
- Aggregates over 3h, 1d, 7d, 30d, 90d
- User, Device & Phone level
Graph & Network
- Network Size
- Associated Device Counts
- Associated Card Counts
- Associated Email Counts
Advanced Behavioral
- Email Domain
- Phone Model, OS
- Haversine Distance
- Order Time/Day Patterns
- Refund Count Rate
- Cash Unpaid Rate
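A sketch of two of these feature families (time-horizon aggregates and a simple network count); the frame and column names are hypothetical stand-ins for the real feature-store pipeline:

```python
import pandas as pd

# Hypothetical sketch; in production these aggregations ran at scale
# in the feature store.
orders = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2],
    "device_id": ["a", "a", "b", "b", "c"],
    "ts": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05",
                          "2023-01-03", "2023-01-04"]),
    "amount": [50.0, 20.0, 300.0, 15.0, 70.0],
}).sort_values("ts").reset_index(drop=True)

# Time-horizon aggregates: per-user 7-day rolling amount sum and order count.
by_user = orders.set_index("ts").groupby("user_id")["amount"]
orders["amt_sum_7d"] = by_user.transform(lambda s: s.rolling("7d").sum()).to_numpy()
orders["order_cnt_7d"] = by_user.transform(lambda s: s.rolling("7d").count()).to_numpy()

# Simple network feature: distinct devices associated with each user.
orders["user_device_cnt"] = orders.groupby("user_id")["device_id"].transform("nunique")
print(orders)
```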
DS/ML Solution: Validation & Modeling
Overall Validation Strategy
Training Data
Nov - Mar
Validation Set
Apr
Test Set
May
Chargeback (CHB) Maturation
Jun - Aug
A strict time-based split was crucial to prevent data leakage and accurately simulate how the model would perform on future, unseen data.
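A minimal sketch of the split; the year is illustrative, only the month boundaries mirror the timeline above:

```python
import pandas as pd

# Hypothetical timestamped frame standing in for the labeled order data.
df = pd.DataFrame({"ts": pd.date_range("2022-11-01", "2023-05-31", freq="D")})

train = df[df.ts < "2023-04-01"]                               # Nov-Mar
valid = df[(df.ts >= "2023-04-01") & (df.ts < "2023-05-01")]   # Apr
test  = df[(df.ts >= "2023-05-01") & (df.ts < "2023-06-01")]   # May
# Jun-Aug is reserved for label maturation: test-period chargebacks
# need the ~90-day window before their labels are considered final.
```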
Algorithms Explored
Benchmarked several gradient boosting models; deep learning approaches were ruled out due to strict real-time latency requirements.
LightGBM provided the best balance of prediction performance and low-latency inference speed for our production environment.
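A sketch of the kind of latency check that informed this choice, on synthetic data; the loop count and any latency budget are illustrative:

```python
import time
import lightgbm as lgb
from sklearn.datasets import make_classification

# Hypothetical single-order inference benchmark on synthetic data.
X, y = make_classification(n_samples=5000, n_features=100, random_state=0)
model = lgb.LGBMClassifier(n_estimators=300, verbosity=-1).fit(X, y)

row = X[:1]  # one order, as scored in real time
n = 1000
start = time.perf_counter()
for _ in range(n):
    model.predict_proba(row)
per_call_ms = (time.perf_counter() - start) / n * 1000
print(f"avg single-order latency: {per_call_ms:.3f} ms")  # must fit the real-time budget
```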
DS/ML Solution: Optimization & Interpretability
Training & Optimization Workflow
Undersampled the majority class in the training set (testing 1%, 5%, 10%, 20% ratios). This was chosen for faster training iterations with minimal performance trade-off compared to oversampling.
Ran 5-Fold Stratified CV on the sampled data. Used Optuna to optimize for PR-AUC, an ideal metric for imbalanced classification as it is not sensitive to a single decision threshold.
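A condensed sketch of the undersampling plus tuning loop on synthetic data; the `undersample` helper and parameter ranges are illustrative, not the production settings:

```python
import lightgbm as lgb
import numpy as np
import optuna
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced dataset standing in for the training window.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)

def undersample(X, y, fraud_share=0.10, seed=0):
    """Keep all fraud rows, downsample legit rows to the target fraud share."""
    rng = np.random.default_rng(seed)
    fraud, legit = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    n_legit = int(len(fraud) * (1 - fraud_share) / fraud_share)
    keep = np.concatenate([fraud, rng.choice(legit, n_legit, replace=False)])
    return X[keep], y[keep]

X_s, y_s = undersample(X, y, fraud_share=0.10)  # we tested 1/5/10/20% shares

def objective(trial: optuna.Trial) -> float:
    params = {
        "n_estimators": 300,
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "min_child_samples": trial.suggest_int("min_child_samples", 10, 200),
        "verbosity": -1,
    }
    scores = []
    for tr, va in StratifiedKFold(5, shuffle=True, random_state=0).split(X_s, y_s):
        model = lgb.LGBMClassifier(**params).fit(X_s[tr], y_s[tr])
        # PR-AUC (average precision): threshold-free, robust to class imbalance.
        scores.append(average_precision_score(y_s[va], model.predict_proba(X_s[va])[:, 1]))
    return float(np.mean(scores))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```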
Trained the final LightGBM model on the full, undersampled training data, using the best hyperparameters discovered by Optuna and the most important features as identified by SHAP.
Used the trained model to predict on the full validation set. Optimized decision thresholds to maximize financial impact for key segments (global, country, new/existing users).
Applied the chosen thresholds to the Test Set to verify performance generalization over time before deployment.
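A minimal sketch of the validation-set threshold sweep, reusing the placeholder financial constants from the impact framework above; one threshold would be fitted per segment:

```python
import numpy as np

# Hypothetical placeholder constants, as in the impact framework sketch.
AVG_CHB_COST, AVG_ORDER_VALUE, DROP_RATE_3DS = 100.0, 80.0, 0.11

def best_threshold(y_true: np.ndarray, scores: np.ndarray) -> float:
    """Pick the decision threshold maximizing the financial-impact proxy."""
    def impact(t: float) -> float:
        flagged = scores >= t
        tp = int(np.sum(flagged & (y_true == 1)))  # frauds blocked
        fp = int(np.sum(flagged & (y_true == 0)))  # good orders challenged
        return tp * AVG_CHB_COST - fp * AVG_ORDER_VALUE * DROP_RATE_3DS
    return float(max(np.linspace(0.01, 0.99, 99), key=impact))

# One threshold per key segment (global, per-country, new vs. existing), e.g.:
# thresholds = {seg: best_threshold(y_val[mask], scores_val[mask])
#               for seg, mask in segment_masks.items()}   # hypothetical masks
```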
Model Interpretability
To build trust and understand model behavior, I relied on model interpretability, most notably SHAP, which also drove feature selection during training.
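A minimal SHAP sketch on synthetic data; the list/array handling covers differences between shap versions:

```python
import lightgbm as lgb
import shap
from sklearn.datasets import make_classification

# Hypothetical sketch: TreeExplainer yields both global importance and
# per-order attributions for a LightGBM classifier.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
model = lgb.LGBMClassifier(n_estimators=200, verbosity=-1).fit(X, y)

explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)
sv = sv[1] if isinstance(sv, list) else sv  # some shap versions return per-class lists

shap.summary_plot(sv, X)   # global view: which features drive the fraud score
# sv[i] holds the per-feature contributions for a single order i (local explanation)
```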
My Path to Production: From Silo to System
Impact & Results: A Clear Success (the "Clear Go" scenario)
Additional CHBs Saved
€XX K
annually
Additional GMV Freed
€YY M
annually, from reduced 3DS friction
Production F1-Score
XX%
stable in production
Learnings & Future Work
Key Learnings
- Business Context is King: The best features came from understanding fraudster behavior, not just algorithms.
- Infrastructure First: Investing in a feature store early was a force multiplier that prevented major issues.
- Collaboration is Non-Negotiable: Success depended on the tight loop between Data Science, Engineering, and Business.
What I Would Do Differently
- Include LTV loss from false positives in the financial impact: false positives mean genuine customers face unnecessary friction, which might turn them away from the platform.
- Explore Graph Neural Networks (GNNs): For a future iteration, I would explore using GNNs to more directly model the complex relationships between users, cards, and devices, potentially capturing sophisticated fraud rings that are harder to detect with traditional feature engineering.
Thank You
Q&A