ZRO Fox Analysis: Predicting Customer Purchase Propensity
Objective
The goal of this project was to build a predictive model that estimates how likely a visitor to a client website is to make a purchase. The resulting propensity scores help inform targeted marketing strategies and optimize campaign performance by identifying high-propensity customers.
Data Preparation
We began by analyzing web activity data, focusing on three key metrics: Recency (days since the last visit), Frequency (number of visits), and Session Duration. A visit was labeled as a purchase when a page title contained "Thank%20you" (the URL-encoded form of "Thank you").
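A minimal sketch of how these metrics and the purchase label could be derived. It assumes a hypothetical page-view log with `visitor_id`, `session_id`, `timestamp`, and `page_title` fields; the field names and sample rows are illustrative and not taken from the client data. Counting frequency as distinct sessions and measuring session duration from the first to the last page view are also assumptions about the definitions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical page-view log; field names and rows are illustrative only.
PAGE_VIEWS = [
    {"visitor_id": "v1", "session_id": "s1", "timestamp": "2024-03-01T10:00:00", "page_title": "Home"},
    {"visitor_id": "v1", "session_id": "s1", "timestamp": "2024-03-01T10:05:00", "page_title": "Cart"},
    {"visitor_id": "v1", "session_id": "s2", "timestamp": "2024-03-08T20:00:00", "page_title": "Checkout"},
    {"visitor_id": "v1", "session_id": "s2", "timestamp": "2024-03-08T20:10:00", "page_title": "Thank%20you"},
    {"visitor_id": "v2", "session_id": "s3", "timestamp": "2024-03-05T09:00:00", "page_title": "Home"},
]

def summarize_visitors(page_views, as_of):
    """Return per-visitor recency, frequency, mean session duration, and purchase flag."""
    sessions = defaultdict(lambda: defaultdict(list))  # visitor -> session -> timestamps
    purchased = defaultdict(bool)
    for view in page_views:
        ts = datetime.fromisoformat(view["timestamp"])
        sessions[view["visitor_id"]][view["session_id"]].append(ts)
        # Purchase rule from the text: page title contains "Thank%20you".
        if "Thank%20you" in view["page_title"]:
            purchased[view["visitor_id"]] = True

    summary = {}
    for visitor, by_session in sessions.items():
        last_seen = max(max(stamps) for stamps in by_session.values())
        durations = [(max(s) - min(s)).total_seconds() for s in by_session.values()]
        summary[visitor] = {
            "recency_days": (as_of - last_seen).days,
            "frequency": len(by_session),
            "mean_session_seconds": sum(durations) / len(durations),
            "purchased": purchased[visitor],
        }
    return summary

SUMMARY = summarize_visitors(PAGE_VIEWS, as_of=datetime(2024, 3, 10))
```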
Feature Engineering
To enhance the predictive power of our model, we engineered additional features:
- **Hour of Visit**: Extracted from the timestamp to capture the time of day.
- **Day of the Week**: Extracted from the timestamp to capture weekly patterns.
- **Session Number**: Added to identify returning users.
- **Device Type**: Identified mobile users from the user agent string.
- **Referral Source**: Indicated whether the visit was referred from another site.
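The features listed above could be derived along these lines. This is a sketch under assumptions: the raw record fields (`timestamp`, `session_number`, `user_agent`, `referrer`), the `site_host` parameter, and the substring heuristic for spotting mobile user agents are all hypothetical and may differ from what the project actually used.

```python
from datetime import datetime
from urllib.parse import urlparse

# Heuristic substrings for mobile user agents; an assumption, not exhaustive.
MOBILE_MARKERS = ("Mobi", "Android", "iPhone", "iPad")

def engineer_features(visit, site_host):
    """Derive the five engineered features from one raw visit record."""
    ts = datetime.fromisoformat(visit["timestamp"])
    referrer_host = urlparse(visit.get("referrer") or "").netloc
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "session_number": visit["session_number"],
        "is_mobile": int(any(m in visit["user_agent"] for m in MOBILE_MARKERS)),
        # Referred only if a referrer exists and points to a different host.
        "is_referred": int(bool(referrer_host) and referrer_host != site_host),
    }

EXAMPLE_VISIT = {
    "timestamp": "2024-03-08T20:10:00",
    "session_number": 2,
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
    "referrer": "https://www.example.org/deals",
}
FEATURES = engineer_features(EXAMPLE_VISIT, site_host="shop.example.com")
```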
#### Modeling Approach
1. **Initial Model**: A Gradient Boosting Classifier was trained on the initial features. While the model performed well for non-purchase events, it struggled with identifying purchase events due to class imbalance.
2. **Feature Expansion**: Additional features such as the hour of visit, day of the week, device type, and referral source were added, improving the model's performance.
3. **Class Imbalance Handling**: We balanced the dataset by combining oversampling of the minority class (purchases) with undersampling of the majority class (non-purchases). This improved the model's ability to predict purchase events, though recall for purchases is still low at 11%.
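The combined resampling in step 3 can be sketched as follows. The source does not state the sampling ratios or the library used, so the `rebalance` helper and its `target_per_class` parameter are hypothetical, and the sketch uses only the standard library. In practice, resampling is normally applied to the training split only, so that evaluation reflects the real class balance.

```python
import random

def rebalance(rows, labels, target_per_class, seed=0):
    """Undersample the majority class and oversample the minority class
    so that each class ends up with `target_per_class` rows."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            # Undersample: draw without replacement.
            sampled = rng.sample(members, target_per_class)
        else:
            # Oversample: keep every original row, then draw extras with replacement.
            extras = rng.choices(members, k=target_per_class - len(members))
            sampled = members + extras
        out_rows.extend(sampled)
        out_labels.extend([label] * target_per_class)

    # Shuffle rows and labels together so the classes are interleaved.
    order = list(range(len(out_rows)))
    rng.shuffle(order)
    return [out_rows[i] for i in order], [out_labels[i] for i in order]

# Toy training split: 95 non-purchases (0) and 5 purchases (1).
TRAIN_ROWS = [[i] for i in range(100)]
TRAIN_LABELS = [0] * 95 + [1] * 5
BAL_ROWS, BAL_LABELS = rebalance(TRAIN_ROWS, TRAIN_LABELS, target_per_class=30)
```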
Model Performance
- **Accuracy**: The balanced Gradient Boosting Classifier achieved an overall accuracy of 89%.
- **Precision, Recall, and F1-Score**: For purchase events, the model achieved a precision of 47%, a recall of 11%, and an F1-score of 18%, meaning it caught only about one in nine actual purchasers. For non-purchase events, it achieved a precision of 88%, a recall of 90%, and an F1-score of 89%.
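- **ROC AUC Score**: The model's ROC AUC score improved to 0.96, indicating that it ranks purchasers above non-purchasers very well. Given the low purchase recall, this suggests that the decision threshold, rather than the ranking, is what limits detection of purchase events.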
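As a quick consistency check, the reported F1-scores follow from the reported precision and recall, since F1 is their harmonic mean. The helper name below is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

PURCHASE_F1 = f1_score(0.47, 0.11)      # reported precision and recall for purchases
NON_PURCHASE_F1 = f1_score(0.88, 0.90)  # reported precision and recall for non-purchases

print(f"purchase F1: {PURCHASE_F1:.2f}")          # prints "purchase F1: 0.18"
print(f"non-purchase F1: {NON_PURCHASE_F1:.2f}")  # prints "non-purchase F1: 0.89"
```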
Key Insights
1. **Feature Importance**: Recency, session duration, and the hour of visit were identified as the most influential features in predicting purchases.
2. **High-Propensity Customers**: The distribution of predicted propensity scores allowed us to identify high-propensity customers, aiding in targeted marketing efforts.
3. **Customer Segmentation**: Clustering based on propensity scores and other features revealed distinct customer segments, enabling more personalized marketing strategies.
Visualizations
1. **Feature Importance**: Highlighted the key factors influencing purchase decisions.
2. **ROC Curve**: Demonstrated the model's ability to distinguish between purchasers and non-purchasers.
3. **Confusion Matrix**: Provided a detailed breakdown of model predictions, showing accuracy and types of errors.
4. **Propensity Score Distribution**: Showed the likelihood of customers making a purchase.
5. **Customer Segmentation**: Visualized clusters of customers based on their behavior and propensity scores.
Conclusion
The predictive model developed through this process offers valuable insights into customer behavior, helping to identify high-propensity customers and informing targeted marketing strategies. By focusing marketing efforts on these high-propensity segments, we can optimize campaign performance and drive higher conversion rates.
If further refinement or additional features are needed, the model can be iteratively improved to continue providing actionable insights for marketing initiatives.
The following visualizations and outputs can help sales and marketing staff understand the model and make informed decisions:
1. **Feature Importance**: Identify which features are the most influential in predicting purchases. This can help prioritize marketing efforts based on the most significant factors.
2. **ROC Curve**: Visualize the trade-off between the true positive rate and false positive rate at various threshold settings. This can help in understanding the model's performance.
3. **Confusion Matrix**: Provide a detailed breakdown of the model's predictions compared to the actual outcomes. This can help in understanding the model's accuracy and error types.
4. **Predicted Propensity Scores**: Show the distribution of predicted propensity scores for making a purchase. This can help target high-propensity customers more effectively.
5. **Customer Segmentation**: Cluster customers based on their propensity scores and other features. This can help in creating targeted marketing strategies for different customer segments.
Let's generate these visualizations and outputs.
Feature Importance
First, let's look at the feature importance from the Gradient Boosting model.
[Figure: Feature importance plot]
The feature importance plot above shows the most influential features in predicting purchases. For example, "RECENCY," "SESSION_DURATION," and "HOUR" are some of the key features impacting the prediction.
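A sketch of how such a ranking can be obtained with scikit-learn's `GradientBoostingClassifier`. The data here is synthetic and the column names are stand-ins, so the resulting order will not match the plot described above; only the mechanics are illustrated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature names; the synthetic data stands in for the real web-activity table.
FEATURE_NAMES = ["RECENCY", "FREQUENCY", "SESSION_DURATION", "HOUR", "DAY_OF_WEEK"]

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# feature_importances_ is normalized to sum to 1 across features.
RANKED = sorted(
    zip(FEATURE_NAMES, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in RANKED:
    print(f"{name:<18}{importance:.3f}")
```

Plotting `RANKED` as a horizontal bar chart gives a feature importance figure of the kind discussed here.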
ROC Curve
Let's now generate the ROC curve to visualize the trade-off between the true positive rate and false positive rate.
[Figure: ROC curve]
The ROC curve above shows the model's ability to distinguish between purchasers and non-purchasers. The area under the curve (AUC) is 0.96, indicating that the model ranks purchasers above non-purchasers well.
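To make the construction of the curve concrete, the sketch below sweeps the decision threshold over a small set of made-up scores and computes the area with the trapezoidal rule. The labels and scores are illustrative only, and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs from sweeping the threshold from high to low.
    Assumes both classes are present in y_true."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up labels (1 = purchase) and predicted propensity scores.
Y_TRUE = [1, 0, 1, 0, 0, 1, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
ROC = roc_points(Y_TRUE, SCORES)
ROC_AUC = auc(ROC)
```

The AUC equals the fraction of purchaser and non-purchaser pairs in which the purchaser receives the higher score, which is why it describes ranking quality rather than performance at any single threshold.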
Confusion Matrix
Next, let's generate a confusion matrix to provide a detailed breakdown of the model's predictions compared to the actual outcomes.
[Figure: Confusion matrix]
The confusion matrix provides a detailed breakdown of the model's predictions. It shows the number of true positives, true negatives, false positives, and false negatives, helping you understand the types of errors the model makes.
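The four counts can be tallied directly from actual and predicted labels, as in this sketch. The toy labels are invented to show how accuracy can look healthy while recall for purchases stays low; they are not the project's results.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with purchase = 1."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts

# Invented example: 3 purchasers among 10 visitors, only 1 of them detected.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
Y_PRED = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
CM = confusion_counts(Y_TRUE, Y_PRED)

PRECISION = CM["tp"] / (CM["tp"] + CM["fp"])    # share of predicted purchases that were real
RECALL = CM["tp"] / (CM["tp"] + CM["fn"])       # share of real purchases that were caught
ACCURACY = (CM["tp"] + CM["tn"]) / len(Y_TRUE)  # share of all predictions that were right
```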
Predicted Propensity Scores
Let's visualize the distribution of predicted propensity scores for making a purchase. This can help in targeting high-propensity customers.
[Figure: Distribution of predicted propensity scores]
The distribution of predicted propensity scores shows how the model rates the likelihood of a purchase for different users. High scores help you identify and target the users who are most likely to make a purchase.
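A sketch of both uses of the scores: binning them to see the distribution, and picking the highest-scoring visitors for a campaign. The visitor ids, scores, bin count, and the 20% cut-off are all illustrative assumptions.

```python
def score_histogram(scores, bins=5):
    """Count scores in equal-width bins over [0, 1]; a score of 1.0 goes in the last bin."""
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return counts

def top_fraction(scored_visitors, fraction):
    """Return the ids of the highest-scoring visitors, covering `fraction` of the list."""
    k = max(1, round(len(scored_visitors) * fraction))
    ranked = sorted(scored_visitors, key=lambda pair: pair[1], reverse=True)
    return [visitor for visitor, _ in ranked[:k]]

# Hypothetical (visitor id, predicted propensity) pairs.
SCORED = [("v1", 0.92), ("v2", 0.05), ("v3", 0.31), ("v4", 0.08), ("v5", 0.77),
          ("v6", 0.02), ("v7", 0.15), ("v8", 0.55), ("v9", 0.04), ("v10", 0.11)]
HIST = score_histogram([score for _, score in SCORED])
TARGETS = top_fraction(SCORED, fraction=0.2)
```

With imbalanced data the histogram is typically skewed toward low scores, so selecting a top fraction is often more practical than applying a fixed score cut-off.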
Customer Segmentation
Finally, let's cluster customers based on their propensity scores and other features to create targeted marketing strategies for different segments.
[Figure: Customer segmentation scatter plot]
The scatter plot shows the clustering of customers based on their propensity scores and features such as recency and session duration. Each color represents a different cluster, helping to visualize distinct customer segments.
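The source does not name the clustering algorithm, so the sketch below uses plain k-means as an assumption. It clusters invented points of the form (propensity score, recency, session duration), with recency and session duration assumed to be pre-scaled to the range 0 to 1 so that no feature dominates the distance.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Invented customers: (propensity score, scaled recency, scaled session duration).
POINTS = [
    (0.90, 0.05, 0.80),
    (0.10, 0.90, 0.10),
    (0.85, 0.10, 0.70),
    (0.05, 0.80, 0.20),
    (0.80, 0.15, 0.90),
    (0.15, 0.95, 0.15),
]
LABELS, CENTROIDS = kmeans(POINTS, k=2)
```

Here one cluster gathers recent, engaged, high-propensity visitors and the other gathers lapsed, low-propensity visitors. A production version would usually use a library implementation with multiple random initializations.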
### Summary of Visualizations and Outputs
1. **Feature Importance**: Shows the most influential features in predicting purchases.
2. **ROC Curve**: Visualizes the model's performance in distinguishing between purchasers and non-purchasers.
3. **Confusion Matrix**: Provides a detailed breakdown of the model's predictions compared to actual outcomes.
4. **Predicted Propensity Scores**: Displays the distribution of propensity scores, helping to identify high-propensity customers.
5. **Customer Segmentation**: Clusters customers based on their propensity scores and features, aiding in targeted marketing strategies.
These visualizations and outputs can help sales and marketing staff understand the model's behavior, target high-propensity customers, and create effective marketing strategies.