4.4 Changing the probability cutoff

Note that caret uses a probability cutoff of 0.5 to decide whether a person will buy car insurance. We can lower that cutoff to 0.3 and see whether the results improve.

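# Predict class probabilities, then apply a custom cutoff of 0.3 instead of 0.5
predict_custom <- predict(modelRF2,
                          select(dt4_test, -CarInsurance, -starts_with("Call")),
                          type = "prob") %>%
  # Reuse the reference levels so both classes exist even if one is never predicted
  mutate(new_class = factor(ifelse(Yes >= 0.3, "Yes", "No"),
                            levels = levels(dt4_test$CarInsurance))) %>%
  select(new_class)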

confusionMatrix(predict_custom$new_class, 
                reference = dt4_test$CarInsurance, 
                positive = "Yes")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  433   9
##        Yes  46 311
##                                           
##                Accuracy : 0.9312          
##                  95% CI : (0.9113, 0.9477)
##     No Information Rate : 0.5995          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8594          
##                                           
##  Mcnemar's Test P-Value : 1.208e-06       
##                                           
##             Sensitivity : 0.9719          
##             Specificity : 0.9040          
##          Pos Pred Value : 0.8711          
##          Neg Pred Value : 0.9796          
##              Prevalence : 0.4005          
##          Detection Rate : 0.3892          
##    Detection Prevalence : 0.4468          
##       Balanced Accuracy : 0.9379          
##                                           
##        'Positive' Class : Yes             
## 
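
The headline statistics in the output above can be checked by hand from the four cell counts. The following is a minimal sketch in base R, using only the counts printed in the confusion matrix; the variable names (`tp`, `fn`, `fp`, `tn`, `metrics`) are introduced here for illustration and are not part of the original analysis.

```r
# Counts copied from the confusion matrix above (positive class = "Yes")
tp <- 311  # predicted Yes, actually Yes
fn <- 9    # predicted No,  actually Yes
fp <- 46   # predicted Yes, actually No
tn <- 433  # predicted No,  actually No

n <- tp + fn + fp + tn

metrics <- c(
  accuracy    = (tp + tn) / n,
  sensitivity = tp / (tp + fn),  # share of actual buyers that were caught
  specificity = tn / (tn + fp),  # share of non-buyers correctly left out
  ppv         = tp / (tp + fp),  # share of predicted buyers who really buy
  npv         = tn / (tn + fn)   # share of predicted non-buyers who really do not buy
)

# Balanced accuracy is the mean of sensitivity and specificity
metrics["balanced_accuracy"] <-
  (metrics[["sensitivity"]] + metrics[["specificity"]]) / 2

round(metrics, 4)
```

Up to rounding, these values agree with the Accuracy, Sensitivity, Specificity, Pos Pred Value, Neg Pred Value and Balanced Accuracy lines reported by `confusionMatrix`.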

With the cutoff lowered to 0.3, the model flags more people as buyers than actually buy (46 false positives, positive predictive value 0.87), but it misses only 9 of the 320 actual buyers (sensitivity 0.97). This is also a useful lesson: moving the cutoff away from the default of 0.5 trades one kind of error for the other, and by itself it does not improve the overall accuracy of the model.
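
To see how the choice of cutoff shifts the balance between the two kinds of error, one option is to evaluate several cutoffs side by side. The sketch below uses base R only and runs on simulated data, not on the author's model: `actual` and `p_yes` are synthetic stand-ins for `dt4_test$CarInsurance` and the `Yes` column returned by `predict(..., type = "prob")`, and `metrics_at` is a hypothetical helper written for this example.

```r
set.seed(42)  # arbitrary seed, only to make the toy data reproducible

# Simulated stand-ins (assumption): true classes and predicted P(Yes)
n      <- 500
actual <- factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.6, 0.4)),
                 levels = c("No", "Yes"))
p_yes  <- ifelse(actual == "Yes", rbeta(n, 4, 2), rbeta(n, 2, 4))

# Hypothetical helper: metrics for one cutoff, with "Yes" as the positive class
metrics_at <- function(cutoff, p_yes, actual) {
  pred <- factor(ifelse(p_yes >= cutoff, "Yes", "No"), levels = c("No", "Yes"))
  tab  <- table(pred, actual)  # rows = predicted, columns = actual
  c(cutoff      = cutoff,
    accuracy    = sum(diag(tab)) / sum(tab),
    sensitivity = tab["Yes", "Yes"] / sum(tab[, "Yes"]),
    specificity = tab["No", "No"] / sum(tab[, "No"]))
}

# One row per cutoff
cutoffs       <- seq(0.1, 0.9, by = 0.1)
sweep_results <- as.data.frame(
  t(sapply(cutoffs, metrics_at, p_yes = p_yes, actual = actual))
)
sweep_results
```

As the cutoff rises, sensitivity can only fall and specificity can only rise, because fewer observations are labelled "Yes". To apply the same comparison to the real test set, `p_yes` and `actual` would be replaced by the model's predicted probabilities and `dt4_test$CarInsurance`.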