8.14 Model performance

Finally, let's assess the model's out-of-sample performance by predicting the clusters in the test data set and comparing the predictions with the actual cluster labels.

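library(caret)  # confusionMatrix()
library(dplyr)  # select()
# Predict on the test set (outcome column dropped) and compare with the true clusters.
# `positive` is omitted: it applies only to two-class outcomes, not these four classes.
confusionMatrix(predict(modelRF_large, 
                        select(test_dt, -km_cluster)), 
                reference = test_dt$km_cluster)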
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction c1 c2 c3 c4
##         c1  0  0 27  0
##         c2  8  0  0 35
##         c3  6  1  0  0
##         c4  0 19  1  1
## 
## Overall Statistics
##                                          
##                Accuracy : 0.0102         
##                  95% CI : (3e-04, 0.0555)
##     No Information Rate : 0.3673         
##     P-Value [Acc > NIR] : 1              
##                                          
##                   Kappa : -0.2822        
##                                          
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: c1 Class: c2 Class: c3 Class: c4
## Sensitivity             0.0000    0.0000   0.00000   0.02778
## Specificity             0.6786    0.4487   0.90000   0.67742
## Pos Pred Value          0.0000    0.0000   0.00000   0.04762
## Neg Pred Value          0.8028    0.6364   0.69231   0.54545
## Prevalence              0.1429    0.2041   0.28571   0.36735
## Detection Rate          0.0000    0.0000   0.00000   0.01020
## Detection Prevalence    0.2755    0.4388   0.07143   0.21429
## Balanced Accuracy       0.3393    0.2244   0.45000   0.35260

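Taken at face value, the model predicts the customer segments very poorly out of sample. Accuracy is about 1% (0.0102), well below the no-information rate of 0.37, and Kappa is -0.28. Only one of the 98 test customers lies on the diagonal of the confusion matrix. The model also over-predicts segment 2: it assigns 43 customers to c2, although only 20 belong there. The errors are highly systematic, however. All 27 customers predicted as c1 belong to c3 in the reference, 35 of the 43 predicted as c2 belong to c4, 6 of the 7 predicted as c3 belong to c1, and 19 of the 21 predicted as c4 belong to c2. A pattern like this is what we would expect if the cluster labels in the test data do not correspond to the labels the model was trained on, for example because the clusters were numbered differently in a separate clustering run. Before concluding that the model cannot separate the segments, check that km_cluster in test_dt comes from the same clustering run as the training labels.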
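
The Accuracy and Kappa values in the printed output can be reproduced directly from the confusion matrix, which makes it easier to see where they come from. The sketch below is a minimal base R illustration and not part of the original analysis. It re-enters the printed counts by hand, computes accuracy as the share of counts on the diagonal, and computes Cohen's Kappa from the observed and chance agreement. As a rough diagnostic, it also reports which reference class each predicted class most often coincides with. The names cm, kappa_stat, best_match, and row_max_share are hypothetical and chosen only for this example.

```r
# Counts copied from the printed confusion matrix
# (rows = predictions, columns = reference).
labels <- c("c1", "c2", "c3", "c4")
cm <- matrix(c(0,  0, 27,  0,
               8,  0,  0, 35,
               6,  1,  0,  0,
               0, 19,  1,  1),
             nrow = 4, byrow = TRUE,
             dimnames = list(Prediction = labels, Reference = labels))

n <- sum(cm)

# Accuracy: share of observations on the diagonal.
accuracy <- sum(diag(cm)) / n

# Cohen's Kappa: observed agreement corrected for the agreement
# expected by chance from the row and column totals.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

# Diagnostic (an assumption of this sketch, not a caret statistic):
# the reference class that each predicted class most often lands in,
# and the share of observations that fall in those cells.
best_match <- labels[apply(cm, 1, which.max)]
names(best_match) <- labels
row_max_share <- sum(apply(cm, 1, max)) / n

round(c(accuracy = accuracy, kappa = kappa_stat,
        row_max_share = row_max_share), 4)
best_match
```

The first two values should match the Accuracy and Kappa lines in the printed output. When row_max_share is far higher than the accuracy, and best_match pairs each predicted class with a different reference class, the predicted and reference labels may simply be misaligned. This is only a heuristic; it does not replace checking how the labels in the test data were created.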