8.14 Model performance
Finally, let’s assess the model performance by using it on the test data set.
# Predict segments for the test set (response column dropped) and compare with the true labels
confusionMatrix(predict(modelRF_large,
                        select(test_dt, -km_cluster)),
                reference = test_dt$km_cluster)
## Confusion Matrix and Statistics
##
##           Reference
## Prediction c1 c2 c3 c4
##         c1  0  0 27  0
##         c2  8  0  0 35
##         c3  6  1  0  0
##         c4  0 19  1  1
##
## Overall Statistics
##
##                Accuracy : 0.0102
##                  95% CI : (3e-04, 0.0555)
##     No Information Rate : 0.3673
##     P-Value [Acc > NIR] : 1
##
##                   Kappa : -0.2822
##
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: c1 Class: c2 Class: c3 Class: c4
## Sensitivity             0.0000    0.0000   0.00000   0.02778
## Specificity             0.6786    0.4487   0.90000   0.67742
## Pos Pred Value          0.0000    0.0000   0.00000   0.04762
## Neg Pred Value          0.8028    0.6364   0.69231   0.54545
## Prevalence              0.1429    0.2041   0.28571   0.36735
## Detection Rate          0.0000    0.0000   0.00000   0.01020
## Detection Prevalence    0.2755    0.4388   0.07143   0.21429
## Balanced Accuracy       0.3393    0.2244   0.45000   0.35260
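Taken at face value, the output above says the model does a very poor job of predicting the customer segments out of sample: only 1 of the 98 test customers falls on the diagonal, giving about 1% accuracy and a Kappa of -0.28, well below the no-information rate of 0.37. The errors are far from random, however. Customers predicted as c1 are all c3 in the reference, those predicted as c2 are mostly c4, those predicted as c3 are mostly c1, and those predicted as c4 are mostly c2. This pattern suggests that the cluster labels in the test set are permuted relative to the labels the model was trained on, which can happen because k-means labels are arbitrary, rather than that the model cannot tell the segments apart. If we accept that one-to-one relabeling, 87 of the 98 customers (about 89%) land in the matching segment. The model over-classifies one segment: label c2 accounts for 44% of the predictions, while its matched reference segment c4 makes up 37% of the test set. Most of the remaining misclassifications (8 of 11) are reference c1 customers predicted as c2, the label that otherwise lines up with c4. Perhaps these two segments are quite close to each other on the predictor variables. On the other hand, reference segments c2, c3, and c4 are classified almost perfectly.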
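The headline statistics can be re-derived from the printed counts, which is a useful sanity check when the numbers look surprising. The following is a minimal base-R sketch, not part of the original analysis: the counts are typed in by hand from the confusion matrix above, and the names `cm`, `cohen_kappa`, `best_match`, and `aligned_accuracy` are illustrative. The relabeling step rests on an assumption, namely that each predicted label corresponds to the reference label it overlaps with most; that is plausible for k-means labels but is not confirmed by the output itself.

```r
# Counts copied by hand from the printed confusion matrix
# (rows = prediction, columns = reference)
cm <- matrix(
  c(0,  0, 27,  0,
    8,  0,  0, 35,
    6,  1,  0,  0,
    0, 19,  1,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(Prediction = paste0("c", 1:4),
                  Reference  = paste0("c", 1:4))
)

n <- sum(cm)                                        # 98 test customers
accuracy <- sum(diag(cm)) / n                       # observed agreement, about 0.0102
expected <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
cohen_kappa <- (accuracy - expected) / (1 - expected)  # about -0.2822

# ASSUMPTION: labels are permuted, so match each predicted label to the
# reference label with which it shares the most customers
best_match <- apply(cm, 1, which.max)
stopifnot(!anyDuplicated(best_match))               # mapping must be one-to-one

# Share of customers in the matched cells, about 0.8878
aligned_accuracy <- sum(cm[cbind(seq_len(nrow(cm)), best_match)]) / n

round(c(accuracy = accuracy, kappa = cohen_kappa, aligned = aligned_accuracy), 4)
```

The first two values reproduce the Accuracy and Kappa lines of the caret output, which confirms the table was read correctly. The third is only meaningful if the relabeling assumption holds; with a larger number of clusters or a less clear-cut table, a proper assignment method would be safer than taking row-wise maxima.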