3.11 Ordinal Regression

In this last section, we use quality.o to estimate an ordinal model. Ordinal models can be estimated with several link functions; we will use a logit link.
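
For reference, the cumulative logit (proportional odds) model with a logit link can be written as below. The notation is ours and is not part of any package output: $Y$ stands for quality.o, $x$ for the vector of predictors, $\beta$ for the coefficients, and $\theta_j$ for the thresholds between adjacent categories (5, 6 and 7 in the output of this section).

$$
\operatorname{logit} P(Y \le j \mid x) = \log \frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \theta_j - x^\top \beta, \qquad j \in \{5, 6\}
$$

Because $x^\top \beta$ enters with a negative sign, a positive coefficient shifts probability toward the higher quality categories. With three ordered categories there are two thresholds, and the same $\beta$ applies at both.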

We will use the clm function from the ordinal package.

model.ordinal <- ordinal::clm(quality.o ~ ., 
                     data = train_wine[,-c(12, 13, 14)])
summary(model.ordinal)
## formula: 
## quality.o ~ fixed_acidity + volatile_acidity + citric_acid + residual_sugar + chlorides + free_sulfur_dioxide + total_sulfur_dioxide + density + pH + sulphates + alcohol + white_wine
## data:    train_wine[, -c(12, 13, 14)]
## 
##  link  threshold nobs logLik   AIC     niter max.grad cond.H 
##  logit flexible  5200 -4441.32 8910.64 5(0)  7.31e-07 3.7e+02
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## fixed_acidity         0.34702    0.06443   5.386 7.19e-08 ***
## volatile_acidity     -0.67898    0.04604 -14.747  < 2e-16 ***
## citric_acid          -0.05607    0.03595  -1.560 0.118794    
## residual_sugar        0.70094    0.09169   7.645 2.10e-14 ***
## chlorides            -0.13134    0.04432  -2.963 0.003044 ** 
## free_sulfur_dioxide   0.43192    0.04487   9.627  < 2e-16 ***
## total_sulfur_dioxide -0.44547    0.05650  -7.885 3.16e-15 ***
## density              -0.69074    0.14500  -4.764 1.90e-06 ***
## pH                    0.23172    0.04600   5.038 4.71e-07 ***
## sulphates             0.35088    0.03514   9.984  < 2e-16 ***
## alcohol               0.78396    0.07324  10.704  < 2e-16 ***
## white_wine           -0.64337    0.18758  -3.430 0.000604 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 5|6  -1.2401     0.1459  -8.501
## 6|7   1.4284     0.1461   9.778
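
To see what the threshold coefficients in the output above mean, we can turn them into class probabilities. The sketch below is only an illustration: the thresholds are copied from the summary, while the linear predictor `eta` is set to 0 as an assumption (a hypothetical wine whose predictors contribute nothing), not a value taken from the data.

```r
# Threshold estimates copied from the summary(model.ordinal) output
theta <- c("5|6" = -1.2401, "6|7" = 1.4284)

# Hypothetical linear predictor x'beta; 0 is an assumption for illustration
eta <- 0

# Cumulative probabilities P(quality <= 5) and P(quality <= 6)
cum_prob <- plogis(theta - eta)

# Differences of cumulative probabilities give the per-class probabilities
class_prob <- diff(c(0, unname(cum_prob), 1))
names(class_prob) <- c("5", "6", "7")
round(class_prob, 3)
##     5     6     7 
## 0.224 0.582 0.193
```

A larger `eta` (for example, from higher alcohol, which has a positive coefficient) lowers both cumulative probabilities and so moves probability mass toward class 7.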

We have a really nice ordinal model here. As in the linear regression model, every predictor except citric_acid is significant at the 5% level. The two threshold estimates are also large relative to their standard errors (|z| above 8), indicating that the model identified distinct cut points separating 5 from 6 and 6 from 7. Let’s study the model performance using a confusion matrix.

confusionMatrix(reference = test_wine$quality.o,
                unlist(predict(model.ordinal, 
                               newdata = test_wine[,-c(12, 13, 14, 15)], 
                               type = "class")))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   5   6   7
##          5 295 140  21
##          6 175 368 168
##          7   6  59  65
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5613          
##                  95% CI : (0.5338, 0.5885)
##     No Information Rate : 0.4372          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.2828          
##                                           
##  Mcnemar's Test P-Value : 6.225e-14       
## 
## Statistics by Class:
## 
##                      Class: 5 Class: 6 Class: 7
## Sensitivity            0.6197   0.6490  0.25591
## Specificity            0.8039   0.5301  0.93768
## Pos Pred Value         0.6469   0.5176  0.50000
## Neg Pred Value         0.7848   0.6604  0.83805
## Prevalence             0.3670   0.4372  0.19584
## Detection Rate         0.2274   0.2837  0.05012
## Detection Prevalence   0.3516   0.5482  0.10023
## Balanced Accuracy      0.7118   0.5896  0.59679

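As it turns out, at 56% the model accuracy is reasonable but not that great. It clearly beats the no-information rate of 44%, but class 7 is identified poorly (sensitivity of 0.26). Ordinal regression is nevertheless the most appropriate model in this case because quality is inherently ordinal.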
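
As a check on how these statistics are built, the headline numbers can be recomputed by hand from the confusion matrix. The sketch below assumes only the counts printed above; the object names (`cm`, `nir`, and so on) are ours.

```r
# Counts copied from the confusion matrix above
# (rows = prediction, columns = reference)
cm <- matrix(c(295, 140,  21,
               175, 368, 168,
                 6,  59,  65),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("5", "6", "7"),
                             Reference  = c("5", "6", "7")))

accuracy    <- sum(diag(cm)) / sum(cm)      # share of correct predictions
nir         <- max(colSums(cm)) / sum(cm)   # always predict the most common class
sensitivity <- diag(cm) / colSums(cm)       # per-class share of true cases recovered

round(c(accuracy = accuracy, no_information_rate = nir), 4)
round(sensitivity, 4)
```

The results should match the Accuracy, No Information Rate and Sensitivity rows reported by confusionMatrix.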

Exercise: Estimate a wine quality model using SVM with quality.o as the response. Check whether it performs better than the previous models.

In a separate analysis in which I ran an SVM on ordinal quality, I found an out-of-sample accuracy of 70%, which is better than any other model we have looked at so far.