3.11 Ordinal Regression
In this last section, we will use quality.o to estimate an ordinal model. An ordinal model can be estimated using several link functions; we will use a logit link. We will use the clm function from the ordinal package.
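As a reference point, the cumulative logit model with flexible thresholds is commonly written as shown below. The symbols ($Y$ for the ordinal response, $x$ for the predictor vector, $\beta$ for the coefficients, $\theta_j$ for the thresholds) are notation introduced here for illustration and are not part of the original text; the sign convention follows the one documented for clm.

```latex
\operatorname{logit} P(Y \le j) \;=\; \log\frac{P(Y \le j)}{1 - P(Y \le j)} \;=\; \theta_j - x^\top \beta,
\qquad j = 1, \dots, J - 1
```

With three quality levels ($J = 3$) there are two thresholds, and under this sign convention a positive coefficient shifts probability toward the higher quality levels.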
model.ordinal <- ordinal::clm(quality.o ~ .,
                              data = train_wine[,-c(12, 13, 14)])
summary(model.ordinal)
## formula:
## quality.o ~ fixed_acidity + volatile_acidity + citric_acid + residual_sugar + chlorides + free_sulfur_dioxide + total_sulfur_dioxide + density + pH + sulphates + alcohol + white_wine
## data: train_wine[, -c(12, 13, 14)]
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 5200 -4441.32 8910.64 5(0) 7.31e-07 3.7e+02
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## fixed_acidity 0.34702 0.06443 5.386 7.19e-08 ***
## volatile_acidity -0.67898 0.04604 -14.747 < 2e-16 ***
## citric_acid -0.05607 0.03595 -1.560 0.118794
## residual_sugar 0.70094 0.09169 7.645 2.10e-14 ***
## chlorides -0.13134 0.04432 -2.963 0.003044 **
## free_sulfur_dioxide 0.43192 0.04487 9.627 < 2e-16 ***
## total_sulfur_dioxide -0.44547 0.05650 -7.885 3.16e-15 ***
## density -0.69074 0.14500 -4.764 1.90e-06 ***
## pH 0.23172 0.04600 5.038 4.71e-07 ***
## sulphates 0.35088 0.03514 9.984 < 2e-16 ***
## alcohol 0.78396 0.07324 10.704 < 2e-16 ***
## white_wine -0.64337 0.18758 -3.430 0.000604 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 5|6 -1.2401 0.1459 -8.501
## 6|7 1.4284 0.1461 9.778
We have a really nice ordinal model here. Similar to the linear regression model, all predictors except citric_acid turn out to be significant at the 5% level. The two thresholds are also statistically significant (their z values are far above 2 in absolute value), indicating that our model identified distinct thresholds to separate 5 from 6 and 6 from 7. Let’s study the model performance using a confusion matrix.
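To see how the thresholds turn into class probabilities, here is a minimal sketch in base R. It assumes the usual cumulative logit form, P(Y <= j) = plogis(theta_j - eta), where eta is the linear predictor. The threshold values are copied from the summary output; the value of eta is hypothetical and chosen only for illustration.

```r
# Threshold estimates copied from the summary output above
theta <- c("5|6" = -1.2401, "6|7" = 1.4284)

# Hypothetical linear predictor (x'beta) for a single wine; illustration only
eta <- 0

# Cumulative probabilities P(Y <= 5) and P(Y <= 6) under the logit link
cum_p <- plogis(theta - eta)

# Class probabilities are differences of consecutive cumulative probabilities
class_p <- diff(c(0, cum_p, 1))
names(class_p) <- c("5", "6", "7")
class_p
```

Raising eta (for example through higher alcohol, which has a positive coefficient) lowers both cumulative probabilities and so moves probability mass toward quality 7.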
confusionMatrix(reference = test_wine$quality.o,
                unlist(predict(model.ordinal,
                               newdata = test_wine[,-c(12, 13, 14, 15)],
                               type = "class")))
## Confusion Matrix and Statistics
##
## Reference
## Prediction 5 6 7
## 5 295 140 21
## 6 175 368 168
## 7 6 59 65
##
## Overall Statistics
##
## Accuracy : 0.5613
## 95% CI : (0.5338, 0.5885)
## No Information Rate : 0.4372
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.2828
##
## Mcnemar's Test P-Value : 6.225e-14
##
## Statistics by Class:
##
## Class: 5 Class: 6 Class: 7
## Sensitivity 0.6197 0.6490 0.25591
## Specificity 0.8039 0.5301 0.93768
## Pos Pred Value 0.6469 0.5176 0.50000
## Neg Pred Value 0.7848 0.6604 0.83805
## Prevalence 0.3670 0.4372 0.19584
## Detection Rate 0.2274 0.2837 0.05012
## Detection Prevalence 0.3516 0.5482 0.10023
## Balanced Accuracy 0.7118 0.5896 0.59679
As it turns out, at 56% the model accuracy is reasonable (it beats the no-information rate of about 44%) but not that great. However, ordinal regression is the most appropriate model in this case because quality is actually ordinal.
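The headline numbers can be recomputed by hand from the confusion matrix counts, which is a useful sanity check on the reported accuracy. The sketch below uses only base R; the counts are copied from the output above, and the variable names are chosen here for illustration.

```r
# Confusion matrix counts copied from the output above
# (rows = predicted class, columns = reference class)
cm <- matrix(c(295, 140,  21,
               175, 368, 168,
                 6,  59,  65),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("5", "6", "7"),
                             Reference  = c("5", "6", "7")))

# Overall accuracy: correctly classified wines over all test wines
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions over the reference total for that class
sensitivity <- diag(cm) / colSums(cm)

# No-information rate: share of the most common reference class
nir <- max(colSums(cm)) / sum(cm)

round(accuracy, 4)     # 0.5613
round(sensitivity, 4)  # 0.6197 0.6490 0.2559
round(nir, 4)          # 0.4372
```

The low sensitivity for class 7 shows that most of the errors come from high-quality wines being predicted as 6.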
Exercise: Estimate a wine quality model using SVM and quality.o. Check whether it performs better than the previous models.
In a separate analysis, I ran SVM using the ordinal quality variable and found an out-of-sample accuracy of 70%, which is better than any other model we have looked at so far.