3.9 Multinomial logistic regression (MNL)

For MNL, we will use quality.c as the dependent variable. Recall that this is a categorical variable with groups 3, 4, 8, and 9 bundled together.15

table(wine2$quality.c)
## 
##    q_5    q_6    q_7 q_3489 
##   2138   2836   1079    444

We will use caret to estimate MNL using its multinom method. Note that caret uses nnet ( CRAN) under the hood for estimating MNL.

MNL does not require a tuning parameter. However, if we want to try out a penalized (regularized) MNL, the tuning parameter is decay (\(decay\in[0, 1]\)).16

We will start with defining the train control and creating a parameter grid.

trControl_mnl <- trainControl(method = "cv",
                              number = 10,
                              search = "grid",
                              classProbs = TRUE,
                              summaryFunction = multiClassSummary)

tuneGrid_mnl <- expand.grid(decay = seq(0, 1, by = 0.1))

Note that we are using multiClassSummary() for summaryFunction argument. This will return very detailed summary for all the classes, which can be useful when we use confusionMatrix() function.

Before we estimate MNL, we need to create a train and test set. The next code chunk creates a 80% training set and 20% test set.

set.seed(2091)
index <- caret::createDataPartition(wine2$quality.c,
                                    p = 0.8,
                                    list = FALSE)

train_wine <- wine2[index, ]
test_wine <- wine2[-index, ]

Now we are ready to estimate MNL. MNL is a parametric model that is commonly estimated using maximum likelihood estimation. As the likelihood function does not have a closed form, likelihood is maximized using an iterative process. We can provide maximum iterations to use for estimating the model, which we set at 100. Usually multinom displays the outcome of every 10th iterations. In our case, we specified 10-fold CV and 11 potential values for decay. This means that we will have 110 outputs showing results of 10 iteration blocks! We probably do not need so much output so we can turn it off by specifying trace = FALSE.

The following code may take some time to run. On my Mac it took about 28 seconds.

model_mnl <- caret::train(quality.c ~ ., 
                          data = train_wine[ , -c(12,15)],
                          method = "multinom",
                          maxit = 100,
                          trace = FALSE, # suppress iterations
                          tuneGrid = tuneGrid_mnl,
                          trControl = trControl_mnl
                          )

You might get a warning that there were missing values in resampled performance measures. Unless you really care about all the performance metrics that caret reports, this is not a major concern.

Let’s take a look at the best value for decay. For this model it is 0.2.

model_mnl$bestTune
##   decay
## 3   0.2

If you want to take a look at the full summary for each value of decay, we can print model_mnl$results data frame. Below, I print only AUC and Accuracy. It seems that the accuracy is not sensitive to the decay.

model_mnl$results %>% 
   select(decay, AUC, Accuracy)
##    decay       AUC  Accuracy
## 1    0.0 0.7210105 0.5413473
## 2    0.1 0.7209793 0.5423092
## 3    0.2 0.7208119 0.5423092
## 4    0.3 0.7207245 0.5421173
## 5    0.4 0.7206714 0.5417323
## 6    0.5 0.7205310 0.5417327
## 7    0.6 0.7204086 0.5415404
## 8    0.7 0.7203053 0.5413488
## 9    0.8 0.7201930 0.5411565
## 10   0.9 0.7200828 0.5407715
## 11   1.0 0.7199869 0.5407711

3.9.1 Model parameters

We have to calculate z-statistics and p-values manually as multinom does not report them. We first create an object that holds the summary of model_mnl. Next, we calculate z-statistics by dividing model coefficients by standard errors. Finally, we calculate p-values for 2-tailed test of significance of model parameters.

sum.mnl.c <- summary(model_mnl)
z <- sum.mnl.c$coefficients / sum.mnl.c$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2

To make the coefficients easy to read, let’s create 3 data frames that will hold all the coefficients, standard errors, z statistics, and p values. Note that MNL uses one of the categories as reference levels. If we do not explicitly specify it, it will consider the first category, in our case quality = 5, as the reference category. So all the results must be interpreted with respect to the reference level.

# Coefficients for quality = 6

coeff.mnl.c1 <- rbind(sum.mnl.c$coefficients[1,],
                      sum.mnl.c$standard.errors[1,],
                      z[1,],
                      p[1,])

rownames(coeff.mnl.c1) <- c("Coefficient", "Std. Errors", 
                            "z stat", "p value")

# Coefficients for quality = 7

coeff.mnl.c2 <- rbind(sum.mnl.c$coefficients[2,],
                      sum.mnl.c$standard.errors[2,], 
                      z[2,], 
                      p[2,])

rownames(coeff.mnl.c2) <- c("Coefficient", "Std. Errors",
                            "z stat", "p value")

# Coefficients for quality = 3489

coeff.mnl.c3 <- rbind(sum.mnl.c$coefficients[3,],
                      sum.mnl.c$standard.errors[3,],
                      z[3,],
                      p[3,])

rownames(coeff.mnl.c3) <- c("Coefficient", "Std. Errors", 
                            "z stat", "p value")

quality.c = 6

Table 3.6: Coefficients for Wine Quality = 6
Coefficient Std. Errors z stat p value
(Intercept) 0.69 0.18 3.86 0.00
fixed_acidity 0.08 0.08 0.94 0.35
volatile_acidity -0.56 0.05 -10.12 0.00
citric_acid -0.03 0.04 -0.76 0.45
residual_sugar 0.41 0.11 3.64 0.00
chlorides 0.01 0.05 0.23 0.82
free_sulfur_dioxide 0.22 0.05 4.01 0.00
total_sulfur_dioxide -0.35 0.07 -5.05 0.00
density -0.28 0.18 -1.57 0.12
pH 0.12 0.06 2.05 0.04
sulphates 0.26 0.05 5.62 0.00
alcohol 0.82 0.09 8.78 0.00
winewhite -0.16 0.23 -0.68 0.50
quality.c = 7
Table 3.7: Coefficients for Wine Quality = 7
Coefficient Std. Errors z stat p value
(Intercept) -0.45 0.25 -1.80 0.07
fixed_acidity 0.52 0.11 4.62 0.00
volatile_acidity -0.92 0.09 -10.72 0.00
citric_acid -0.08 0.07 -1.15 0.25
residual_sugar 0.94 0.15 6.08 0.00
chlorides -0.24 0.08 -2.95 0.00
free_sulfur_dioxide 0.38 0.08 4.84 0.00
total_sulfur_dioxide -0.53 0.10 -5.20 0.00
density -0.93 0.25 -3.73 0.00
pH 0.37 0.08 4.65 0.00
sulphates 0.52 0.06 8.60 0.00
alcohol 1.24 0.13 9.70 0.00
winewhite -0.57 0.33 -1.74 0.08

quality.c = 3489

Table 3.8: Coefficients for Wine Quality = 3489
Coefficient Std. Errors z stat p value
(Intercept) -3.30 0.33 -10.12 0.00
fixed_acidity 0.32 0.14 2.37 0.02
volatile_acidity 0.33 0.08 4.45 0.00
citric_acid -0.06 0.08 -0.73 0.46
residual_sugar 0.25 0.17 1.46 0.14
chlorides 0.04 0.09 0.47 0.64
free_sulfur_dioxide -0.26 0.09 -2.99 0.00
total_sulfur_dioxide -0.20 0.12 -1.67 0.10
density -0.30 0.28 -1.06 0.29
pH 0.30 0.10 3.08 0.00
sulphates 0.09 0.08 1.09 0.28
alcohol 0.88 0.15 5.96 0.00
winewhite 2.35 0.40 5.89 0.00

It’s a good exercise to compare the coefficients of MNL and linear regressions. However, MNL has 3 equations with different variables showing up significant across the 3 models. What can you do now? One way to handle this problem is to first identify the variables that are significant across the board. We will retain them. Then look at the ones that are non-significant in all of the 3 equations. Most likely these variables can be dropped from the next iteration of MNL. The variables that are significant in a few equations are the most difficult ones. You will have to take a call depending on their overall importance. If you drop them and the performance of the model deteriorates severely, perhaps adding them back is a better choice. You can also be conservative and keep any variable that’s significant at least once in the 3 models.

3.9.2 Model performance

MNL model performance can be assessed on several different metrics. I will select classification accuracy as the relevant metric. As such let’s get the confusion matrix by using the same samples that we used for estimating the model. You could do this using cross-validation.

caret::confusionMatrix(predict(model_mnl, 
                               newdata = test_wine[, -c(12, 15)], 
                               type = "raw"),
                       reference = test_wine$quality.c)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction q_5 q_6 q_7 q_3489
##     q_5    256 130  17     27
##     q_6    165 390 166     46
##     q_7      4  43  32     11
##     q_3489   2   4   0      4
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5258          
##                  95% CI : (0.4982, 0.5533)
##     No Information Rate : 0.4372          
##     P-Value [Acc > NIR] : 8.896e-11       
##                                           
##                   Kappa : 0.2356          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: q_5 Class: q_6 Class: q_7 Class: q_3489
## Sensitivity              0.5995     0.6878    0.14884      0.045455
## Specificity              0.8000     0.4836    0.94640      0.995037
## Pos Pred Value           0.5953     0.5085    0.35556      0.400000
## Neg Pred Value           0.8028     0.6660    0.84838      0.934732
## Prevalence               0.3292     0.4372    0.16577      0.067849
## Detection Rate           0.1974     0.3007    0.02467      0.003084
## Detection Prevalence     0.3315     0.5914    0.06939      0.007710
## Balanced Accuracy        0.6998     0.5857    0.54762      0.520246

The model accuracy is 54%, which is not too bad. However, kappa is just 0.26 suggesting that the model is performing poorly on this metric.17 Note that class 6 has a prevalence of around 44%. Thus, you would be right 44% of the times if you labeled all wines as 6. However, the model did a particularly poor job of detecting classes 7 and 3489. But this is the first MNL model you have estimated. You should tweak this model further to yield better results.


  1. Before we even run any model, I can tell you that this bundling is going to cause some problems. This is because we have assumed no ordering in quality. But in reality there is an ordering: 9 > 8 > 7 > …> 3. We will repeat this analysis with quality.o later with ordinal logit.

  2. In the documentation for nnet I could not find out whether the regularization is LASSO or Ridge.

  3. Usually kappa < 0.3 is considered poor.