3.5 Create new variants of quality

For support vector machines (SVM) and the multinomial logistic model (MNL), we will create a new variable, quality.c, a factor in which groups 3, 4, 8, and 9 are combined into a single group. We can label this combined group anything we want because these methods treat the labels as unordered categories. I will label it 3489, which preserves the knowledge that it came from 4 separate groups. For many operations, the caret package requires factor levels to be valid R variable names, which cannot start with a digit. Therefore, we will add the prefix q_ to the numbers in quality to create quality.c.
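As a quick illustration of why the prefix is needed, the sketch below uses base R's make.names() to test whether a label is already a syntactically valid name. The vector name labels and the comparison are my own illustration, not part of the original analysis, and caret's internal checks may differ in detail.

```r
# Candidate labels for the combined group (illustrative only)
labels <- c("3489", "q_3489")

# make.names() returns a syntactically valid version of each string;
# a label is already valid when it comes back unchanged
fixed <- make.names(labels)
is_valid <- labels == fixed

print(data.frame(label = labels, make_names = fixed, valid = is_valid))
```

A label that starts with a digit is altered by make.names(), while the prefixed label passes through unchanged.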

For ordinal regression, we can make use of the ordering of the groups. We will create a new variable, quality.o, in which 3, 4, and 5 are combined and labeled 5 to indicate “5 and lower”. Similarly, 7, 8, and 9 are combined and labeled 7 to indicate “7 and above”. Thus, we will effectively have only 3 groups.
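One way to read this collapse is as clamping quality to the range 5 to 7. The sketch below shows that view on a toy vector, not the wine data. The names quality, collapsed, and quality_o are hypothetical, and pmin()/pmax() is an alternative formulation of the rule rather than the code used in this analysis.

```r
# Toy quality scores covering every level from 3 to 9 (not the wine data)
quality <- 3:9

# Clamp to [5, 7]: values below 5 become 5, values above 7 become 7
collapsed <- pmin(pmax(quality, 5), 7)

# Ordered factor, so that the ordering 5 < 6 < 7 is preserved
quality_o <- ordered(collapsed)

print(data.frame(quality, quality_o))
```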

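Clearly, this will make model comparison a little harder, because the models no longer predict the same set of classes (4 unordered groups versus 3 ordered groups). However, we have to give each model its best chance to perform, even at this coarser level of analysis.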

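wine <- wine %>%
  # quality.c: keep 5, 6, and 7 as their own levels (prefixed with q_)
  # and pool 3, 4, 8, and 9 into q_3489
  mutate(quality.c = ifelse(quality %in% c(5, 6, 7), 
                            paste0("q_", quality), 
                            "q_3489"),
         # quality.o: collapse to 5 ("5 and lower"), 6, and 7 ("7 and above")
         quality.o = ifelse(quality <= 5, 
                            5,
                            ifelse(quality >= 7, 7, 6))) %>% 
  # Fix the level order explicitly so the combined group comes last
  mutate(quality.c = factor(quality.c, 
                            levels = c("q_5", "q_6", 
                                       "q_7", "q_3489"))) %>%
  # Ordered factor: 5 < 6 < 7
  mutate(quality.o = ordered(quality.o))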

Check the structure of the new variables.

str(wine$quality.c)
##  Factor w/ 4 levels "q_5","q_6","q_7",..: 1 1 1 2 1 1 1 3 3 1 ...
str(wine$quality.o)
##  Ord.factor w/ 3 levels "5"<"6"<"7": 1 1 1 2 1 1 1 3 3 1 ...
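Beyond str(), a cross-tabulation of the original levels against the new ones can confirm that every level was mapped as intended. The sketch below applies the same recoding rules as above to a toy data frame, written in base R so it runs on its own. The object toy is a hypothetical stand-in for wine, with one row per quality level.

```r
# Toy data frame with one row per quality level (stand-in for `wine`)
toy <- data.frame(quality = 3:9)

# Same recoding rules as above, written in base R
toy$quality.c <- factor(
  ifelse(toy$quality %in% c(5, 6, 7), paste0("q_", toy$quality), "q_3489"),
  levels = c("q_5", "q_6", "q_7", "q_3489")
)
toy$quality.o <- ordered(ifelse(toy$quality <= 5, 5,
                                ifelse(toy$quality >= 7, 7, 6)))

# Each original level should land in exactly one new group
tab_c <- table(toy$quality, toy$quality.c)
tab_o <- table(toy$quality, toy$quality.o)
print(tab_c)
print(tab_o)
```

On the real data, table(wine$quality, wine$quality.c) and table(wine$quality, wine$quality.o) would give the same kind of check, with the group sizes in the cells.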