3.5 Create new variants of quality
For support vector machines (SVM) and the multinomial logistic model (MNL), we will create a new variable, quality.c, which will be a factor variable with groups 3, 4, 8, and 9 combined into a single group. We can label this combined group anything we want, as the label carries no meaning for these methods. I will label this new group 3489, thereby preserving the knowledge that it came from 4 separate groups. For many operations, the caret package requires that factor levels be valid variable names. Therefore, we will add the prefix q_ before the numbers in quality to create quality.c.
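To see why the prefix helps, here is a minimal sketch using base R's make.names(), which converts each string into a syntactically valid name. It assumes that comparing a level with make.names() of itself is a reasonable stand-in for the kind of check caret performs; the vectors raw_levels and prefixed_levels are illustrative names, not part of the analysis.

```r
# Levels as they would look if the quality scores were used directly
raw_levels <- c("5", "6", "7", "3489")

# Levels after adding the q_ prefix
prefixed_levels <- paste0("q_", raw_levels)

# A string is already a valid name if make.names() leaves it unchanged
raw_ok <- raw_levels == make.names(raw_levels)
prefixed_ok <- prefixed_levels == make.names(prefixed_levels)

print(make.names(raw_levels))  # digit-only strings are altered to become valid
print(raw_ok)                  # are the raw levels valid as they stand?
print(prefixed_ok)             # are the prefixed levels valid as they stand?
```

Levels that start with a digit are changed by make.names(), while the prefixed levels pass through untouched.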
For ordinal regression, we can use a little more information: the ordering of the groups. We will create a new variable, quality.o. In this variable, we will combine 3, 4, and 5 and label the result 5 to indicate “5 and lower”. Similarly, we will combine 7, 8, and 9 and label the result 7 to indicate “7 and above”. Thus, we will effectively have only 3 groups.
Clearly, this will make model comparison a little harder, because quality.c has 4 classes while quality.o has only 3, but we have to give each model the best chance to perform, even at this lower level of analysis.
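wine <- wine %>%
  # quality.c: keep 5, 6, 7 as their own groups; pool 3, 4, 8, 9 into q_3489
  # quality.o: collapse to 5 ("5 and lower"), 6, and 7 ("7 and above")
  mutate(quality.c = ifelse(quality %in% c(5, 6, 7),
                            paste0("q_", quality),
                            "q_3489"),
         quality.o = ifelse(quality <= 5,
                            5,
                            ifelse(quality >= 7, 7, 6))) %>%
  # Fix the level order explicitly so the pooled group comes last
  mutate(quality.c = factor(quality.c,
                            levels = c("q_5", "q_6",
                                       "q_7", "q_3489"))) %>%
  # Ordered factor: 5 < 6 < 7
  mutate(quality.o = ordered(quality.o))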
Check the structure of the new variables.
str(wine$quality.c)
## Factor w/ 4 levels "q_5","q_6","q_7",..: 1 1 1 2 1 1 1 3 3 1 ...
str(wine$quality.o)
## Ord.factor w/ 3 levels "5"<"6"<"7": 1 1 1 2 1 1 1 3 3 1 ...
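Beyond str(), a cross-tabulation of the original scores against the recoded ones is a quick way to confirm that every score landed in the intended group. The sketch below applies the same recoding rules in base R to a small made-up data frame, toy, that contains each score from 3 to 9 exactly once; toy, tab_c, and tab_o are illustrative names and not part of the wine data.

```r
# Toy data frame with one row per quality score mentioned in this section
toy <- data.frame(quality = 3:9)

# Same rule as quality.c: keep 5, 6, 7; pool everything else
toy$quality.c <- factor(
  ifelse(toy$quality %in% c(5, 6, 7), paste0("q_", toy$quality), "q_3489"),
  levels = c("q_5", "q_6", "q_7", "q_3489")
)

# Same rule as quality.o: 5 and lower, 6, 7 and above
toy$quality.o <- ordered(
  ifelse(toy$quality <= 5, 5, ifelse(toy$quality >= 7, 7, 6))
)

# Rows are the original scores, columns are the recoded levels
tab_c <- table(toy$quality, toy$quality.c)
tab_o <- table(toy$quality, toy$quality.o)
print(tab_c)
print(tab_o)
```

Each row of either table should contain a single 1, in the column of the group that score was assigned to. Running the same two table() calls on wine would show the actual group sizes.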