3.3 Data

Load all the relevant packages. If any of these packages is not installed, use the install.packages() function to install it from CRAN.

library(caret)
library(dplyr)
library(moments)
library(ggplot2)
library(ggcorrplot)
library(e1071)
library(doParallel)
library(nnet)
library(reshape2)
library(ordinal)
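If you are not sure which of these packages are already installed, a small helper can tell you before you call library(). The following is only a sketch: the function name missing_packages() is hypothetical and not part of any package, and the install.packages() call is left commented out so that nothing is installed without your say-so.

```r
# Hypothetical helper: returns the names of packages that cannot be loaded.
# requireNamespace() returns FALSE (instead of raising an error) when a
# package is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package names taken from the library() calls above
pkgs <- c("caret", "dplyr", "moments", "ggplot2", "ggcorrplot",
          "e1071", "doParallel", "nnet", "reshape2", "ordinal")

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing from CRAN:
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first keeps you from reinstalling packages you already have, which can be slow for large packages such as caret.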

3.3.1 Read wine data

For this solution, I am going to read the data sets that I have already downloaded and saved on GitHub. You don’t have to use these, but if you want to, the data sets are available from my public GitHub repository.

We can read the CSV files directly using the read.csv() function from base R. I have cleaned up the data a little bit. There are separate CSV files for red and white wine. First, we will read the two data files and add a column to each indicating the type of wine. Finally, we will stack the two data sets on top of each other using the rbind() function from base R.

red <- read.csv("http://bit.ly/2LvaPv7",
                stringsAsFactors = FALSE) %>% 
  mutate(wine = "red")

white <- read.csv("http://bit.ly/2VlYfCJ",
                  stringsAsFactors = FALSE) %>%
  mutate(wine = "white")

wine <- rbind(red, white) %>%
  mutate(wine = as.factor(wine))
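To see what the label-and-stack step does without downloading anything, here is a minimal sketch on two toy data frames. The column names and values are made up for illustration and are not taken from the actual wine files. It uses base R only, so it does not depend on dplyr.

```r
# Toy stand-ins for the two CSV files (values are invented)
red_toy   <- data.frame(alcohol = c(9.4, 9.8),       quality = c(5L, 6L))
white_toy <- data.frame(alcohol = c(8.8, 9.5, 10.1), quality = c(6L, 6L, 7L))

# Add the wine-type column to each data set
red_toy$wine   <- "red"
white_toy$wine <- "white"

# rbind() on data frames matches columns by name, so both data sets
# must have the same column names; check this before stacking
stopifnot(setequal(names(red_toy), names(white_toy)))

wine_toy <- rbind(red_toy, white_toy)

# The stacked data should have one row per row in the inputs
nrow(wine_toy) == nrow(red_toy) + nrow(white_toy)
table(wine_toy$wine)
```

The same two checks (matching column names and the expected row count) are worth running on the real red and white data frames after reading them.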

Note that I changed the class of the wine variable to factor. This makes it easier to use the variable directly in the models, because R will internally create an indicator variable. Factor levels are ordered alphabetically by default, so red is the reference level: the indicator will equal 0 for red wine and 1 for white wine. We will later create a variable that deals with this explicitly.
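The indicator coding can be inspected directly. The sketch below uses a small made-up factor rather than the real data, and assumes R's default treatment contrasts for unordered factors.

```r
# A made-up factor with the same two levels as the wine variable
wine_type <- factor(c("red", "white", "white", "red"))

# Levels are sorted alphabetically, so "red" comes first
levels(wine_type)

# The contrast matrix shows how each level is coded:
# the first level (red) is the reference and gets 0
contrasts(wine_type)

# model.matrix() builds the design matrix a model would use internally;
# the column wine_typewhite is the 0/1 indicator
design <- model.matrix(~ wine_type)
design[, "wine_typewhite"]
```

If you would rather have white as the reference level, relevel(wine_type, ref = "white") from base R changes the ordering.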