8.4 Clustering variables
In marketing, three variables derived from customer purchase history are known to play a big role in predicting future purchases: recency, frequency, and monetary value. Analysis based on these variables is called RFM analysis. In our context, they are measured as follows:
Recency: The number of quarters since the last purchase
Frequency: The number of quarters, out of 12, in which the customer transacted. Because the exact number of transactions is unavailable, we assume that a customer transacted once in any quarter where total expenditure is non-zero.
Monetary value: The total dollar value of transactions over the 12 quarters.
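To make the three definitions concrete, here is a minimal sketch on invented data. The data frame `toy` and its values are hypothetical; only the column names (`customer`, `quarter`, `order_quantity`, `purchase`) mirror the ones used in this section, and `purchase` is derived here under the non-zero expenditure assumption stated above.

```r
library(dplyr)

# Hypothetical toy data: 2 customers observed over 12 quarters (values invented).
toy <- data.frame(
  customer = rep(c("A", "B"), each = 12),
  quarter  = rep(1:12, times = 2),
  order_quantity = c(
    50, 0, 0, 20, 0, 0, 0, 30, 0, 0, 0, 0,   # A: last purchase in quarter 8
    0, 10, 10, 0, 0, 0, 0, 0, 0, 0, 40, 25   # B: last purchase in quarter 12
  ),
  stringsAsFactors = FALSE
)

toy_rfm <- toy %>%
  # Assumption from the text: one transaction per quarter with non-zero spend
  mutate(purchase = as.integer(order_quantity > 0)) %>%
  group_by(customer) %>%
  summarise(
    recency        = 12 - max(quarter[purchase == 1]),  # quarters since last purchase
    frequency      = sum(purchase),                     # quarters with a purchase
    monetary_value = sum(order_quantity)                # total spend over 12 quarters
  )

toy_rfm
```

In this toy case, customer A last bought in quarter 8, so recency is 4, while customer B bought in quarter 12, so recency is 0.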
Accordingly, we create a new data set with these variables. Frequency and monetary value are straightforward to calculate, so we first generate a data set with these two variables.
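# Frequency and monetary value: one row per customer
cluster_data1 <-
  customerRetentionTransactions %>%
  group_by(customer) %>%
  summarize(frequency = sum(purchase),               # quarters with a purchase
            monetary_value = sum(order_quantity)) %>% # total over 12 quarters
  ungroup()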
Next, we create a column for recency and save it in another data frame.
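# Recency: quarters elapsed since the most recent purchase
cluster_data2 <-
  customerRetentionTransactions %>%
  filter(purchase == 1) %>%
  group_by(customer) %>%
  # max() gives the latest purchase quarter regardless of row order
  summarize(last_transaction = max(quarter)) %>%
  mutate(recency = 12 - last_transaction) %>%
  ungroup() %>%
  select(-last_transaction)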
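Picking the most recent purchase quarter by row position (`last()`) and by value (`max()`) gives the same answer only when each customer's rows are in chronological order. The sketch below illustrates the difference on a hypothetical, deliberately unsorted data frame `purchases`; it is an illustration, not a statement about how the actual transactions data is ordered.

```r
library(dplyr)

# Hypothetical purchase rows for one customer, deliberately out of order.
purchases <- data.frame(
  customer = c("A", "A", "A"),
  quarter  = c(8, 1, 4),
  stringsAsFactors = FALSE
)

picked <- purchases %>%
  group_by(customer) %>%
  summarise(
    by_position = last(quarter),  # quarter in the last stored row
    by_value    = max(quarter)    # latest quarter, independent of row order
  )

picked
```

If the data is known to be sorted by quarter within customer, the two approaches are interchangeable; otherwise sorting first with `arrange(customer, quarter)` or using `max()` avoids the dependence on row order.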
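Finally, we merge these two data sets, along with the demographics data. Because the recency data frame contains only customers with at least one purchase, the inner join keeps only those customers.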
cluster_data <- inner_join(cluster_data1,
                           cluster_data2,
                           by = "customer") %>%
  inner_join(customerRetentionDemographics,
             by = "customer")
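An inner join keeps only the customers present in both tables, so it can be worth checking which customers, if any, fall out of the merge. The sketch below uses two small hypothetical data frames (`freq_money` and `recency_tbl`, with invented values) and `anti_join()` to list the rows that have no match.

```r
library(dplyr)

# Hypothetical inputs: customer "C" never purchased, so has no recency row.
freq_money <- data.frame(
  customer       = c("A", "B", "C"),
  frequency      = c(3, 4, 0),
  monetary_value = c(100, 85, 0),
  stringsAsFactors = FALSE
)
recency_tbl <- data.frame(
  customer = c("A", "B"),
  recency  = c(4, 0),
  stringsAsFactors = FALSE
)

# Customers kept by the inner join
merged <- inner_join(freq_money, recency_tbl, by = "customer")

# Customers in freq_money with no match in recency_tbl (dropped by the join)
dropped <- anti_join(freq_money, recency_tbl, by = "customer")

nrow(merged)
dropped$customer
```

The same check could be run on the real tables by comparing `nrow(cluster_data)` with the number of distinct customers in the transactions data.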