8.4 Clustering variables

In marketing, three variables from a customer's purchase history are known to play a big role in predicting future purchases: recency, frequency, and monetary value. Analysis based on these variables is called RFM analysis. In our context, they are measured as follows:

  1. Recency: The number of quarters since the last purchase

  2. Frequency: The number of transactions over 12 quarters. As the exact number of transactions is unavailable, we assume that a customer transacted once in a quarter if the total expenditure in that quarter is non-zero.

  3. Monetary value: The total dollar value of transactions over the 12 quarters.
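
To make these definitions concrete, here is a minimal sketch on a made-up data set. The data frame toy, its spend column, and the two customers are hypothetical and are not part of the customer retention data; only the way the three measures are computed mirrors the definitions above.

```r
library(dplyr)

# Hypothetical toy data: 2 customers observed over 12 quarters.
toy <- tibble(
  customer = rep(c("A", "B"), each = 12),
  quarter  = rep(1:12, times = 2),
  spend    = c(0, 50, 0, 0, 20, 0, 0, 0, 30, 0, 0, 0,   # A buys in quarters 2, 5, 9
               10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40)    # B buys in quarters 1, 12
)

rfm_toy <- toy %>%
  # One assumed transaction per quarter with non-zero spend
  mutate(purchase = as.integer(spend > 0)) %>%
  group_by(customer) %>%
  summarize(
    recency        = 12 - max(quarter[purchase == 1]),  # quarters since last purchase
    frequency      = sum(purchase),                     # quarters with a purchase
    monetary_value = sum(spend)                         # total spend over 12 quarters
  )

print(rfm_toy)
# A: recency 3, frequency 3, monetary_value 100
# B: recency 0, frequency 2, monetary_value 50
```

Customer A spends more and buys more often, but customer B bought more recently, which is why the three measures are kept as separate variables.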

Accordingly, we create a new data set with these variables. Frequency and monetary value are straightforward to calculate, so we first generate a data set containing these two.

# Frequency and monetary value: one row per customer
cluster_data1 <-
  customerRetentionTransactions %>%
  group_by(customer) %>%
  summarize(frequency = sum(purchase),               # quarters with a purchase
            monetary_value = sum(order_quantity)) %>%
  ungroup()

Next, we compute recency as the number of quarters between a customer's last purchase and quarter 12, and save it in another data frame.

cluster_data2 <-
  customerRetentionTransactions %>%
  filter(purchase == 1) %>%                        # keep only quarters with a purchase
  group_by(customer) %>%
  summarize(last_transaction = max(quarter)) %>%   # latest purchase quarter, independent of row order
  mutate(recency = 12 - last_transaction) %>%
  ungroup() %>%
  select(-last_transaction)

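Finally, we merge these two data sets and then merge in the demographics data. Because these are inner joins, only customers who appear in all three data sets are kept. In particular, customers with no purchase in the 12 quarters are absent from cluster_data2 and are therefore dropped.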

cluster_data <- inner_join(cluster_data1,
                           cluster_data2,
                           by = "customer") %>% 
  inner_join(customerRetentionDemographics,
             by = "customer")
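
Because an inner join silently drops unmatched rows, it can be worth checking which customers do not survive the merge. The sketch below illustrates the idea on small made-up tables; freq_money and recency_tbl are hypothetical stand-ins for cluster_data1 and cluster_data2, and their values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-ins for cluster_data1 and cluster_data2.
# Customer "C" never purchased, so it has no recency row.
freq_money <- tibble(customer       = c("A", "B", "C"),
                     frequency      = c(3, 2, 0),
                     monetary_value = c(100, 50, 0))
recency_tbl <- tibble(customer = c("A", "B"),
                      recency  = c(3, 0))

# inner_join() keeps only customers present in both tables
joined <- inner_join(freq_money, recency_tbl, by = "customer")

# anti_join() lists the customers that the inner join dropped
dropped <- anti_join(freq_money, recency_tbl, by = "customer")

nrow(joined)       # 2
dropped$customer   # "C"
```

The same anti_join() call applied to the real data frames would show how many customers are excluded from the clustering, assuming the column names match those used above.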