6.3 Download tweets

Start by loading the relevant packages and your Twitter credentials in your R session, as shown below. For instructions on getting a Twitter token, please see Chapter 5.2.

library(rtweet)  # Twitter package
library(dplyr)
library(ggplot2)
library(reshape2)
library(purrr)
library(janitor) # Row percentages
library(Hmisc) # Correlations
library(ggcorrplot) # Correlations plot

# Packages for text analysis
library(syuzhet)

Load the Twitter token:

load(here::here("twitter_token"))

Take a look at the American Customer Satisfaction Index (ACSI) scores of airlines.[34]

Table 6.1 lists the Twitter handle of each airline along with its 2019 ACSI score.

Table 6.1: Airlines Customer Satisfaction

Airline      Twitter Handle      ACSI Score (2019)
Alaska       @AlaskaAir          80
Southwest    @SouthwestAir       79
JetBlue      @JetBlue            79
Delta        @Delta              75
American     @AmericanAir        73
Allegiant    @Allegiant          71
United       @United             70
Frontier     @FlyFrontier        64
Spirit       @SpiritAirlines     63

According to Table 6.1, Alaska Airlines has the highest customer satisfaction score (80), while Frontier (64) and Spirit (63) have the lowest. Both Frontier and Spirit are low-cost airlines, and customers complain about them frequently.[35]

6.3.1 Collect tweets

In the following code, we first create a vector airline_tw that holds the Twitter handles of the 9 airlines. Next, we set up an empty list airlines_list to hold the tweets for each airline. The critical piece of code is the for loop, which downloads up to 2,000 tweets per airline. You can try to download more if you want; I chose 2,000 so that all nine downloads together stay within the rate limit and finish in one pass.[36] We also limit the tweets to those written in English and posted from the US.

The following code produces airlines_list, a list of 9 data frames, each with at most 2,000 rows.[37]

airline_tw <- c("@AlaskaAir", "@SouthwestAir", "@JetBlue",
                "@Delta", "@AmericanAir", "@Allegiant", 
                "@United", "@FlyFrontier", "@SpiritAirlines")

airlines_list <- list()

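# Loop over the handles; seq_along() avoids hard-coding the number of airlines
for (i in seq_along(airline_tw)) {
  print(paste("Getting tweets for", airline_tw[i]))

  airlines_list[[i]] <- search_tweets(
    q = airline_tw[i],               # search for mentions of the handle
    lang = "en",                     # English tweets only
    geocode = lookup_coords("usa"),  # restrict to the US
    n = 2000,                        # up to 2,000 tweets per airline
    include_rts = FALSE              # exclude retweets
  )
}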
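As a quick check on the request size, the arithmetic behind the choice of 2,000 tweets per airline can be sketched as follows. The 18,000-tweet figure is the rate limit quoted in the footnotes of this chapter; the variable names tweets_per_airline, rate_limit, and total_requested are my own and purely illustrative.

```r
# Twitter handles from Table 6.1
airline_tw <- c("@AlaskaAir", "@SouthwestAir", "@JetBlue",
                "@Delta", "@AmericanAir", "@Allegiant",
                "@United", "@FlyFrontier", "@SpiritAirlines")

tweets_per_airline <- 2000   # value of n passed to search_tweets()
rate_limit <- 18000          # tweets per 15-minute window, as quoted in this chapter

# Maximum number of tweets the loop can request in total
total_requested <- length(airline_tw) * tweets_per_airline
total_requested

# TRUE means all downloads fit in a single rate-limit window
total_requested <= rate_limit
```

With 9 airlines the total comes to exactly 18,000, so there is no headroom. If you raise n, the search_tweets() argument retryonratelimit may be worth a look, since it is meant to make the function wait for the limit to reset rather than stop early.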

6.3.2 Add airline as a column

Ideally, we would like to stack the 9 data frames on top of each other and then carry out the sentiment analysis. However, none of the data frames has a column that identifies which airline the tweets belong to! I strongly encourage you to take a look at any of the 9 data frames using the names() and head() functions.

To add a column to each data frame while it is still part of the list, we will use the map2_dfr() function from the purrr package. This function iterates over two arguments simultaneously and then row binds the resulting data frames. In the code below, it iterates over the list airlines_list and the vector Airline in parallel. Note that Airline simply holds the names of the 9 airlines, in the same order as the handles in airline_tw. map2_dfr() adds (using mutate()) a column called airline to each data frame stored in airlines_list and fills it with the corresponding element of Airline.[38] Finally, it row binds these 9 data frames and returns a single data frame called airlines_df.

Airline <- c("Alaska", "Southwest", "JetBlue",
             "Delta", "American", "Allegiant",
             "United", "Frontier", "Spirit")

airlines_df <- map2_dfr(.x = airlines_list,
                        .y = Airline,
                        ~ mutate(.x, airline = .y))
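To see what map2_dfr() does without downloading anything, here is a minimal sketch on made-up data. The objects toy_list, toy_names, toy_df, and toy_df2 are hypothetical stand-ins for airlines_list, Airline, and airlines_df, and the text values are placeholders rather than real tweets. The second half shows one possible alternative, assuming you are happy to name the list elements first: bind_rows() can create the identifying column itself through its .id argument.

```r
library(dplyr)
library(purrr)

# Two made-up data frames standing in for the downloaded tweets
toy_list <- list(
  data.frame(text = c("placeholder tweet 1", "placeholder tweet 2"),
             stringsAsFactors = FALSE),
  data.frame(text = "placeholder tweet 3",
             stringsAsFactors = FALSE)
)
toy_names <- c("Alaska", "Spirit")

# Pair each data frame with its name, add the column, then row bind
toy_df <- map2_dfr(.x = toy_list,
                   .y = toy_names,
                   ~ mutate(.x, airline = .y))
toy_df

# Alternative: name the list elements and let bind_rows() build the column
toy_df2 <- bind_rows(set_names(toy_list, toy_names), .id = "airline")
toy_df2
```

Both results have three rows and the same airline labels; the only difference is that bind_rows() places the airline column first. In purrr 1.0.0 and later, map2_dfr() is marked as superseded in favour of map2() followed by list_rbind(), although it continues to work.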

  34. https://www.theacsi.org/acsi-benchmarks/benchmarks-by-industry

  35. Check out the reviews of Frontier on TripAdvisor.

  36. Recall that Twitter allows you to download 18,000 tweets every 15 minutes.

  37. The download will take about 5-10 minutes, depending on your Internet speed.

  38. If you find this confusing, read more about the map() family of functions in purrr.