6.3 Download tweets
Start by loading the relevant packages and your Twitter credentials in your R session, as shown below. For instructions on getting a Twitter token, please see Section 5.2.
library(rtweet) # Twitter package
library(dplyr)
library(ggplot2)
library(reshape2)
library(purrr)
library(janitor) # Row percentages
library(Hmisc) # Correlations
library(ggcorrplot) # Correlations plot
# Packages for text analysis
library(syuzhet)
Load Twitter token
load(here::here("twitter_token"))
Take a look at the ACSI scores of airlines.34
Table 6.1 lists the Twitter handle of each airline along with its 2019 ACSI score, which I copied into the table.
Airline | Twitter Handle | ACSI Score |
---|---|---|
Alaska | @AlaskaAir | 80 |
Southwest | @SouthwestAir | 79 |
JetBlue | @JetBlue | 79 |
Delta | @Delta | 75 |
American | @AmericanAir | 73 |
Allegiant | @Allegiant | 71 |
United | @United | 70 |
Frontier | @FlyFrontier | 64 |
Spirit | @SpiritAirlines | 63 |
As Table 6.1 shows, Alaska Airlines has the highest customer satisfaction score, while Frontier and Spirit have the lowest. Both of these are low-cost airlines that draw frequent customer complaints.35
6.3.1 Collect tweets
In the following code, we first create a vector airline_tw that holds the Twitter handles of the 9 airlines. Next, we set up an empty list airlines_list to hold the tweets for each airline. The critical piece of code is the for loop, which downloads up to 2,000 tweets per airline. You can try to download more if you want. I chose 2,000 because 9 airlines at 2,000 tweets each comes to 18,000, which stays within the rate limit and lets us get all the tweets at once.36 We also limit the language of the tweets to English and the geography to the US.
The output of the following code is airlines_list, a list of 9 data frames with a maximum of 2,000 rows in any data frame.37
airline_tw <- c("@AlaskaAir", "@SouthwestAir", "@JetBlue",
                "@Delta", "@AmericanAir", "@Allegiant",
                "@United", "@FlyFrontier", "@SpiritAirlines")
airlines_list <- list()
# Loop over the handles and store one data frame of tweets per airline
for (i in seq_along(airline_tw)) {
  print(paste("Getting tweets for", airline_tw[i]))
  airlines_list[[i]] <- search_tweets(
    q = airline_tw[i],
    lang = "en",
    geocode = lookup_coords("usa"),
    n = 2000,
    include_rts = FALSE # exclude retweets
  )
}
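The cap of 2,000 tweets per airline follows from the rate limit of 18,000 tweets every 15 minutes. A minimal sketch of that arithmetic is below; rate_limit, n_airlines, and per_airline are illustrative names that are not used elsewhere in this chapter.

```r
# Illustrative names; 18,000 is the rate limit cited in this section
rate_limit <- 18000   # tweets allowed per 15-minute window
n_airlines <- 9       # number of handles in airline_tw

# Largest equal request size that keeps the whole loop within one window
per_airline <- floor(rate_limit / n_airlines)
per_airline   # 2000
```

If you ask for more than this, search_tweets() has a retryonratelimit argument that makes it wait for the limit to reset rather than stop early, so the download will likely take longer.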
6.3.2 Adding airline as a column
Ideally we would like to stack the 9 data frames on top of each other and then carry out the sentiment analysis. However, none of the data frames has a column that identifies which airline the tweets belong to! I strongly encourage you to take a look at any of the 9 data frames by using the names() and head() functions.
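As a rough sketch of that inspection, the block below builds a small stand-in list and checks it the same way you would check airlines_list. The object toy_list, its two columns, and all of its values are made up for illustration; the data frames returned by rtweet have many more columns.

```r
# Stand-in for airlines_list: two tiny data frames with made-up contents
toy_list <- list(
  data.frame(screen_name = c("user_a", "user_b"),
             text = c("Great flight", "Lost my bag"),
             stringsAsFactors = FALSE),
  data.frame(screen_name = "user_c",
             text = "Delayed again",
             stringsAsFactors = FALSE)
)

names(toy_list[[1]])     # column names of the first data frame
head(toy_list[[1]], 2)   # first rows of the first data frame
sapply(toy_list, nrow)   # number of rows in each data frame

# Check whether any column records the airline
"airline" %in% names(toy_list[[1]])
```

Running the same calls on airlines_list[[1]] should show the real columns and confirm that no airline column is present.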
To add a column to each data frame while it is still part of the list, we will use the map2_dfr() function from the purrr package. This function iterates over two arguments simultaneously and then row binds the resulting data frames. In the code below, it iterates over the list airlines_list and, in parallel, over the vector Airline. Note that Airline just holds the names of the 9 airlines, in the same order as the handles in airline_tw. For each pair, map2_dfr() adds (using mutate()) a column called airline to the data frame from airlines_list and assigns this column the matching value stored in the vector Airline.38 Finally, it row binds these 9 data frames and returns a single data frame, which we store as airlines_df.
Airline <- c("Alaska", "Southwest", "JetBlue",
             "Delta", "American", "Allegiant",
             "United", "Frontier", "Spirit")
airlines_df <- map2_dfr(.x = airlines_list,
                        .y = Airline,
                        ~ mutate(.x, airline = .y))
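To see what map2_dfr() does on a scale small enough to read at a glance, here is a toy sketch. The objects toy_list, toy_names, and toy_df and their contents are made up for illustration and stand in for airlines_list, Airline, and airlines_df.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for airlines_list and Airline (made-up values)
toy_list <- list(
  data.frame(text = c("Great flight", "Lost my bag")),
  data.frame(text = "Delayed again")
)
toy_names <- c("Alaska", "Spirit")

# Pair each data frame with its name, add the column, then row bind
toy_df <- map2_dfr(toy_list, toy_names, ~ mutate(.x, airline = .y))
toy_df

# The same kind of check on the real data would be:
# count(airlines_df, airline)
```

The result has three rows, with the first two labelled Alaska and the third labelled Spirit. In purrr 1.0.0 and later, map2_dfr() is marked as superseded; it still works, and the suggested replacement is map2() followed by list_rbind().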
https://www.theacsi.org/acsi-benchmarks/benchmarks-by-industry↩
Check out the reviews of Frontier on TripAdvisor.↩
Recall that Twitter allows you to download 18,000 tweets every 15 minutes.↩
It will take about 5-10 minutes depending on your Internet speed.↩
If you find this confusing, you need to read more on the map() family of functions from purrr.↩