10.10 Summary

In this chapter we learned how to fit a Latent Dirichlet Allocation (LDA) model to text data. We used Amazon product reviews from Kaggle and identified 20 topics in the review text. Next, we assessed the importance of the topics by classifying product ratings from the posterior topic probabilities, using a Random Forest. A linear model is difficult to use for this application because the topic probabilities in each row add up to 1, which makes the predictors perfectly collinear once an intercept is included.
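
The collinearity problem can be checked numerically. The following is a minimal sketch on synthetic data, not the chapter's review data: the rows are simulated topic probabilities drawn from a Dirichlet distribution, and the names `dirichlet_row`, `matrix_rank`, `topic_probs`, and `design` are hypothetical helpers introduced only for this illustration. It shows that adding an intercept column to rows that sum to 1 gives a design matrix whose rank is one less than its number of columns.

```python
import random


def dirichlet_row(k, rng):
    """Draw k simulated topic probabilities that sum to 1 (flat Dirichlet)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


rng = random.Random(0)       # fixed seed, assumed for reproducibility
n_docs, n_topics = 50, 20    # 20 topics as in the chapter; 50 documents is arbitrary

# Simulated posterior topic probabilities: one row per document.
topic_probs = [dirichlet_row(n_topics, rng) for _ in range(n_docs)]

# Design matrix of a linear model with an intercept: a column of ones
# followed by the topic probabilities.
design = [[1.0] + row for row in topic_probs]

rank_topics = matrix_rank(topic_probs)
rank_design = matrix_rank(design)

print("columns in design matrix:", len(design[0]))
print("rank of topic probabilities:", rank_topics)
print("rank of design matrix with intercept:", rank_design)
```

Because the intercept column equals the sum of the topic columns, the coefficients of such a model are not uniquely determined. Dropping one topic column is a common workaround; a tree-based model such as Random Forest avoids the issue because it does not estimate coefficients.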

In this example, we used a fixed number of topics. Ideally, we would tune this hyperparameter. The chapter references a method for finding the optimal number of topics.