10 Topic Modeling

Topic modeling allows us to identify the topics embedded in textual data. The assumption is that each text document is composed of one or more topics, which are latent (not directly observed). The job of a topic model, then, is to recover these topics from the text. In this chapter we will use a popular probabilistic model called Latent Dirichlet Allocation (LDA), first introduced by Blei, Ng, and Jordan (2003). LDA is an unsupervised machine learning method because the target variable (i.e., the latent topic) is not known ex ante.
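To make the "documents are mixtures of latent topics" assumption concrete, here is a minimal sketch of the generative story that LDA assumes, written with the Python standard library only. It is not an inference procedure and not the implementation used later in this chapter. The vocabulary, the two topics ("sports" and "politics"), their word probabilities, and the function names `sample_dirichlet` and `generate_document` are all hypothetical and chosen purely for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_probs, vocab, alpha, n_words, rng):
    """Generate one document under the assumed LDA generative story."""
    # Latent per-document topic mixture.
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(n_words):
        # First pick a latent topic for this word position ...
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # ... then pick a word from that topic's word distribution.
        w = rng.choices(vocab, weights=topic_word_probs[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical vocabulary and two hypothetical topics.
vocab = ["goal", "team", "match", "vote", "party", "election"]
topic_word_probs = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0: "sports"
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # topic 1: "politics"
]

theta, topics, words = generate_document(
    topic_word_probs, vocab, alpha=[0.5, 0.5], n_words=10, rng=rng
)
print("topic mixture:", theta)
print("document:", " ".join(words))
```

In practice we observe only the words. The topic mixture `theta` and the per-word topic assignments `topics` are hidden, and fitting a topic model amounts to working backwards from the observed words to plausible values for these hidden quantities.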


  1. Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003), “Latent Dirichlet Allocation,” Journal of Machine Learning Research, 3, 993–1022.