Here is a small bibliography of interesting papers that are related to topic modeling:
The foundational paper is Latent Dirichlet Allocation, by Blei, Ng, and Jordan:
Blei et al. Latent Dirichlet Allocation. JMLR 3, 993 (2003). 783 citations as of Oct 2008.
Abstract:
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
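To make the "three-level" structure in the abstract concrete, here is a rough sketch of the generative story using NumPy. It is not the paper's code and does no inference. The sizes, the prior values, and the choice to draw the topics themselves from a Dirichlet are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
n_topics, vocab_size, n_docs, doc_len = 3, 8, 4, 20

alpha = np.full(n_topics, 0.5)      # Dirichlet prior on per-document topic proportions
topic_prior = np.full(vocab_size, 0.5)  # assumption: used only to make up some topics

# Corpus level: each topic is a probability distribution over the vocabulary.
topics = rng.dirichlet(topic_prior, size=n_topics)   # shape (n_topics, vocab_size)

corpus = []
for _ in range(n_docs):
    # Document level: topic proportions for this document.
    theta = rng.dirichlet(alpha)
    # Word level: pick a topic for each word position, then a word from that topic.
    z = rng.choice(n_topics, size=doc_len, p=theta)
    words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
    corpus.append(words)
```

Each entry of `corpus` is an array of word ids. Fitting LDA means going the other way: recovering `topics` and the per-document `theta` from the words alone.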
Blei has extended this model in a number of different directions since 2003:
The first extension tracks changes in the topics over time. In the original LDA scheme the entire text corpus is generated by a fixed, finite pool of topics; in the dynamic topic model the pool of topic vectors is allowed to change over time. This captures the intuition that the themes of a collection drift as time passes.
Another extension is the correlated topic model, which overcomes some limitations of the Dirichlet distribution for modeling purposes. One property of the Dirichlet distribution over vectors in an M-dimensional simplex is that the components of the vector are correlated with each other only through the normalization condition. This makes it impossible to model complex covariance structure over topics, e.g. that articles about fashion are likely to be about shoes but unlikely to be about food.
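The limitation is easy to see numerically. The sketch below draws topic proportions from a Dirichlet and from a logistic normal (a Gaussian pushed through a softmax, which is the kind of prior the correlated topic model swaps in) and compares the correlations between components. The dimension, sample count, and covariance matrix are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, k = 50_000, 3   # hypothetical sample count and number of topics

# Dirichlet draws: the only coupling between components is "they sum to one",
# so every pair of components comes out negatively correlated.
dirichlet_draws = rng.dirichlet(np.ones(k), size=n_samples)
dirichlet_corr = np.corrcoef(dirichlet_draws, rowvar=False)

# Logistic normal draws: sample a Gaussian vector, then map it to the simplex.
# Assumed covariance: topics 0 and 1 tend to rise and fall together.
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
gaussian_draws = rng.multivariate_normal(np.zeros(k), cov, size=n_samples)
exp_draws = np.exp(gaussian_draws - gaussian_draws.max(axis=1, keepdims=True))
logistic_normal_draws = exp_draws / exp_draws.sum(axis=1, keepdims=True)
logistic_normal_corr = np.corrcoef(logistic_normal_draws, rowvar=False)

print("Dirichlet correlations:\n", dirichlet_corr.round(2))
print("Logistic normal correlations:\n", logistic_normal_corr.round(2))
```

The Dirichlet matrix has negative entries everywhere off the diagonal, whatever the parameters. The logistic normal can make topics 0 and 1 positively correlated, which is the "fashion and shoes" situation.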
D. Blei and J. Lafferty. A correlated topic model of Science. Annals of Applied Statistics, 1(1):17–35. (A shorter version appeared at NIPS 18.)
There is also the very interesting question of modeling how the topics evolve over time. The corpus of the journal Science from 1880 to 2002 was modeled by splitting the text into decades and estimating the vector of topics for each period. This could be improved by a smoothing method.
D. Blei and J. Lafferty. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
There is also a video of a talk by Blei at Google discussing this work.
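The core idea of the dynamic topic model, a topic whose word distribution drifts from one time slice to the next, can be sketched in a few lines. As I understand the paper, each topic's unnormalized parameters follow a Gaussian random walk and are mapped to word probabilities with a softmax. The sizes and the drift scale `sigma` below are assumptions, and this only simulates the drift; it does not fit anything.

```python
import numpy as np

rng = np.random.default_rng(0)
n_slices, vocab_size = 5, 6   # hypothetical: 5 time slices, 6-word vocabulary
sigma = 0.5                   # assumed drift scale between consecutive slices


def softmax(x):
    """Map an unconstrained vector to a probability distribution."""
    shifted = np.exp(x - x.max())
    return shifted / shifted.sum()


# Unnormalized topic parameters: a random walk through time.
beta = np.zeros((n_slices, vocab_size))
beta[0] = rng.normal(size=vocab_size)
for t in range(1, n_slices):
    beta[t] = beta[t - 1] + sigma * rng.normal(size=vocab_size)

# Word distribution of this one topic in each time slice.
topic_over_time = np.array([softmax(b) for b in beta])
print(topic_over_time.round(2))
```

Because each slice starts from the previous one, neighbouring rows of `topic_over_time` stay similar, while distant rows can differ a lot.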
There are also some interesting extensions of these ideas developed by other groups:
Pachinko Allocation:
W. Li, D. Blei, and A. McCallum. Nonparametric Bayes Pachinko Allocation. Cited by 5.
www.cs.princeton.edu/~blei/papers/LiBleiMcCallum2007.pdf
W. Li and A. McCallum. Pachinko Allocation: DAG-Structured Mixture Models of Topic … Cited by 33.
www.icml2006.org/icml_documents/camera-ready/073_Pachinko_Allocation.pdf
D. Mimno, W. Li, and A. McCallum. Mixtures of Hierarchical Topics with Pachinko Allocation. Cited by 4. Pachinko allocation models documents as a mixture of distributions over a single set of topics …
www.machinelearning.org/proceedings/icml2007/papers/453.pdf
This is also very interesting and related to what I would like to do:
Q. Mei et al. Discovering Evolutionary Theme Patterns from Text – An Exploration … Cited by 63. Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text information.
sifaka.cs.uiuc.edu/czhai/pub/kdd05-ttm.pdf