RESEARCH LOG

Interesting Papers

October 4, 2008

Here is a small bibliography of interesting papers that are related to topic modeling:

The foundational paper is Latent Dirichlet Allocation, by Blei, Ng, and Jordan:

D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. JMLR 3, 993 (2003). Cited 783 times as of Oct 2008.

Abstract:

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
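The generative process described in the abstract can be sketched in a few lines. This is a toy illustration, not code from the paper: the four-word vocabulary, the two hand-written topic-word distributions, and the value of alpha are all made up, and the Dirichlet draw is built by normalizing independent Gamma samples.

```python
import random

def sample_dirichlet(alpha, rng):
    # A Dirichlet draw = independent Gamma(alpha_i, 1) samples, normalized to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    # Return index i with probability probs[i].
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(n_words, alpha, topics, vocab, rng):
    # 1. Draw this document's topic proportions theta ~ Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. Draw a topic assignment z ~ Categorical(theta).
        z = sample_categorical(theta, rng)
        # 3. Draw the word from that topic's word distribution, w ~ Categorical(beta_z).
        w = sample_categorical(topics[z], rng)
        words.append(vocab[w])
    return theta, words

# Hypothetical toy setup: 2 topics over a 4-word vocabulary.
vocab = ["gene", "dna", "star", "galaxy"]
topics = [
    [0.5, 0.5, 0.0, 0.0],  # a "genetics" topic
    [0.0, 0.0, 0.5, 0.5],  # an "astronomy" topic
]
alpha = [0.5, 0.5]

rng = random.Random(0)
theta, doc = generate_document(10, alpha, topics, vocab, rng)
print([round(p, 3) for p in theta], doc)
```

Each document gets its own theta, so one document can mix genetics and astronomy words in different proportions than the next; that per-document mixture is what "each item of a collection is modeled as a finite mixture over an underlying set of topics" refers to.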

Blei has extended this model in a number of different directions since 2003:

The first extension tracks changes in the topics over time. In the original LDA scheme, the entire text corpus is generated from a fixed, finite pool of topics; in the dynamic topic model, the pool of topic vectors is allowed to change over time. This captures the intuition that the words used to discuss a subject drift as a field develops.
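One way to picture a topic that changes over time is the chaining used in the dynamic topic model paper: a topic's natural parameters (unnormalized log word weights) follow a Gaussian random walk from one time slice to the next, and a softmax turns each slice's parameters into a word distribution. The sketch below only simulates that prior; the vocabulary size, step size sigma, and number of slices are illustrative choices, and no inference is performed.

```python
import math
import random

def softmax(xs):
    # Map unnormalized log weights to a probability distribution (shift by max for stability).
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_topic_drift(beta0, n_slices, sigma, rng):
    # beta_t = beta_{t-1} + independent Gaussian noise (std dev sigma) on each word's weight.
    betas = [list(beta0)]
    for _ in range(n_slices - 1):
        prev = betas[-1]
        betas.append([b + rng.gauss(0.0, sigma) for b in prev])
    # Each slice's word distribution is the softmax of that slice's natural parameters.
    return [softmax(b) for b in betas]

rng = random.Random(1)
vocab_size = 5
slices = simulate_topic_drift([0.0] * vocab_size, n_slices=4, sigma=0.3, rng=rng)
for t, dist in enumerate(slices):
    print(t, [round(p, 3) for p in dist])
```

Because each slice starts from the previous one, the topic keeps its identity while its word probabilities wander; a small sigma gives slow drift, a large sigma lets the topic change character quickly.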

Another extension is the correlated topic model, which overcomes a limitation of the Dirichlet distribution for modeling purposes. Under a Dirichlet distribution over probability vectors with M components, the components are correlated with each other only through the normalization condition (they must sum to one). This makes it impossible to model rich covariance structure among topics. For example, an article about fashion is likely also to be about shoes but unlikely to be about food. The correlated topic model replaces the Dirichlet with a logistic normal distribution, whose covariance matrix can express such correlations.

D. Blei and J. Lafferty. A correlated topic model of Science. Annals of Applied Statistics 1(1):17–35, 2007. A shorter version appeared in NIPS 18.
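The limitation can be checked numerically. For a Dirichlet, the covariance between two different components has the closed form -a_i a_j / (a0^2 (a0 + 1)), which is negative for any positive alpha, so two topics can never be encouraged to co-occur. A logistic normal (Gaussian draw pushed through a softmax) has no such restriction. The three-topic setup, the correlation 0.9, and the sample size below are arbitrary illustrative choices.

```python
import math
import random

def dirichlet_offdiag_cov(alpha, i, j):
    # Closed form for i != j: Cov(theta_i, theta_j) = -a_i a_j / (a0^2 (a0 + 1)).
    a0 = sum(alpha)
    return -alpha[i] * alpha[j] / (a0 ** 2 * (a0 + 1))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_logistic_normal(rho, rng):
    # eta ~ N(0, Sigma) over 3 topics: topics 0 and 1 have correlation rho,
    # topic 2 is independent. theta = softmax(eta).
    z0, z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    eta = [z0, rho * z0 + math.sqrt(1 - rho ** 2) * z1, z2]
    return softmax(eta)

def correlation(xs, ys):
    # Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

alpha = [1.0, 1.0, 1.0]
print("Dirichlet cov(theta_0, theta_1):", dirichlet_offdiag_cov(alpha, 0, 1))

rng = random.Random(2)
samples = [sample_logistic_normal(0.9, rng) for _ in range(5000)]
corr = correlation([s[0] for s in samples], [s[1] for s in samples])
print("logistic normal corr(theta_0, theta_1):", round(corr, 3))
```

With alpha = (1, 1, 1) the Dirichlet covariance is -1/36, while the logistic normal samples show topics 0 and 1 rising and falling together, which is the "fashion goes with shoes" behavior described above.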

There is also the very interesting question of modeling how the topics evolve over time. The corpus of Science magazine from 1880 to 2002 was modeled by splitting the text into decades and estimating the topic vectors for each decade separately. This could be improved by a smoothing method that ties adjacent decades together.

D. Blei and J. Lafferty. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning, 2006.

There is also a video of a talk by Blei at Google discussing this work.
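Fitting each decade independently gives noisy, disconnected estimates. As a simple stand-in for the smoothing suggested above (plain exponential smoothing, not the state-space inference used in the dynamic topic model paper), each decade's topic proportions can be blended with the previous smoothed estimate. The per-decade numbers and the weight lam are hypothetical.

```python
def exponential_smoothing(series, lam):
    # series: per-slice probability vectors (e.g. topic proportions for each decade).
    # smoothed_t = lam * raw_t + (1 - lam) * smoothed_{t-1}.
    # A convex combination of probability vectors is still a probability vector.
    smoothed = [list(series[0])]
    for current in series[1:]:
        prev = smoothed[-1]
        smoothed.append([lam * c + (1 - lam) * p for c, p in zip(current, prev)])
    return smoothed

# Hypothetical per-decade proportions of three topics, estimated independently.
raw = [
    [0.70, 0.20, 0.10],
    [0.30, 0.50, 0.20],
    [0.65, 0.25, 0.10],
    [0.40, 0.40, 0.20],
]
for decade, vec in enumerate(exponential_smoothing(raw, lam=0.5)):
    print(decade, [round(x, 3) for x in vec])
```

This is a one-sided filter: each decade only borrows strength from earlier ones. A two-sided smoother would also use later decades, which is closer in spirit to what a full time-series model does.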

There are also some interesting extensions of these ideas developed by other groups:

Pachinko Allocation

  1. Nonparametric Bayes Pachinko Allocation

    by W. Li (University of Massachusetts, Amherst), D. Blei, et al. Cited by 5.
    www.cs.princeton.edu/~blei/papers/LiBleiMcCallum2007.pdf

  2. Pachinko Allocation: DAG-Structured Mixture Models of Topic

    by W. Li (ICML 2006). Cited by 33.
    www.icml2006.org/icml_documents/camera-ready/073_Pachinko_Allocation.pdf

  3. Mixtures of Hierarchical Topics with Pachinko Allocation

    by D. Mimno (ICML 2007). Cited by 4.
    www.machinelearning.org/proceedings/icml2007/papers/453.pdf
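As a rough sketch of the DAG idea behind these papers, here is the generative process of a four-level pachinko allocation model as I understand it from the ICML 2006 paper: a root node over super-topics, super-topics over sub-topics, and sub-topics over words. Each document draws its own multinomial at the root and at every super-topic, so correlations among sub-topics are carried by the super-topics. The sizes, hyperparameters, and word distributions below are made-up toy values.

```python
import random

def sample_dirichlet(alpha, rng):
    # Normalized independent Gamma(alpha_i, 1) draws give a Dirichlet(alpha) sample.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def draw(probs, rng):
    # Return index i with probability probs[i].
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_pam_document(n_words, alpha_root, alpha_super, phi, rng):
    # Per-document multinomial at the root: a distribution over super-topics.
    theta_root = sample_dirichlet(alpha_root, rng)
    # Per-document multinomial at each super-topic: a distribution over sub-topics.
    theta_super = [sample_dirichlet(a, rng) for a in alpha_super]
    words = []
    for _ in range(n_words):
        s = draw(theta_root, rng)      # root -> super-topic
        k = draw(theta_super[s], rng)  # super-topic -> sub-topic
        w = draw(phi[k], rng)          # sub-topic -> word index
        words.append((s, k, w))
    return words

# Hypothetical toy sizes: 2 super-topics, 3 sub-topics, 4-word vocabulary.
alpha_root = [1.0, 1.0]
alpha_super = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
phi = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
rng = random.Random(3)
print(generate_pam_document(8, alpha_root, alpha_super, phi, rng))
```

Each tuple records the path (super-topic, sub-topic, word) taken through the DAG; plain LDA is the special case with a single super-topic.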

This is also very interesting and related to what I would like to do:

Discovering Evolutionary Theme Patterns from Text – An Exploration

by Q. Mei (KDD 2005). Cited by 63.
"Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text information..."
sifaka.cs.uiuc.edu/czhai/pub/kdd05-ttm.pdf
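Finding evolutionary theme patterns requires deciding when a theme in one time period continues a theme from an earlier one. A minimal sketch of that step, assuming KL divergence between theme word distributions as the distance and a hypothetical threshold; the themes are toy values and this is not the paper's code.

```python
import math

def kl_divergence(p, q):
    # D(p || q) = sum_i p_i log(p_i / q_i); assumes q_i > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def link_themes(earlier, later, threshold):
    # Hypothetical rule: later theme j evolves from earlier theme i
    # when D(later_j || earlier_i) < threshold.
    links = []
    for i, old in enumerate(earlier):
        for j, new in enumerate(later):
            d = kl_divergence(new, old)
            if d < threshold:
                links.append((i, j, round(d, 3)))
    return links

# Toy themes over a 4-word vocabulary (all entries positive, so every KL term is finite).
period_1 = [[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]
period_2 = [[0.35, 0.45, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
print(link_themes(period_1, period_2, threshold=0.1))
```

Only the first later theme is linked back to a parent; the flat second theme is too far from both earlier themes under this threshold, so it would be read as a newly emerging theme. Chaining such links across consecutive periods gives an evolution graph.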

Categories: papers
