The topic of this weblog is topic modeling!
A topic model is a statistical model designed to capture our intuition that a given document is about a small set of themes. For example, Newsweek has articles about politics, business, health, and celebrities, while Car and Driver has articles about tires, transmissions, and upholstery.
The idea is to analyze which words occur in a document (as well as which words do not occur) and then assign the document a small set of topics, or a distribution over topics. This topic distribution can be thought of as a fingerprint of the document.
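To make the fingerprint idea concrete, here is a minimal sketch in Python. It assumes the topics are already known and that each topic is just a table of word probabilities. The two topics, their numbers, and the function name `fingerprint` are all made up for illustration. The document's topic proportions are then estimated with a few rounds of expectation-maximization over the mixing weights, which is one simple way to do it, not the only one.

```python
from collections import Counter

# Hypothetical topics: word probabilities invented purely for illustration.
TOPICS = {
    "politics": {"election": 0.4, "senate": 0.3, "vote": 0.2, "tires": 0.05, "engine": 0.05},
    "cars": {"election": 0.05, "senate": 0.05, "vote": 0.1, "tires": 0.4, "engine": 0.4},
}


def fingerprint(words, topics, iterations=50):
    """Estimate a document's topic proportions, holding the topics fixed."""
    names = list(topics)
    # Keep only words that at least one topic knows about.
    counts = Counter(w for w in words if any(w in t for t in topics.values()))
    total = sum(counts.values())
    weights = {name: 1.0 / len(names) for name in names}
    if total == 0:
        return weights  # nothing to go on, so stay uniform
    for _ in range(iterations):
        expected = dict.fromkeys(names, 0.0)
        for word, n in counts.items():
            # E-step: how much of this word's mass each topic explains.
            joint = {k: weights[k] * topics[k].get(word, 1e-9) for k in names}
            norm = sum(joint.values())
            for k in names:
                expected[k] += n * joint[k] / norm
        # M-step: renormalize expected counts into proportions.
        weights = {k: expected[k] / total for k in names}
    return weights


document = "engine tires tires vote engine".split()
print(fingerprint(document, TOPICS))
```

In this toy document most of the weight lands on the "cars" topic, because four of its five words are far more probable under that topic than under "politics".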
Topic modeling is usually formulated as a problem in unsupervised learning: the algorithm is given a dataset, but it is not told what the topics are or which documents are about which topics. This structure must be extracted from the data by statistical analysis.
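As a rough illustration of the unsupervised setting, here is a small sketch of one well-known topic model, latent Dirichlet allocation, fitted with a collapsed Gibbs sampler. The choice of model, the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all assumptions made for this example. The sampler sees only the words in each document and is never told what the topics are.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of word lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]  # topic counts per document
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics  # total words assigned to each topic
    assignments = []

    # Start from a random topic assignment for every word.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the word's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic given all the other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions: the "fingerprints".
    fingerprints = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        fingerprints.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return fingerprints, topic_word


corpus = [
    "election senate vote election campaign".split(),
    "senate vote campaign election".split(),
    "tires engine transmission tires upholstery".split(),
    "engine transmission upholstery tires".split(),
]
fingerprints, topic_word = lda_gibbs(corpus, num_topics=2)
for row in fingerprints:
    print([round(p, 2) for p in row])
```

The topics that come out are unlabeled: the sampler only gives them numbers, and it is up to a human to look at the most frequent words in each one and decide what to call it.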