Published on Thu Jun 13 2019

Topic Modeling via Full Dependence Mixtures

Dan Fisher, Mark Kozdoba, Shie Mannor

In this approach, topics are learned directly from the co-occurrence data of the corpus. We evaluate the approach on two large datasets, NeurIPS papers and a Twitter corpus, with a large number of topics. The approach performs comparably or better than the standard benchmarks.

Abstract

In this paper we introduce a new approach to topic modeling that scales to large datasets by using a compact representation of the data and by leveraging the GPU architecture. In this approach, topics are learned directly from the co-occurrence data of the corpus. In particular, we introduce a novel mixture model which we term the Full Dependence Mixture (FDM) model. FDMs model the second moment under general generative assumptions on the data. While there is previous work on topic modeling using second moments, we develop a direct stochastic optimization procedure for fitting an FDM with a single Kullback-Leibler objective. Moment methods in general have the benefit that an iteration no longer needs to scale with the size of the corpus. Our approach allows us to leverage standard optimizers and GPUs for the problem of topic modeling. In particular, we evaluate the approach on two large datasets, NeurIPS papers and a Twitter corpus, with a large number of topics, and show that the approach performs comparably or better than the standard benchmarks.
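To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of fitting an FDM-style model: topic vectors and mixture weights are optimized so that the second moment they imply matches an empirical word co-occurrence matrix under a Kullback-Leibler objective, using a standard GPU-friendly optimizer. The function name fit_fdm and all hyperparameter values are illustrative assumptions.

# Sketch only: fit K topics to a (V, V) co-occurrence matrix C normalized to sum to 1.
import torch

def fit_fdm(C, n_topics=20, n_steps=2000, lr=0.05):
    V = C.shape[0]
    # Unconstrained parameters; softmax keeps topics and weights on the simplex.
    topic_logits = torch.randn(n_topics, V, requires_grad=True)
    weight_logits = torch.zeros(n_topics, requires_grad=True)
    opt = torch.optim.Adam([topic_logits, weight_logits], lr=lr)

    for _ in range(n_steps):
        opt.zero_grad()
        topics = torch.softmax(topic_logits, dim=1)    # (K, V), rows sum to 1
        weights = torch.softmax(weight_logits, dim=0)  # (K,), sums to 1
        # Model second moment: sum_k w_k * outer(t_k, t_k)
        M = torch.einsum('k,ki,kj->ij', weights, topics, topics)
        # KL(C || M) up to the constant entropy of C
        loss = -(C * torch.log(M + 1e-12)).sum()
        loss.backward()
        opt.step()

    return (torch.softmax(topic_logits, dim=1).detach(),
            torch.softmax(weight_logits, dim=0).detach())

# Usage (hypothetical): topics, weights = fit_fdm(C_tensor.cuda(), n_topics=50)

Because the optimization runs over the V-by-V co-occurrence matrix rather than the documents themselves, each iteration is independent of corpus size, which is the scaling benefit of moment methods noted in the abstract.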

Thu Apr 07 2016
Machine Learning
Combinatorial Topic Models using Small-Variance Asymptotics
Topic models have emerged as fundamental tools in unsupervised machine learning. Most modern topic modeling algorithms take a probabilistic view and are based on Latent Dirichlet Allocation (LDA) or its variants. In contrast, we study topic modeling as a combinatorial optimization problem.
Thu Oct 27 2016
Machine Learning
Geometric Dirichlet Means algorithm for topic inference
We propose a geometric algorithm for topic learning and inference. It is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model. The topic estimates produced by our method are shown to be statistically consistent under some conditions.
Thu Mar 24 2011
Machine Learning
The Discrete Infinite Logistic Normal Distribution
We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models structure between the weights of the atoms at the group level.
Fri Oct 26 2012
Artificial Intelligence
Managing sparsity, time, and quality of inference in topic models
Inference is an integral part of probabilistic topic models. It is often difficult to derive an efficient algorithm for a specific model. Inference in topic models with nonconjugate priors can be done efficiently.
Fri Nov 04 2016
Machine Learning
Generalized Topic Modeling
In standard topic models, a topic is viewed as a probability distribution over words. We aim to learn a predictor that given a new document, accurately predicts its topic mixture. We present several natural conditions under which one can do this efficiently and discuss issues such as noise tolerance and sample complexity.
Fri Oct 21 2011
Machine Learning
Kernel Topic Models
Latent Dirichlet Allocation models data as a mixture of discrete distributions. We study a variation of this concept, in which the documents' mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space.