Published on Wed Jul 20 2016

An Adaptation of Topic Modeling to Sentences

Ruey-Cheng Chen, Reid Swanson, Andrew S. Gordon

Abstract

Advances in topic modeling have yielded effective methods for characterizing the latent semantics of textual data. However, applying standard topic modeling approaches to sentence-level tasks introduces a number of challenges. In this paper, we adapt latent Dirichlet allocation (LDA) to include an additional layer that incorporates information about the sentence boundaries in documents. We show that adding this minimal information about document structure improves the perplexity of the trained model.
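The core idea of a sentence-aware LDA variant can be illustrated with a toy collapsed Gibbs sampler in which each *sentence*, rather than each word, is assigned a single topic, so sentence boundaries constrain the topic assignments. This is a minimal sketch under that assumption, not the authors' implementation; the function name `sentence_lda` and all parameter names are hypothetical.

```python
import random
from collections import defaultdict

def sentence_lda(docs, n_topics=2, iters=50, alpha=0.1, beta=0.1, seed=0):
    """Toy sentence-level LDA sketch (hypothetical, not the paper's model).

    docs: list of documents; each document is a list of sentences;
    each sentence is a list of word tokens.
    Returns one topic assignment per sentence per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for s in d for w in s})
    V = len(vocab)

    # Count tables: topics per document, words per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initialization: one topic per sentence.
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    for di, d in enumerate(docs):
        for si, sent in enumerate(d):
            t = z[di][si]
            doc_topic[di][t] += 1
            for w in sent:
                topic_word[t][w] += 1
                topic_total[t] += 1

    for _ in range(iters):
        for di, d in enumerate(docs):
            for si, sent in enumerate(d):
                # Remove this sentence's counts before resampling.
                t = z[di][si]
                doc_topic[di][t] -= 1
                for w in sent:
                    topic_word[t][w] -= 1
                    topic_total[t] -= 1
                # Sample one topic for the whole sentence: document-topic
                # preference times the likelihood of every word in it.
                weights = []
                for k in range(n_topics):
                    p = doc_topic[di][k] + alpha
                    for w in sent:
                        p *= (topic_word[k][w] + beta) / (topic_total[k] + beta * V)
                    weights.append(p)
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[di][si] = t
                doc_topic[di][t] += 1
                for w in sent:
                    topic_word[t][w] += 1
                    topic_total[t] += 1
    return z

# Tiny illustrative corpus: two documents, two sentences each.
docs = [
    [["cat", "dog"], ["dog", "pet"]],
    [["stock", "market"], ["market", "trade"]],
]
assignments = sentence_lda(docs)
```

The only change from word-level collapsed Gibbs sampling for LDA is that the sampling unit is a sentence, so the word likelihoods within a sentence are multiplied together before a topic is drawn.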

Wed Jun 01 2016
Machine Learning
On a Topic Model for Sentences
SentenceLDA is an extension of the probabilistic topic model LDA. The goal is to overcome the limitations of the bag-of-words assumption by incorporating the structure of the text into the generative and inference processes.
Mon Oct 15 2018
Machine Learning
Improving Topic Models with Latent Feature Word Representations
Probabilistic topic models are widely used to discover latent topics in documents. Latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations.
Tue Jun 26 2018
Machine Learning
Unveiling the semantic structure of text documents using paragraph-aware Topic Models
Classic topic models are built under the bag-of-words assumption, in which word position is ignored for simplicity. To learn topics with different properties within the same corpus, we propose a new line of work in which the paragraph structure is exploited.
Wed May 09 2012
Machine Learning
Multilingual Topic Models for Unaligned Text
We develop the multilingual topic model for unaligned text (MuTo). MuTo is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to discover both a matching between the languages and multilingual topics.
Thu Aug 07 2008
Artificial Intelligence
Text Modeling using Unsupervised Topic Models and Concept Hierarchies
Human-defined concepts tend to be semantically richer, owing to the careful selection of the words used to define them. Statistical topic models provide a general data-driven framework for the automated discovery of high-level knowledge.
Thu Feb 25 2010
Artificial Intelligence
Syntactic Topic Models
The syntactic topic model (STM) is a Bayesian nonparametric model of language. It assumes that each word is drawn from a latent topic chosen by combining document-level features and local syntactic context.