Published on Thu Mar 13 2014

Scalable and Robust Construction of Topical Hierarchies

Chi Wang, Xueqing Liu, Yanglei Song, Jiawei Han

Automated generation of high-quality topical hierarchies for a text collection is a dream problem in knowledge engineering. In this paper a scalable and robust algorithm is proposed for constructing a hierarchy of topics.

Abstract

Automated generation of high-quality topical hierarchies for a text collection is a dream problem in knowledge engineering with many valuable applications. In this paper, a scalable and robust algorithm is proposed for constructing a hierarchy of topics from a text collection. We divide and conquer the problem using a top-down recursive framework based on a tensor orthogonal decomposition technique. We solve a critical challenge: performing scalable inference for our newly designed hierarchical topic model. Experiments with various real-world datasets illustrate the method's ability to generate robust, high-quality hierarchies efficiently. Our method reduces construction time by several orders of magnitude, and its robustness makes it possible for users to interactively revise the hierarchy.
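The abstract names tensor orthogonal decomposition but does not say how it is computed. As a rough illustration only, the sketch below uses one standard procedure, tensor power iteration with deflation, on a small symmetric third-order tensor built from known orthonormal components. The function names (`make_tensor`, `contract`, `power_iteration`, `decompose`), the toy tensor, and the iteration counts are assumptions made for this example and are not taken from the paper.

```python
import math
import random

def make_tensor(weights, vectors):
    """Build T = sum_i w_i * (v_i x v_i x v_i) as nested lists (toy helper)."""
    n = len(vectors[0])
    T = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for w, v in zip(weights, vectors):
        for a in range(n):
            for b in range(n):
                for c in range(n):
                    T[a][b][c] += w * v[a] * v[b] * v[c]
    return T

def contract(T, u):
    """Return the vector T(I, u, u): contract the last two modes with u."""
    n = len(u)
    return [sum(T[a][b][c] * u[b] * u[c] for b in range(n) for c in range(n))
            for a in range(n)]

def normalize(u):
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]

def power_iteration(T, rng, n_iter=100, n_restarts=10):
    """Find one (eigenvalue, eigenvector) pair, keeping the best of several restarts."""
    n = len(T)
    best_val, best_vec = -math.inf, None
    for _ in range(n_restarts):
        u = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_iter):
            u = normalize(contract(T, u))
        val = sum(x * y for x, y in zip(contract(T, u), u))  # T(u, u, u)
        if val > best_val:
            best_val, best_vec = val, u
    return best_val, best_vec

def decompose(T, k, seed=0):
    """Recover k components by repeated power iteration and deflation."""
    rng = random.Random(seed)
    n = len(T)
    T = [[row[:] for row in mat] for mat in T]  # work on a copy
    components = []
    for _ in range(k):
        lam, v = power_iteration(T, rng)
        components.append((lam, v))
        # Deflate: subtract the recovered rank-one term from the tensor.
        for a in range(n):
            for b in range(n):
                for c in range(n):
                    T[a][b][c] -= lam * v[a] * v[b] * v[c]
    return components

# Toy usage: three orthonormal directions with weights 3, 2, 1.
s = 1 / math.sqrt(2)
true_weights = [3.0, 2.0, 1.0]
true_vectors = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
recovered = decompose(make_tensor(true_weights, true_vectors), k=3)
```

In this toy setting each recovered pair should match one of the planted weight and direction pairs up to numerical error. In a topic-modeling context the tensor would instead be estimated from word co-occurrence statistics, a step this sketch leaves out.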

Mon Jan 04 2016
Artificial Intelligence
Scalable Models for Computing Hierarchies in Information Networks
Information hierarchies are organizational structures that are often used to organize and present large and complex information. Many statistical and computational models exist that automatically generate hierarchies. However, existing approaches do not consider linkages in information networks. In this paper we present the Hierarchical Document Topic Model.
Tue Feb 14 2012
Machine Learning
Sparse Topical Coding
We present sparse topical coding (STC), a non-probabilistic formulation of topic models. Unlike probabilistic topic models, STC relaxes the normalization of admixture proportions. STC can be seamlessly integrated with a convex error function for supervised learning.
Tue Jun 24 2014
Machine Learning
Scalable Topical Phrase Mining from Text Corpora
Most topic modeling algorithms model text corpora with unigrams. We propose a different approach that is both computationally efficient and effective. Our approach discovers high quality topical phrases with negligible extra cost.
Tue Jan 19 2021
Machine Learning
Analysis and tuning of hierarchical topic models based on Renyi entropy approach
Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections. However, tuning the parameters of hierarchical models, including the number of topics, remains a challenging task and an open issue. We propose a Renyi entropy-based metric of quality for hierarchical models.
Thu Sep 29 2016
Machine Learning
Topic Browsing for Research Papers with Hierarchical Latent Tree Analysis
An online catalog of research papers has been automatically categorized by a topic model. The catalog contains 7719 papers from the proceedings of two artificial intelligence conferences from 2000 to 2015.
Tue Aug 29 2017
Machine Learning
Unsupervised Terminological Ontology Learning based on Hierarchical Topic Modeling
hrLDA is a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams and considers syntax and document structures.