Published on Tue Apr 22 2008

Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models

Oren Kurland, Lillian Lee

The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side). We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do.

Abstract

We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via consideration of language models induced from them. We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do. For example, authority-based re-ranking of documents via a HITS-style cluster-based approach outperforms a previously-proposed PageRank-inspired algorithm applied to solely-document graphs. Moreover, we also show that computing authority scores for clusters constitutes an effective method for identifying clusters containing a large percentage of relevant documents.
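To make the mutual-reinforcement idea concrete, here is a minimal sketch of HITS-style scoring on a bipartite cluster-document graph. It is an illustration under stated assumptions, not the authors' implementation: the weight matrix is taken as given (the abstract says links are induced from language models, but does not give the weighting, so none is reproduced here), clusters are assigned the hub role and documents the authority role, and the names `hits_bipartite` and `rerank_by_authority` are hypothetical.

```python
import numpy as np


def hits_bipartite(weights, iterations=50):
    """Run HITS power iteration on a bipartite cluster-document graph.

    weights[c, d] is an assumed nonnegative edge weight from cluster c
    (acting as a hub) to document d (acting as an authority). How the
    weights are induced from language models is outside this sketch.
    """
    weights = np.asarray(weights, dtype=float)
    n_clusters, n_docs = weights.shape
    hub = np.ones(n_clusters)
    auth = np.ones(n_docs)
    for _ in range(iterations):
        # A document is authoritative if strong hubs (clusters) point to it.
        auth = weights.T @ hub
        norm = np.linalg.norm(auth)
        if norm > 0:
            auth = auth / norm
        # A cluster is a good hub if it points to authoritative documents.
        hub = weights @ auth
        norm = np.linalg.norm(hub)
        if norm > 0:
            hub = hub / norm
    return hub, auth


def rerank_by_authority(doc_ids, auth):
    """Return doc_ids sorted by descending authority score (stable on ties)."""
    order = np.argsort(-np.asarray(auth), kind="stable")
    return [doc_ids[i] for i in order]


# Toy graph: 3 clusters (rows) x 3 documents (columns), made-up weights.
toy_weights = [
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.0, 0.0],
]
toy_hub, toy_auth = hits_bipartite(toy_weights)
toy_ranking = rerank_by_authority(["d0", "d1", "d2"], toy_auth)
```

Passing the transposed matrix swaps the roles, so that clusters receive the authority scores instead; which role assignment and edge weighting work best is an empirical question that this sketch does not address.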

Wed Jan 11 2006
NLP
PageRank without hyperlinks: Structural re-ranking using links induced by language models
Mon Jul 15 2019
Machine Learning
RaKUn: Rank-based Keyword extraction via Unsupervised learning and Meta vertex aggregation
Keyword extraction is used for summarizing the content of a document. We explore how a graph-theoretic measure applied to graphs derived from a given text can be used to efficiently identify and rank keywords. The proposed method is unsupervised and interpretable.
Sun Apr 30 2017
Machine Learning
Scaling Active Search using Linear Similarity Functions
In this paper, we consider the problem of Active Search where we are given a similarity function between data points. We look at an algorithm introduced by Wang et al. for Active Search over graphs and propose crucial modifications which allow it to scale significantly.
Fri Mar 09 2018
Artificial Intelligence
Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings
Expert finding is an important task in both industry and academia. Different types of objects interact with one another, which naturally forms heterogeneous information networks. We propose a ranking algorithm to estimate the authority of objects in the network.
Thu Nov 30 2017
NLP
Graph Centrality Measures for Boosting Popularity-Based Entity Linking
Many Entity Linking systems use collective graph-based methods to disambiguate the entity mentions within a document. We propose to apply five centrality measures: Degree, HITS, PageRank, Betweenness and Closeness.
Fri Aug 21 2020
NLP
Keywords lie far from the mean of all words in local vector space
Keyword extraction is an important document process that aims at finding a small set of terms that concisely describe a document's topics. The most popular state-of-the-art unsupervised approaches belong to the family of the graph-based methods.