Published on Sun Jan 31 2021

Extending Neural Keyword Extraction with TF-IDF tagset matching

Boshko Koloski, Senja Pollak, Blaž Škrlj, Matej Martinc

Abstract

Keyword extraction is the task of identifying the words (or multi-word expressions) that best describe a given document; news portals use such keywords to link articles on similar topics. In this work we develop and evaluate our methods on four novel data sets covering less-represented, morphologically rich languages in the European news media industry (Croatian, Estonian, Latvian and Russian). First, we evaluate two supervised neural transformer-based methods (TNT-KID and BERT+BiLSTM CRF) and compare them to a baseline unsupervised TF-IDF-based approach. Next, we show that by combining the keywords retrieved by both neural transformer-based methods and extending the final set of keywords with an unsupervised TF-IDF-based technique, we can drastically improve the recall of the system, making it suitable for use as a recommendation system in a media house environment.
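
The combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `tagset` argument, the cutoff `k`, and the smoothed IDF formula are all assumptions made for illustration, and the two neural taggers are represented only by their already-predicted keyword lists.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus_tokens):
    """Score each distinct token of one document by TF * smoothed IDF.

    The smoothing (log((1 + N) / (1 + df)) + 1) is an assumed choice,
    not something specified in the abstract.
    """
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    term_freq = Counter(doc_tokens)
    return {
        term: (count / len(doc_tokens))
        * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, count in term_freq.items()
    }


def extend_keywords(neural_a, neural_b, doc_tokens, corpus_tokens, tagset, k=10):
    """Merge two neural keyword lists, then pad up to k with TF-IDF picks.

    Hypothetical helper: candidates are restricted to terms that occur in
    the document and belong to a known tagset of previously used keywords.
    """
    # Order-preserving union of both neural predictions.
    combined = list(dict.fromkeys(neural_a + neural_b))
    scores = tfidf_scores(doc_tokens, corpus_tokens)
    # Rank remaining tagset terms by descending TF-IDF (ties broken by name).
    candidates = sorted(
        (t for t in scores if t in tagset and t not in combined),
        key=lambda t: (-scores[t], t),
    )
    return (combined + candidates)[:k]


corpus = [
    ["government", "budget", "taxes"],
    ["football", "league", "goal"],
    ["government", "election"],
]
doc = ["government", "budget", "budget", "taxes"]
tagset = {"budget", "taxes", "football"}

keywords = extend_keywords(
    ["government"], ["government", "election"], doc, corpus, tagset, k=4
)
print(keywords)
```

Because the neural predictions are kept first and the TF-IDF candidates only fill the remaining slots, a design like this can only add keywords to the neural output, which is consistent with the recall-oriented goal stated in the abstract; how it affects precision depends on `k` and the tagset.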
