Published on Mon Oct 21 2019

A Comparison of Semantic Similarity Methods for Maximum Human Interpretability

Pinky Sitikhu, Kritish Pahi, Pujan Thapa, Subarna Shakya

Abstract

Including semantic information in a similarity measure improves the efficiency of the measure and yields human-interpretable results for further analysis. A similarity method that relies only on features of the text's words gives less accurate results. This paper presents three methods that not only use the text's words but also incorporate semantic information about the texts in their feature vectors when computing semantic similarity. The methods draw on corpus-based and knowledge-based approaches: cosine similarity using tf-idf vectors, cosine similarity using word embeddings, and soft cosine similarity using word embeddings. Among the three, cosine similarity using tf-idf vectors performed best at finding similarities between short news texts. The similar texts returned by the method are easy to interpret and can be used directly in other information retrieval applications.

Sun Apr 19 2020
NLP
Evolution of Semantic Similarity -- A Survey
Estimating the semantic similarity between text data is one of the most open research problems in the field of Natural Language Processing. Various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, categorizing them based on their underlying principles.
Wed Feb 17 2016
NLP
A Comprehensive Comparative Study of Word and Sentence Similarity Measures
Sentence similarity is considered the basis of many natural language tasks. This article reviews a set of word and sentence similarity measures and compares them on benchmark datasets. Results showed that hybrid semantic measures perform better than both knowledge and corpus based measures.
Wed Oct 30 2013
NLP
Description and Evaluation of Semantic Similarity Measures Approaches
Semantic similarity measures are of great interest in the Semantic Web and Natural Language Processing (NLP). Several similarity measures have been developed, given the existence of structured knowledge representations. The aim of this paper is to give an efficient evaluation of all these measures.
Wed Mar 23 2016
NLP
Evaluating semantic models with word-sentence relatedness
Semantic textual similarity (STS) systems are designed to encode and evaluate the semantic similarity between words, phrases, sentences, and documents. One method for assessing the quality or authenticity of semantic information encoded in these systems is by comparison with human judgments.
Wed Jan 15 2014
NLP
Text Relatedness Based on a Word Thesaurus
A measure of relatedness between text segments must take into account both the lexical and the semantic relatedness. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. The proposed method outperforms every lexicon-based method.
Fri Oct 04 2013
NLP
Semantic Measures for the Comparison of Units of Language, Concepts or Instances from Text and Knowledge Base Analysis
Semantic measures are widely used today to estimate the strength of the relationship between elements of various types. Semantic measures generalize the well-known notions of semantic similarity, semantic relatedness and semantic distance.