Published on Thu Mar 11 2021

Evaluation of Morphological Embeddings for the Russian Language

Vitaly Romanov, Albina Khusainova
Abstract

A number of morphology-based word embedding models have been introduced in recent years. However, their evaluation has mostly been limited to English, which is known to be a morphologically simple language. In this paper, we explore whether and to what extent incorporating morphology into word embeddings improves performance on downstream NLP tasks, in the case of the morphologically rich Russian language. The tasks we chose are POS tagging, chunking, and NER; for Russian, all of them can largely be solved using morphology alone, without understanding the semantics of words. Our experiments show that morphology-based embeddings trained with the Skipgram objective do not outperform an existing embedding model, FastText. Moreover, a more complex but morphology-unaware model, BERT, achieves significantly better performance on the tasks that presumably require understanding of a word's morphology.
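For readers unfamiliar with how FastText incorporates subword information, the sketch below illustrates the core idea: a word vector is composed from the vectors of its character n-grams, with boundary markers added before slicing. This is a minimal illustration, not the paper's code; the function names, vector dimension, and n-gram range defaults are illustrative (FastText's own defaults are min_n=3, max_n=6).

```python
import numpy as np

def char_ngrams(word, min_n=3, max_n=6):
    # FastText-style subwords: wrap the word in boundary markers,
    # then take every character n-gram of length min_n..max_n
    w = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(w) - n + 1):
            grams.append(w[i:i + n])
    return grams

def word_vector(word, ngram_vecs, dim=4):
    # Compose the word's vector from its subword vectors;
    # n-grams absent from the table are simply skipped
    vecs = [ngram_vecs[g] for g in char_ngrams(word) if g in ngram_vecs]
    if not vecs:
        return np.zeros(dim)  # fully out-of-vocabulary word
    return np.mean(vecs, axis=0)
```

Because vectors are built from shared n-grams, morphologically related forms (e.g. Russian inflections of one lemma) end up with overlapping subword sets, which is precisely the property whose downstream usefulness the paper evaluates.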

Related papers

Thu Mar 11 2021
NLP
Evaluation of Morphological Embeddings for English and Russian Languages
Thu Feb 13 2020
NLP
Comparison of Turkish Word Representations Trained on Different Morphological Forms
For morphologically rich languages, context-free word vectors ignore the morphological structure of the language. To see the effect of this, we trained a word2vec model on texts in which lemmas and suffixes are treated differently. We also trained the subword model fastText and compared the …
Wed Jun 08 2016
NLP
A Joint Model for Word Embedding and Word Morphology
This paper presents a joint model for performing unsupervised morphological analysis on words. Our model splits individual words into segments, and weights each segment according to its ability to predict context words.
Wed Oct 21 2020
NLP
LemMED: Fast and Effective Neural Morphological Analysis with Short Context Windows
LemMED is a character-level encoder-decoder for contextual morphological analysis. LemMED is named after two other attention-based models, namely Lematus, a contextual lemmatizer, and MED, a morphological (re)inflection model.
Mon Aug 19 2019
NLP
UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging
We present our contribution to the SIGMORPHON 2019 Shared Task: Crosslinguality and Context in Morphology, Task 2: contextual morphological analysis and lemmatization. We submitted a modification of UDPipe 2.0.
Fri Jun 05 2020
NLP
UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings
We present our contribution to the EvaLatin shared task, the first evaluation campaign devoted to NLP tools for Latin. Our system places first by a wide margin in both lemmatization and POS tagging in the open modality, where additional supervised data is allowed.