Published on Sun Jun 10 2018

Cross-Lingual Task-Specific Representation Learning for Text Classification in Resource Poor Languages

Nurendra Choudhary, Rajat Singh, Manish Shrivastava

Abstract

Neural network models have shown promising results for text classification. However, these solutions are limited by their dependence on the availability of annotated data. The prospect of leveraging resource-rich languages to enhance the text classification of resource-poor languages is fascinating. The performance on resource-poor languages can significantly improve if the resource availability constraints can be offset. To this end, we present a twin Bidirectional Long Short-Term Memory (Bi-LSTM) network with shared parameters consolidated by a contrastive loss function (based on a similarity metric). The model learns the representations of resource-poor and resource-rich sentences in a common space by using the similarity between their assigned annotation tags. Hence, the model projects sentences with similar tags closer together and those with different tags farther apart. We evaluated our model on the classification tasks of sentiment analysis and emoji prediction for the resource-poor languages Hindi and Telugu and the resource-rich languages English and Spanish. Our model significantly outperforms the state-of-the-art approaches on both tasks across all metrics.
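The architecture the abstract describes lends itself to a compact illustration. Below is a minimal PyTorch sketch (not the authors' released code) of a twin Bi-LSTM encoder with shared parameters trained with a margin-based contrastive loss; the joint vocabulary, mean-pooling, dimensions, and margin value are illustrative assumptions.

```python
# Illustrative sketch of a twin Bi-LSTM with shared parameters and a
# contrastive loss, assuming PyTorch and a joint bilingual vocabulary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwinBiLSTM(nn.Module):
    """Sentence encoder; both branches of the twin network reuse this one
    module, so all parameters are shared by construction."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        outputs, _ = self.bilstm(self.embed(token_ids))  # (B, T, 2*hidden)
        return outputs.mean(dim=1)                       # mean-pool over time

def contrastive_loss(z1, z2, same_tag, margin=1.0):
    """Pulls same-tag sentence pairs together; pushes different-tag pairs
    at least `margin` apart in the shared representation space."""
    dist = F.pairwise_distance(z1, z2)
    pos = same_tag * dist.pow(2)
    neg = (1.0 - same_tag) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()

# Usage: one encoder, two inputs (a resource-poor and a resource-rich batch).
encoder = TwinBiLSTM(vocab_size=50000)
poor = torch.randint(0, 50000, (4, 20))        # e.g. Hindi token ids
rich = torch.randint(0, 50000, (4, 20))        # e.g. English token ids
same_tag = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 iff annotation tags match
loss = contrastive_loss(encoder(poor), encoder(rich), same_tag)
```

Because both branches call the same encoder module, the twin network shares all parameters, which is what lets sentences from both languages land in a single common space.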

Tue Apr 03 2018
NLP
Emotions are Universal: Learning Sentiment Based Representations of Resource-Poor Languages using Siamese Networks
Siamese Network Architecture for Sentiment Analysis (SNASA) uses a Siamese network to learn representations of resource-poor languages. The SNASA model consists of twin Bidirectional Long Short-Term Memory recurrent neural networks.
Tue Apr 03 2018
NLP
Contrastive Learning of Emoji-based Representations for Resource-Poor Languages
The introduction of emojis (or emoticons) in social media platforms has given users an increased potential for expression. We propose a novel method called Classification of Emojis using Siamese Network Architecture (CESNA) to learn emoji-based representations of resource-poor languages.
Fri Nov 27 2015
NLP
A C-LSTM Neural Network for Text Classification
C-LSTM captures both local phrase-level features and global, temporal sentence semantics. We evaluate the proposed model on sentiment classification and question classification tasks, where it outperforms both CNN and LSTM and achieves excellent performance.
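As a rough illustration of the C-LSTM idea, here is a hedged PyTorch sketch (filter count, kernel size, and layer sizes are assumptions, not the paper's exact configuration): a 1-D convolution extracts local n-gram phrase features, and an LSTM then models the sequence of those features to capture global, temporal semantics.

```python
# Sketch of the C-LSTM pattern: convolution over embeddings for phrase
# features, then an LSTM over the feature sequence for sentence semantics.
import torch
import torch.nn as nn

class CLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, n_filters=150,
                 kernel_size=3, hidden_dim=150, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size)
        self.lstm = nn.LSTM(n_filters, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)           # (B, embed, T)
        phrases = torch.relu(self.conv(x)).transpose(1, 2)  # n-gram features
        _, (h_n, _) = self.lstm(phrases)                    # final hidden state
        return self.out(h_n[-1])                            # class logits

logits = CLSTM(vocab_size=30000)(torch.randint(0, 30000, (8, 40)))
```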
Sun Jun 23 2019
NLP
Cross-lingual Data Transformation and Combination for Text Classification
Cross-lingual data sources may suffer from data incompatibility. Machine translation and word embedding alignment provide an effective way to transform and combine such data. Monolingual models were trained on English and French alongside their translated and aligned embeddings.
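The snippet above mentions word embedding alignment without spelling out a method; one standard choice is orthogonal Procrustes alignment over a bilingual seed dictionary, sketched below with NumPy. The matrices and dictionary here are synthetic placeholders, not the paper's data.

```python
# Orthogonal Procrustes alignment: learn an orthogonal map W so that
# src @ W approximates tgt for row-aligned seed-dictionary embeddings.
import numpy as np

def align_embeddings(src, tgt):
    """Returns the orthogonal W minimizing ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

rng = np.random.default_rng(0)
fr = rng.normal(size=(5000, 300))   # French vectors for dictionary pairs
en = rng.normal(size=(5000, 300))   # their English translations
W = align_embeddings(fr, en)
fr_in_en_space = fr @ W             # now comparable with English vectors
```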
Mon May 24 2021
NLP
Cross-lingual Text Classification with Heterogeneous Graph Neural Network
Cross-lingual text classification aims at training a classifier on the source language and transferring the knowledge to target languages. Recent multilingual pretrained language models rarely consider factors beyond semantic similarity, causing performance degradation between some language pairs.
Mon Nov 07 2016
NLP
AC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text Classification
We propose a novel framework called AC-BLSTM for modeling sentences and documents. The framework combines the asymmetric convolutional neural network (ACNN) with the Bidirectional Long Short-Term Memory network (BLSTM). Experimental results demonstrate that our model achieves state-of-the-art results.
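For a sense of what the asymmetric convolution contributes, here is a hedged PyTorch sketch (layer sizes are illustrative, not the paper's configuration): the k x d convolution over the embedding matrix is factorized into a 1 x d stage followed by a k x 1 stage, and the resulting feature sequence feeds a bidirectional LSTM.

```python
# Sketch of asymmetric convolution feeding a BLSTM: a k x d convolution is
# factorized into a pointwise (1 x d) stage and a temporal (k x 1) stage.
import torch
import torch.nn as nn

class ACBLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, n_filters=100,
                 k=3, hidden_dim=100, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv_1xd = nn.Conv1d(embed_dim, n_filters, kernel_size=1)
        self.conv_kx1 = nn.Conv1d(n_filters, n_filters, kernel_size=k)
        self.blstm = nn.LSTM(n_filters, hidden_dim,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)          # (B, embed, T)
        x = torch.relu(self.conv_1xd(x))                   # 1 x d stage
        x = torch.relu(self.conv_kx1(x)).transpose(1, 2)   # k x 1 stage
        _, (h_n, _) = self.blstm(x)                        # final states
        return self.out(torch.cat([h_n[0], h_n[1]], dim=1))

logits = ACBLSTM(vocab_size=30000)(torch.randint(0, 30000, (8, 40)))
```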