Published on Mon Nov 14 2016

Attending to Characters in Neural Sequence Labeling Models

Marek Rei, Gamal K. O. Crichton, Sampo Pyysalo

Abstract

Sequence labeling architectures use word embeddings for capturing similarity, but suffer when handling previously unseen or rare words. We investigate character-level extensions to such models and propose a novel architecture for combining alternative word representations. By using an attention mechanism, the model is able to dynamically decide how much information to use from the word-level or character-level component. We evaluate different architectures on a range of sequence labeling datasets, and character-level extensions are found to improve performance on every benchmark. In addition, the proposed attention-based architecture delivers the best results while requiring fewer trainable parameters.
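As a rough illustration of the combination mechanism described in the abstract, the sketch below gates between a word embedding and a character-level representation using a learned sigmoid vector z, returning z * word + (1 - z) * char. This is a minimal sketch, not the authors' released code: the framework (PyTorch), the module name CharWordAttention, and the dimensions are illustrative assumptions.

# Minimal sketch of an attention-style gate between word- and character-level
# representations. Names and dimensions are illustrative, not from the paper.
import torch
import torch.nn as nn

class CharWordAttention(nn.Module):
    def __init__(self, word_dim: int, char_dim: int):
        super().__init__()
        # Project the character-level representation to the word-embedding size
        # so the two vectors can be mixed element-wise.
        self.char_proj = nn.Linear(char_dim, word_dim)
        # Gate computed from both representations; sigmoid keeps it in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(2 * word_dim, word_dim),
            nn.Sigmoid(),
        )

    def forward(self, word_emb: torch.Tensor, char_emb: torch.Tensor) -> torch.Tensor:
        char_emb = torch.tanh(self.char_proj(char_emb))
        z = self.gate(torch.cat([word_emb, char_emb], dim=-1))
        # Convex combination: z near 1 trusts the word embedding, z near 0
        # falls back to the character-level vector (useful for rare or
        # previously unseen words).
        return z * word_emb + (1.0 - z) * char_emb

# Usage example: a batch of 2 sentences with 5 tokens each.
attn = CharWordAttention(word_dim=100, char_dim=50)
word_emb = torch.randn(2, 5, 100)   # e.g. from a word lookup table
char_emb = torch.randn(2, 5, 50)    # e.g. final states of a character BiLSTM
combined = attn(word_emb, char_emb)
print(combined.shape)               # torch.Size([2, 5, 100])

The gate is per-dimension rather than a single scalar, so the model can take some features from the word embedding and others from the character-level vector for the same token.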

Thu Oct 20 2016
NLP
Neural Machine Translation with Characters and Hierarchical Encoding
Most existing Neural Machine Translation models use groups of characters or whole words as their unit of input and output. We propose a model that takes individual characters both as input and output. The hierarchical structure of the character encoder reduces computational complexity.
Sat Aug 11 2018
NLP
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
State-of-the-art machine translation systems are based on encoder-decoder architectures. We propose an alternative approach which relies on a single 2D convolutional neural network. Each layer of our network re-codes source tokens on the basis of the output sequence produced so far.
Sat Nov 14 2015
NLP
Character-based Neural Machine Translation
We show that our model can achieve translation results that are on par with conventional word-based models. As the representation and generation of words are performed at the character level, our model is capable of interpreting and generating unseen word forms.
Mon Oct 29 2018
NLP
Learning Better Internal Structure of Words for Sequence Labeling
Character-based neural models have recently proven very useful for many NLP tasks. However, there is a gap of sophistication between methods for learning representations of sentences and words. We propose IntNet, a funnel-shaped wide convolutional neural architecture with no down-sampling.
Thu Aug 09 2018
Artificial Intelligence
Character-Level Language Modeling with Deeper Self-Attention
LSTMs and other RNN variants have shown strong performance on character-level language modeling. We show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin.
Thu Apr 30 2020
NLP
Character-Level Translation with Self-attention
We explore the suitability of self-attention models for character-level neural machine translation. We test the standard transformer model, as well as a novel variant in which the encoder block combines information from nearby characters using convolutions. We perform extensive experiments on WMT and UN datasets.