Published on Thu Jun 03 2021

Syntax-augmented Multilingual BERT for Cross-lingual Transfer

Wasi Uddin Ahmad, Haoran Li, Kai-Wei Chang, Yashar Mehdad

Pre-trained multilingual encoders such as mBERT capture language syntax, which helps cross-lingual transfer. This work augments mBERT with explicit syntax through an auxiliary objective over universal dependency trees and evaluates it with rigorous experiments on four NLP tasks: text classification, question answering, named entity recognition, and task-oriented semantic parsing.

Abstract

In recent years, we have seen a colossal effort in pre-training multilingual text encoders using large-scale corpora in many languages to facilitate cross-lingual transfer learning. However, due to typological differences across languages, cross-lingual transfer is challenging. Nevertheless, language syntax, e.g., syntactic dependencies, can bridge the typological gap. Previous works have shown that pre-trained multilingual encoders, such as mBERT (Devlin et al., 2019), capture language syntax, helping cross-lingual transfer. This work shows that explicitly providing language syntax and training mBERT using an auxiliary objective to encode the universal dependency tree structure helps cross-lingual transfer. We perform rigorous experiments on four NLP tasks, including text classification, question answering, named entity recognition, and task-oriented semantic parsing. The experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks, such as PAWS-X and MLQA, by 1.4 and 1.6 points on average across all languages. In the generalized transfer setting, the performance is boosted significantly, by 3.9 and 3.1 points on average on PAWS-X and MLQA, respectively.
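To make the approach concrete, here is a minimal, illustrative PyTorch sketch of one way syntax could be injected into mBERT: a single GAT-style attention layer whose attention is masked to universal dependency edges and fused with the encoder output through a residual connection. The class name SyntaxFusionLayer, the dep_adjacency input, and the toy self-loop adjacency are assumptions made for illustration, not the authors' implementation or their auxiliary training objective.

```python
# Illustrative sketch only (hypothetical SyntaxFusionLayer, not the paper's released code):
# bias mBERT token representations with a universal-dependency tree via one
# masked, GAT-style self-attention layer plus a residual connection.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel, BertTokenizerFast

class SyntaxFusionLayer(nn.Module):
    """Self-attention restricted to dependency-tree edges (GAT-style masking)."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, dep_adjacency):
        # hidden_states: (batch, seq_len, hidden)
        # dep_adjacency: (batch, seq_len, seq_len), 1 where a dependency edge
        # (or self-loop) connects two tokens, 0 elsewhere.
        scores = self.query(hidden_states) @ self.key(hidden_states).transpose(-1, -2)
        scores = scores / hidden_states.size(-1) ** 0.5
        scores = scores.masked_fill(dep_adjacency == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        # Residual connection keeps the original mBERT signal intact.
        return hidden_states + attn @ self.value(hidden_states)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")
fusion = SyntaxFusionLayer(encoder.config.hidden_size)

inputs = tokenizer("The cat sat on the mat .", return_tensors="pt")
seq_len = inputs["input_ids"].size(1)
# Toy adjacency with self-loops only; a real pipeline would add UD head-dependent
# edges from a parser, mapped onto mBERT's wordpiece positions.
dep_adjacency = torch.eye(seq_len).unsqueeze(0)

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 768)
    syntax_aware = fusion(hidden, dep_adjacency)   # (1, seq_len, 768)
```

The paper itself learns the syntax encoding jointly with the downstream task via an auxiliary objective over the dependency tree; the sketch above only illustrates the masked-attention fusion pattern that such a component could follow.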

Related Papers

Thu Apr 30 2020
NLP
MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
The main goal behind state-of-the-art pre-trained multilingual models such as XLM-R is enabling and bootstrapping NLP applications in low-resource languages. We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages.
Thu Oct 15 2020
Artificial Intelligence
Explicit Alignment Objectives for Multilingual Bidirectional Encoders
AMBER is trained on additional parallel data using two explicit alignment objectives. It obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLM-R large model.
Thu Oct 10 2019
NLP
Multilingual Question Answering from Formatted Text applied to Conversational Agents
BERT outperforms the best previously known baseline for transfer to Japanese and French. We finally present a practical application: a multilingual conversational agent called Kate. Kate answers HR-related questions in several languages from the content of intranet pages.
Tue Sep 03 2019
NLP
Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages.
Thu Sep 10 2020
NLP
FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
Large-scale cross-lingual language models (LM) have achieved great success. Most existing LM methods use only single-language input for LM finetuning. We propose FILTER, an enhanced fusion method that takes cross-lingual data as input.
Fri May 01 2020
NLP
From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers
Massively multilingual transformers pretrained with language modeling objectives (e.g., mBERT, XLM-R) have become a de facto default transfer paradigm for zero-shot cross-lingual transfer in NLP. We show that they are less effective in resource-lean scenarios and for distant target languages.
Mon Jun 12 2017
NLP
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Mon Oct 30 2017
Machine Learning
Graph Attention Networks
Graph attention networks (GATs) are novel neural network architectures that operate on graph-structured data. GATs leverage masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions.
Sat May 09 2020
NLP
Finding Universal Grammatical Relations in Multilingual BERT
Multilingual BERT (mBERT) is capable of zero-shot cross-lingual transfer. We show that subspaces of mBERT representations are approximately shared across languages. This suggests that even without explicit supervision, multilingual masked language models learn certain linguistic universals.
Thu Oct 11 2018
NLP
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is designed to pre-train deep bidirectional representations from unlabeled text. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Tue Nov 05 2019
NLP
Unsupervised Cross-lingual Representation Learning at Scale
This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data.
Tue Mar 24 2020
Machine Learning
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
The XTREME benchmark is a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We release the benchmark to encourage research on cross-lingual learning methods that transfer knowledge across a diverse and representative set of languages and tasks.