Published on Mon Aug 02 2021

LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization

Weidong Guo, Mingjun Zhao, Lusheng Zhang, Di Niu, Jinwen Luo, Zhenhua Liu, Zhenyang Li, Jianbo Tang

Abstract

Language model pre-training based on large corpora has achieved tremendous success in constructing enriched contextual representations and has led to significant performance gains on a diverse range of Natural Language Understanding (NLU) tasks. Despite this success, most current pre-trained language models, such as BERT, are trained on single-grained tokenization, usually with fine-grained characters or sub-words, making it hard for them to learn the precise meaning of coarse-grained words and phrases. In this paper, we propose a simple yet effective pre-training method named LICHEE to efficiently incorporate multi-grained information of the input text. Our method can be applied to various pre-trained language models and improves their representation capability. Extensive experiments conducted on CLUE and SuperGLUE demonstrate that our method achieves comprehensive improvements on a wide variety of NLU tasks in both Chinese and English while incurring little extra inference cost, and that our best ensemble model achieves state-of-the-art performance in the CLUE benchmark competition.
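The abstract does not spell out how the multi-grained information is fused, so the following is only a minimal PyTorch sketch of one plausible embedding-level scheme: each position receives both a fine-grained (character/sub-word) embedding and the embedding of the coarse-grained unit covering it, and the two are pooled element-wise before the Transformer encoder. The vocabulary sizes, the per-position alignment, and the max-pooling operator are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch: fusing fine-grained and coarse-grained embeddings before a
# standard Transformer encoder. The element-wise max pooling and the alignment
# scheme are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiGrainedEmbedding(nn.Module):
    def __init__(self, fine_vocab_size, coarse_vocab_size, hidden_size):
        super().__init__()
        self.fine_emb = nn.Embedding(fine_vocab_size, hidden_size)      # e.g. characters / sub-words
        self.coarse_emb = nn.Embedding(coarse_vocab_size, hidden_size)  # e.g. words / phrases

    def forward(self, fine_ids, coarse_ids):
        # fine_ids:   (batch, seq_len) fine-grained token ids
        # coarse_ids: (batch, seq_len) id of the coarse-grained unit covering each
        #             position, so both sequences stay aligned and equally long.
        fine = self.fine_emb(fine_ids)
        coarse = self.coarse_emb(coarse_ids)
        # Fuse the two granularities; element-wise max is one simple choice.
        return torch.max(fine, coarse)

# Usage: the fused embeddings replace the usual single-grained token embeddings,
# leaving the Transformer layers above unchanged, which keeps extra inference cost small.
emb = MultiGrainedEmbedding(fine_vocab_size=21128, coarse_vocab_size=50000, hidden_size=768)
fine_ids = torch.randint(0, 21128, (2, 16))
coarse_ids = torch.randint(0, 50000, (2, 16))
fused = emb(fine_ids, coarse_ids)   # (2, 16, 768), ready for a BERT-style encoder
```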

Fri Oct 23 2020
Machine Learning
ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding
ERNIE-Gram is an explicit n-gram masking method that enhances the integration of coarse-grained information into pre-training. It outperforms XLNet and RoBERTa by a large margin. The source code and pre-trained models have been released.
Mon Jul 29 2019
NLP
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
Pre-trained models have achieved state-of-the-art results in various language understanding tasks. This indicates that pre-training on large-scale corpora may play a crucial role in natural language processing.
Thu Aug 27 2020
NLP
AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
Pre-trained language models such as BERT have exhibited remarkable performance on many natural language understanding (NLU) tasks. The tokens in these models are usually fine-grained: for languages like English they are words or sub-words. In English, for example, there are multi-word expressions that form natural lexical units.
Sat Aug 31 2019
NLP
NEZHA: Neural Contextualized Representation for Chinese Language Understanding
NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) is, in its current version, based on BERT with a collection of proven improvements.
Thu Oct 15 2020
Machine Learning
Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach
Fine-tuned pre-trained language models have achieved enormous success, but they still require excessive labeled data in the fine-tuning stage. This problem is challenging because the high capacity of LMs makes them prone to overfitting the noisy labels generated by weak supervision.
Tue Mar 03 2020
NLP
CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
This Chinese corpus from the CLUE organization can be used directly for self-supervised learning. It contains 100 GB of raw text with 35 billion Chinese characters retrieved from Common Crawl. A new Chinese vocabulary with a size of 8K is also released.
Mon Jun 12 2017
NLP
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Thu May 28 2020
NLP
Language Models are Few-Shot Learners
GPT-3 is an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model. It can perform tasks without gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
Thu Oct 11 2018
NLP
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is designed to pre-train deep bidirectional representations from unlabeled text. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Thu May 02 2019
Artificial Intelligence
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
SuperGLUE is a new benchmark styled after GLUE with a new set of more difficult language understanding tasks. It comes with a software toolkit and a public leaderboard, and is available at super.gluebenchmark.com.
Fri Apr 20 2018
NLP
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
The General Language Understanding Evaluation benchmark (GLUE) is a tool for evaluating and analyzing the performance of models across a diverse range of NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks.
Mon Apr 13 2020
Machine Learning
CLUE: A Chinese Language Understanding Evaluation Benchmark
Chinese Language Understanding Evaluation (CLUE) benchmark is the first large-scale Chinese language understanding benchmark. It brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text.