Published on Mon May 17 2021

Pay Attention to MLPs

Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le
MLPs with gating match Transformers on key applications: BERT-style pretraining for NLP and Vision Transformer-style image classification.
Abstract

Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years. Here we propose a simple attention-free network architecture, gMLP, based solely on MLPs with gating, and show that it can perform as well as Transformers in key language and vision applications. Our comparisons show that self-attention is not critical for Vision Transformers, as gMLP can achieve the same accuracy. For BERT, our model achieves parity with Transformers on pretraining perplexity and is better on some downstream tasks. On finetuning tasks where gMLP performs worse, making the gMLP model substantially larger can close the gap with Transformers. In general, our experiments show that gMLP can scale as well as Transformers over increased data and compute.
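As a concrete illustration of the "MLPs with gating" idea, here is a minimal PyTorch sketch of one gMLP block with a Spatial Gating Unit (SGU). It is a sketch only: dimension names, sizes, and initialization details are illustrative assumptions rather than the paper's exact configuration; the unofficial implementation linked in the posts below is the fuller reference.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialGatingUnit(nn.Module):
    def __init__(self, d_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        # A single linear projection over the token (spatial) dimension,
        # shared across all channels.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        nn.init.zeros_(self.spatial_proj.weight)  # gate starts near identity
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x):                  # x: (batch, seq_len, d_ffn)
        u, v = x.chunk(2, dim=-1)          # split channels into two halves
        v = self.norm(v)
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v                       # element-wise (multiplicative) gating

class GMLPBlock(nn.Module):
    def __init__(self, d_model, d_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)        # channel projection
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)  # channel projection

    def forward(self, x):                  # x: (batch, seq_len, d_model)
        residual = x
        x = F.gelu(self.proj_in(self.norm(x)))
        x = self.sgu(x)
        return self.proj_out(x) + residual

# Example: a stack of such blocks (plus embeddings and a task head) replaces
# the Transformer encoder, e.g. GMLPBlock(256, 1024, 128) applied to inputs
# of shape (batch, 128, 256).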

Mark O. Riedl: 2017: Attention is all you need 2021: You don’t need attention https://arxiv.org/abs/2105.08050
ステート・オブ・AI ガイド: We may well be watching the end of the "Transformer era." gMLP, newly announced by Google Brain, combines multi-layer perceptrons (MLPs) with a gating mechanism and delivers performance rivaling Transformers on image recognition and BERT-style language models https://arxiv.org/abs/2105.08050
ステート・オブ・AI ガイド: gMLP, the multi-layer perceptron that beats Transformers https://arxiv.org/abs/2105.08050 It sent shockwaves through the field the moment it was announced last week, and a PyTorch implementation is already public; the turnaround speed is remarkable https://github.com/lucidrains/g-mlp-pytorch There is even sample code for GPT-style language generation, and it simply works, which is impressive
えるエル: December: "Transformers are the strongest for image recognition too (ViT)". February: "Transformer is All You Need". March: "Attention is not All You Need". May: "A reworked MLP matches ViT (MLP-Mixer)". May: "Convolutions look stronger than Transformers". May: "An MLP with gating beats Transformers (Pay Attention to MLPs)". https://arxiv.org/abs/2105.08050
Hanxiao Liu: Pay attention to MLPs: http://arxiv.org/abs/2105.08050 We show that MLPs with gating work well for key applications that Transformers are good at: BERT for NLP and ViT for vision. Is MLP all you need?
AK: Pay Attention to MLPs pdf: https://arxiv.org/pdf/2105.08050.pdf abs: https://arxiv.org/abs/2105.08050 based solely on MLPs with gating, show that it can perform as well as Transformers in key language and vision applications
Henry AI Labs
AI Weekly Update - May 26th, 2021 (#32!)
Wed May 26 2021 at 5:41:18 PM
Giorgio Patrini: It's harder and harder not getting scooped in ML. This is all concurrent work replacing Transformers & Attention with MLP: ResMLP https://arxiv.org/abs/2105.03404v1 RepMLP https://arxiv.org/abs/2105.01883v1 gMLP https://arxiv.org/abs/2105.08050 Do you even need attention? https://arxiv.org/abs/2105.02723v1
akira: https://arxiv.org/abs/2105.08050v1 Vision Transformer achieved high accuracy in image classification, but this paper argues that self-attention is not an essential ingredient of it, and surpasses that accuracy with an MLP that has a gating structure. On language tasks, too, adding a simple single-head attention to the MLP lets it surpass BERT.
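The "simple single-head attention" mentioned above is the paper's aMLP variant: a tiny attention module whose output is blended into the gate. The sketch below is an assumption about the wiring, for illustration only (a small single-head attention of size d_attn, e.g. 64, added to the spatial projection inside the gating unit); the paper and the unofficial code give the exact placement.

import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    def __init__(self, d_model, d_out, d_attn=64):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_attn)   # single head, small size
        self.out = nn.Linear(d_attn, d_out)
        self.scale = d_attn ** -0.5

    def forward(self, x):                           # x: (batch, seq_len, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.out(attn @ v)                   # (batch, seq_len, d_out)

# Assumed use inside the gating unit (aMLP):
#   gate = spatial_proj(v) + tiny_attention(block_input)
#   output = u * gate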
piqcy: A study reporting that a network of fully connected layers beats Transformer accuracy on vision (Vision Transformer) and language (BERT) tasks. The model stacks blocks that apply a linear transform along the channel dimension => an activation function (GeLU) => another linear transform combined by element-wise multiplication (which computes the interactions across the sequence). https://arxiv.org/abs/2105.08050
MachineLearning: [R] Pay Attention to MLPs: solely on MLPs with gating, and show that it can perform as well as Transformers in key language and vision applications
Shion Honda: Pay Attention to MLPs [Liu+, 2021] Whether self-attention is truly necessary in Transformers had remained an open question. This paper proposes gMLP, built from a combination of channel-wise and sequence-wise linear transforms, an activation function, normalization, and element-wise products, and it reaches accuracy comparable to ViT and BERT on major language and vision tasks. https://arxiv.org/abs/2105.08050 #NowReading
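In the paper's notation (Section 2), the block these posts describe can be written roughly as

Z = \sigma(XU), \quad \tilde{Z} = s(Z), \quad Y = \tilde{Z}V,
s(Z) = Z_1 \odot f_{W,b}(Z_2), \quad f_{W,b}(Z) = WZ + b,

where X is the n-by-d input, \sigma is GeLU, U and V are projections along the channel dimension, Z is split channel-wise into (Z_1, Z_2), \odot is element-wise multiplication, and W is an n-by-n matrix acting along the token dimension (n = sequence length).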
John Bohannon: @MelMitchell1 Been noticing this too! Fourier is all you need https://arxiv.org/abs/2105.03824 Perceptrons are all you need https://arxiv.org/abs/2105.08050 Just waiting for "Word frequency is all you need" to make its glorious comeback.
rishabh 🤖: Paper alert! "Pay Attention to MLPs" by @Hanxiao_6, @ZihangDai, @davidrso1, and @quocleix introduces gMLP, an alternative MLP architecture that aims to remove the dependence on Attention for vision tasks. Check it out here: https://arxiv.org/abs/2105.08050 Really interesting read!
Simone Scardapane: Preprint is here: https://arxiv.org/abs/2105.08050v1 Unofficial code: https://github.com/lucidrains/g-mlp-pytorch Apart from MLP-Mixer, you can also check out ResMLP and this technical report: https://arxiv.org/abs/2105.02723v1
Daisuke Okanohara: Self-attention may not be essential to the Transformer in most cases. They replace self-attention with gMLP (an MLP with a Spatial Gating Unit) and show that it achieves competitive performance in NLP and image recognition, and that it scales well with increased data and compute. https://arxiv.org/abs/2105.08050
Lewis Tunstall: at this rate, the transformers book i'm working on will have to be called "NLP with MLPs" 😅 https://arxiv.org/abs/2105.08050
AUEB NLP Group: Next online AUEB NLP Group meeting, Tue *June 1st*, 17:00-18:30: Discussion of (i) "Are Pre-trained Convolutions Better than Pre-trained Transformers?" (Tay et al., ACL 2021) and (ii) "Pay Attention to MLPs" (Liu et al., 2021).
TensorFlow Turkey Community: Could it be that the success of Transformer models does not actually come from self-attention? Google Research's new paper adds to that suspicion: a simple MLP variant with gating achieves performance comparable to Transformers on CV and NLP tasks https://arxiv.org/abs/2105.08050
deeplearning: Pay Attention to MLPs - Annotated PyTorch implementation
注目の最新arXiv【毎日更新】: Posted 2021/05/17, #1 in cs.LG (Machine Learning): Pay Attention to MLPs https://arxiv.org/abs/2105.08050 12 Tweets 60 Retweets 370 Favorites
Tomonari MASADA: So this is the one everyone is talking about. Regarding the line in Section 2.1, "SGU learns only a single transformation shared across channels." — is that really enough? That's my reaction. The visualization of what the spatial filters actually look like is also interesting. https://arxiv.org/abs/2105.08050
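For context on that "single transformation shared across channels" remark: the spatial projection inside the SGU is one n-by-n matrix W (n = sequence length) applied identically to every channel, roughly

\big(f_{W,b}(Z)\big)_{:,c} = W\,Z_{:,c} + b \quad \text{for every channel } c,

so all channels share the same learned spatial mixing weights, which is what the post above questions.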
Nasim Rahaman: @josephdviviano The part that I find somewhat surprising is that they don’t really *want* to work without a pinch of attention, or that’s my read from
Seitaro Shinagawa: I can't work out how to read Figure 4 of Pay Attention to MLPs https://arxiv.org/abs/2105.08050 It says "For each layer in the model we plot the row in W associated with the token in the middle of the sequence." — so for each layer, are they deliberately picking out and visualizing the row of W whose weights respond strongly to the middle of the sequence?
censored: In the end, the takeaway is that attention is strong after all https://arxiv.org/abs/2105.08050
たむ: https://arxiv.org/abs/2105.08050 Google Brain's gMLP is impressive: despite a genuinely simple structure it reportedly beats Transformers. Since MLPs can represent arbitrary functions, they asked whether self-attention is really needed, and apparently it really isn't.
ふるやん: It's turned into a game of who can coin the cleverest title: "Pay Attention to MLPs" https://arxiv.org/abs/2105.08050
邪悪なポメラニアン: I too assumed attention was king and Transformer-based models were the de facto standard. Then in May, Google Brain put out a "hold on a minute" paper https://arxiv.org/abs/2105.08050
いわも: Pay Attention to MLPs https://arxiv.org/abs/2105.08050v1 "Our comparisons show that self-attention is not critical for Vision Transformers" Give me a break...
.: These attempts at perceptronizing Transformers remind me of Zima Blue. https://arxiv.org/pdf/2105.08050.pdf
川村正春 @ 五城目人工知能アカデミー: Pay Attention to MLPs gMLP MLPs with gating self-attention is not critical for Vision Transformers For BERT, our model achieves parity with Transformers on pretraining perplexity and is better on some downstream tasks https://arxiv.org/abs/2105.08050 https://www.slideshare.net/DeepLearningJP2016/dlpay-attention-to-mlpsgmlp
mandubianhotep: Pay Attention to MLPs Interesting paper showing that a basic MLP can capture sequential & spatial relations as well as self-attention can. Self-attention sounds to me more like an engineering trick than a theoretically proven tool. Still things to discover... https://arxiv.org/abs/2105.08050
Yuji Tokuda: The authors are the neural architecture search (NAS) people / [2105.08050] Pay Attention to MLPs https://arxiv.org/abs/2105.08050
ぴーちゃま: gMLP: the claim that a multi-layer perceptron achieves performance on par with a Transformer and its self-attention mechanism. They argue the key is introducing the Spatial Gating Unit, but personally I can't shake the suspicion that it simply worked because they threw huge amounts of data at it https://arxiv.org/abs/2105.08050
RICKY: [2105.01601] MLP-Mixer: An all-MLP Architecture for Vision https://arxiv.org/abs/2105.01601 [2105.08050] Pay Attention to MLPs https://arxiv.org/abs/2105.08050
hoimi: So we are still very much in a transitional period, it seems... https://arxiv.org/abs/2105.08050
WOOSUNG CHOI: @KoeKestra You mean this paper? https://arxiv.org/pdf/2105.08050.pdf It's amazing! You are an early adopter. I guess you submitted a small model to fit the time limit. I'm wondering about the SDR scores of larger gMLP-based models, if they can still meet the time limit.
cs.LG Papers: Pay Attention to MLPs. Hanxiao Liu, Zihang Dai, David R. So, and Quoc V. Le http://arxiv.org/abs/2105.08050
arXiv reaDer bot (cs-CV): Pay Attention to MLPs 2021-05-17T17:55:04+00:00 arXiv: http://arxiv.org/abs/2105.08050v1 English/Japanese summary: https://arxiv-check-250201.firebaseapp.com/each/2105.08050v1
cs.CV Papers: Pay Attention to MLPs. Hanxiao Liu, Zihang Dai, David R. So, and Quoc V. Le http://arxiv.org/abs/2105.08050
風凪空@幻想邪神(幻月の夫): Pay Attention to MLPs https://arxiv.org/abs/2105.08050 > gMLP, newly announced by Google Brain, combines multi-layer perceptrons (MLPs) with a gating mechanism and delivers performance rivaling Transformers on image recognition and BERT-style language models. Is the Transformer era perhaps coming to an end?
午後のarXiv: "Pay Attention to MLPs", Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le https://arxiv.org/abs/2105.08050