Published on Tue Jun 04 2019

SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference

Martin Schmitt, Hinrich Schütze

Abstract

We present SherLIiC, a testbed for lexical inference in context (LIiC), consisting of 3985 manually annotated inference rule candidates (InfCands), accompanied by (i) ~960k unlabeled InfCands, and (ii) ~190k typed textual relations between Freebase entities extracted from the large entity-linked corpus ClueWeb09. Each InfCand consists of one of these relations, expressed as a lemmatized dependency path, and two argument placeholders, each linked to one or more Freebase types. Due to our candidate selection process based on strong distributional evidence, SherLIiC is much harder than existing testbeds because distributional evidence is of little utility in the classification of InfCands. We also show that, due to its construction, many of SherLIiC's correct InfCands are novel and missing from existing rule bases. We evaluate a number of strong baselines on SherLIiC, ranging from semantic vector space models to state of the art neural models of natural language inference (NLI). We show that SherLIiC poses a tough challenge to existing NLI systems.
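The abstract describes each InfCand as a pair of typed relations: a relation is a lemmatized dependency path with two argument placeholders, each linked to one or more Freebase types, and annotation decides whether one relation entails the other. A minimal sketch of that structure, assuming a simple in-memory representation (the class names, example paths, and type labels below are illustrative, not taken from the dataset):

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TypedRelation:
    """A relation expressed as a lemmatized dependency path, with each
    argument placeholder linked to one or more Freebase types."""
    dep_path: str           # lemmatized dependency path, e.g. "A acquire B"
    arg_a_types: List[str]  # Freebase types for placeholder A
    arg_b_types: List[str]  # Freebase types for placeholder B

@dataclass(frozen=True)
class InfCand:
    """An inference rule candidate: does `premise` entail `hypothesis`?"""
    premise: TypedRelation
    hypothesis: TypedRelation
    is_entailment: bool     # gold label from manual annotation

# Hypothetical example (relation strings and types are made up for illustration):
cand = InfCand(
    premise=TypedRelation("A acquire B", ["organization"], ["organization"]),
    hypothesis=TypedRelation("A own B", ["organization"], ["organization"]),
    is_entailment=True,
)
```

The frozen dataclasses simply make candidates hashable, which is convenient when deduplicating against the ~960k unlabeled InfCands.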

Fri Aug 21 2015
NLP
A large annotated corpus for learning natural language inference
The Stanford Natural Language Inference corpus is two orders of magnitude larger than other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models.
Wed Feb 10 2021
NLP
Language Models for Lexical Inference in Context
Lexical inference in context (LIiC) is a variant of the natural language inference task that is focused on lexical semantics. We formulate and evaluate the first approaches based on pretrained language models. All our approaches outperform the previous state of the art, showing the potential of pretrained language models for this task.
Mon Mar 16 2020
NLP
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Mon Mar 27 2017
NLP
A Tidy Data Model for Natural Language Processing using cleanNLP
The package cleanNLP provides a set of fast tools for converting a textual corpus into normalized tables. The underlying natural language processing pipeline utilizes Stanford's CoreNLP library. CleanNLP is available in English, French, German, and Spanish.
Fri Mar 22 2019
NLP
LINSPECTOR: Multilingual Probing Tasks for Word Representations
There is a lack of a standardized technique to provide insights into what is captured by word representation models. We introduce 15 type-level probing tasks such as case marking, possession, word length, morphological tag count and pseudoword identification for 24 languages. We find that a number of probing tests have significantly high positive correlation to the downstream tasks.
Tue Apr 18 2017
NLP
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Multi-Genre Natural Language Inference (MultiNLI) is a dataset designed for use in the development and evaluation of machine learning models. At 433k examples, this corpus improves upon available resources in its coverage. It offers data from ten distinct genres of written and spoken English.