Published on Fri Jun 04 2021

Dutch Named Entity Recognition and De-identification Methods for the Human Resource Domain

Chaïm van Toledo, Friso van Dijk, Marco Spruit

Abstract

The human resource (HR) domain contains various types of privacy-sensitive textual data, such as e-mail correspondence and performance appraisals. Doing research on these documents brings several challenges, one of them being anonymisation. In this paper, we evaluate the current Dutch text de-identification methods for the HR domain in four steps. First, we update one of these methods with the latest named entity recognition (NER) models. The result is that the NER model based on the CoNLL 2002 corpus, in combination with the BERTje transformer, gives the best results for suppressing persons (recall 0.94) and locations (recall 0.82). For suppressing gender, DEDUCE performs best (recall 0.53). Second, the NER evaluation is based on strict de-identification of entities (a person must be suppressed as a person), and third, on a loose sense of de-identification (no matter how a person is suppressed, as long as it is suppressed). In the fourth and last step, a new kind of NER dataset is tested for recognising job titles in texts.
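
The difference between the strict and loose evaluations can be illustrated with a small sketch. This is not the authors' evaluation code: the span representation, the labels, and the rule that a prediction must fully cover a gold entity are assumptions made for illustration, and the paper may match entities differently (for example per token).

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label) in character offsets.
Span = Tuple[int, int, str]


def recall(gold: List[Span], predicted: List[Span], strict: bool) -> float:
    """Fraction of gold entities that a predicted span suppresses.

    strict=True  -> the covering prediction must also carry the gold label.
    strict=False -> any covering prediction counts, whatever its label.
    Full-span coverage as the matching rule is an assumption.
    """
    if not gold:
        return 0.0
    hits = 0
    for g_start, g_end, g_label in gold:
        for p_start, p_end, p_label in predicted:
            covers = p_start <= g_start and g_end <= p_end
            if covers and (not strict or p_label == g_label):
                hits += 1
                break
    return hits / len(gold)


# Toy data: the second gold entity (a location) is suppressed as a person,
# and the third gold entity is missed entirely.
gold = [(0, 4, "PER"), (14, 21, "LOC"), (30, 35, "PER")]
predicted = [(0, 4, "PER"), (14, 21, "PER"), (40, 45, "LOC")]

strict_recall = recall(gold, predicted, strict=True)
loose_recall = recall(gold, predicted, strict=False)
print(f"strict recall: {strict_recall:.2f}, loose recall: {loose_recall:.2f}")
```

In this toy case the mislabelled location counts as a miss under the strict criterion but as a hit under the loose one, which is why loose recall can never be lower than strict recall.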

Tue Sep 24 2013
NLP
JRC-Names: A freely available, highly multilingual named entity resource
This paper describes a new, freely available, highly multilingual named entity resource for person and organisation names. It has been compiled over seven years of large-scale multilingual news analysis combined with Wikipedia mining. This resource can be used for a number of purposes.
Thu Oct 29 2020
NLP
RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain
We show-case an application of information extraction methods to a novel corpus. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for the general domain corpora, and 2) the documents are written in a language other than English.
Sun May 04 2014
NLP
"Translation can't change a name": Using Multilingual Data for Named Entity Recognition
Named Entities (NEs) are often written with no orthographic changes across languages that share a common alphabet. We show that this can be leveraged so as to improve named entity recognition (NER) by using unsupervised word clusters from secondary languages.
Sun Oct 06 2019
NLP
Named Entity Recognition -- Is there a glass ceiling?
The study reveals weak and strong points of the Stanford, CMU, FLAIR, ELMO and BERT models. We also introduce new techniques for improving annotation.
Sat Aug 03 2013
NLP
A Comparison of Named Entity Recognition Tools Applied to Biographical Texts
Stanford NER, Illinois NET, OpenCalais NER WS and Alias-i LingPipe are used in the study. Stanford NER has the best results, followed by LingPipe and Illinois NET. However, their performances are diversely influenced by those factors.
Fri Oct 23 2020
NLP
UNER: Universal Named-Entity Recognition Framework
The SETimes parallel corpus will be annotated using existing tools and knowledge bases. The resulting annotations will be propagated automatically to other languages within the SE-Times corpora.
Thu Oct 11 2018
NLP
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is designed to pre-train deep bidirectional representations from unlabeled text. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Tue Oct 14 2014
Machine Learning
POLYGLOT-NER: Massive Multilingual Named Entity Recognition
The increasing diversity of languages used on the web introduces a new level of complexity to Information Retrieval (IR) systems. In this paper, we demonstrate how to build massive multilingual annotators with minimal human expertise and intervention.
Sat Dec 22 2018
NLP
A Survey on Deep Learning for Named Entity Recognition
Named entity recognition (NER) is the task to identify mentions of rigid designators from text belonging to predefined semantic types. NER always serves as the foundation for many natural language applications such as question answering, text summarization, and machine translation.
Tue Dec 01 2015
NLP
Multilingual Language Processing From Bytes
We describe an LSTM-based model which we call Byte-to-Span (BTS). BTS reads text as bytes and outputs span annotations of the form [start, length, label].
Thu Sep 05 2002
NLP
Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition
Fri Mar 04 2016
NLP
Neural Architectures for Named Entity Recognition
State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge. In this paper, we introduce two new neural architectures based on bidirectional LSTMs and conditional random fields.