Published on Wed Mar 10 2021

Interpretable bias mitigation for textual data: Reducing gender bias in patient notes while maintaining classification performance

Joshua R. Minot, Nicholas Cheney, Marc Maier, Danne C. Elbers, Christopher M. Danforth, Peter Sheridan Dodds
Abstract

Medical systems in general, and patient treatment decisions and outcomes in particular, are affected by bias based on gender and other demographic elements. As language models are increasingly applied to medicine, there is a growing interest in building algorithmic fairness into processes impacting patient care. Much of the work addressing this question has focused on biases encoded in language models -- statistical estimates of the relationships between concepts derived from distant reading of corpora. Building on this work, we investigate how word choices made by healthcare practitioners and language models interact with regard to bias. We identify and remove gendered language from two clinical-note datasets and describe a new debiasing procedure using BERT-based gender classifiers. We show minimal degradation in health condition classification tasks for low to medium levels of bias removal via data augmentation. Finally, we compare the bias semantically encoded in the language models with the bias empirically observed in health records. This work outlines an interpretable approach for using data augmentation to identify and reduce the potential for bias in natural language processing pipelines.
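As a rough illustration of the data-augmentation step described above, the sketch below swaps a short list of gendered terms for neutral placeholders. The term map, function name, and replacement policy are illustrative assumptions made for this summary, not the authors' actual pipeline (which, per the abstract, also uses BERT-based gender classifiers).

    import re

    # Illustrative (not exhaustive) map of gendered terms to neutral forms;
    # capitalization and grammatical agreement are glossed over in this sketch.
    GENDERED_TERMS = {
        r"\bshe\b": "they",
        r"\bhe\b": "they",
        r"\bher\b": "their",
        r"\bhis\b": "their",
        r"\bwoman\b": "person",
        r"\bman\b": "person",
    }

    def neutralize_gendered_language(note: str) -> str:
        """Replace gendered tokens in a clinical note with neutral forms."""
        for pattern, replacement in GENDERED_TERMS.items():
            note = re.sub(pattern, replacement, note, flags=re.IGNORECASE)
        return note

    # Augment a dataset with neutralized copies of each note.
    notes = ["She reports chest pain. His mother has a history of hypertension."]
    augmented = notes + [neutralize_gendered_language(n) for n in notes]

A term-substitution step like this keeps the mitigation interpretable: every removed or replaced token can be listed and audited, unlike debiasing applied directly to embedding spaces.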

Tue May 25 2021
NLP
Estimating Redundancy in Clinical Text
Clinicians often populate new documents by duplicating existing notes. Data duplication can lead to the propagation of errors, inconsistencies, and misreporting of care. Quantifying information redundancy can play an essential role in evaluating innovations that operate on clinical narratives.
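As a hedged aside, one simple proxy for this kind of redundancy (not necessarily the estimator used in the paper) is the fraction of a new note's word n-grams that already appear in the patient's prior note:

    def ngram_set(text: str, n: int = 3) -> set:
        """Return the set of word n-grams in a text."""
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def redundancy(new_note: str, prior_note: str, n: int = 3) -> float:
        """Fraction of the new note's n-grams already present in the prior note."""
        new_grams = ngram_set(new_note, n)
        if not new_grams:
            return 0.0
        return len(new_grams & ngram_set(prior_note, n)) / len(new_grams)

    prior = "Patient denies chest pain. Continue lisinopril 10 mg daily."
    new = "Patient denies chest pain. Continue lisinopril 10 mg daily. New cough noted."
    print(f"{redundancy(new, prior):.2f}")  # 0.70 -- most of the new note is copied forward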
Sun Jan 27 2019
Machine Learning
Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting
We present a large-scale study of gender bias in occupation classification. We analyze the potential allocation harms that can result from semantic representation bias. We quantify the bias that remains when explicit gender indicators, such as first names and pronouns, are scrubbed.
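For a concrete sense of what scrubbing explicit gender indicators involves, the sketch below masks pronouns and honorifics; the indicator list is a small illustrative stand-in for the fuller scrubbing described in the paper, which also removes first names.

    import re

    # Illustrative list of explicit gender indicators; first names are omitted here.
    INDICATORS = ["she", "he", "her", "hers", "his", "him", "mr", "mrs", "ms"]
    PATTERN = re.compile(r"\b(" + "|".join(INDICATORS) + r")\b\.?", re.IGNORECASE)

    def scrub(bio: str) -> str:
        """Mask explicit gender indicators in an occupation bio."""
        return PATTERN.sub("_", bio)

    print(scrub("She is a surgeon. Mr. Smith praised her work."))
    # "_ is a surgeon. _ Smith praised _ work."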
Fri Jan 03 2014
NLP
Natural Language Processing in Biomedicine: A Unified System Architecture Overview
Natural language processing (NLP) provides a means of unlocking the important information stored in free-text clinical records for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view.
Fri Jun 21 2019
NLP
Mitigating Gender Bias in Natural Language Processing: Literature Review
As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent.
Fri May 01 2020
NLP
Multi-Dimensional Gender Bias Classification
Machine learning models are trained to find patterns in data, and NLP models can mistakenly learn socially undesirable patterns when trained on gender-biased text. Distinguishing gender bias along multiple dimensions is important, as it enables the training of gender bias classifiers.
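As a toy illustration (not the paper's multi-dimensional framework, which separates, for example, the gender of the speaker from the gender of the person discussed), a single-dimension classifier might be trained as follows:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy labeled examples; real classifiers would be trained per dimension
    # on a large annotated corpus such as the one introduced in the paper.
    texts = ["she went to her office", "he lost his keys",
             "her talk went well", "his dog barked"]
    labels = ["feminine", "masculine", "feminine", "masculine"]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)
    print(clf.predict(["she parked the car"]))  # predicts 'feminine' from the pronoun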
Tue Aug 03 2021
NLP
Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management
This study introduces a dataset for assessing bias in medical question answering in the context of pain management. The findings reaffirm the risks posed by AI in medical settings and highlight the need for datasets like this one to ensure safety before medical AI applications are deployed.