Published on Tue Nov 17 2020

A Quantitative Perspective on Values of Domain Knowledge for Machine Learning

Jianyi Yang, Shaolei Ren
Abstract

With the exploding popularity of machine learning, domain knowledge in various forms has been playing a crucial role in improving learning performance, especially when training data is limited. Nonetheless, there is little quantitative understanding of the extent to which domain knowledge affects a machine learning task. To increase transparency and rigorously explain the role of domain knowledge in machine learning, we study the problem of quantifying the value of domain knowledge in terms of its contribution to learning performance, in the context of informed machine learning. We propose a quantification method based on the Shapley value that fairly attributes the overall learning performance improvement to the different pieces of domain knowledge. We also present a Monte-Carlo sampling method that approximates the fair value of domain knowledge in polynomial time. We run experiments injecting symbolic domain knowledge into semi-supervised learning tasks on both the MNIST and CIFAR-10 datasets, providing quantitative values for different pieces of symbolic knowledge and rigorously explaining how each affects the machine learning performance in terms of test accuracy.
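The attribution scheme named in the abstract is the classic Shapley value from cooperative game theory, with each piece of domain knowledge playing the role of a player and the characteristic function v measuring the performance gain of a knowledge subset. As a minimal sketch of the idea (not the paper's implementation), the Python below computes exact Shapley values and the permutation-sampling Monte-Carlo approximation; the rule names A/B/C and the accuracy table are hypothetical stand-ins for a trained model's test accuracy under each knowledge subset.

```python
import itertools
import math
import random

def shapley_exact(players, v):
    # phi_i = sum over S subseteq N\{i} of |S|!(n-|S|-1)!/n! * (v(S+{i}) - v(S)),
    # which needs O(2^n) evaluations of the characteristic function v.
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[p] += w * (v(frozenset(S) | {p}) - v(frozenset(S)))
    return phi

def shapley_monte_carlo(players, v, num_samples=5000):
    # Permutation sampling: average each player's marginal contribution over
    # random orderings -- O(num_samples * n) evaluations, i.e. polynomial time.
    phi = {p: 0.0 for p in players}
    for _ in range(num_samples):
        coalition = frozenset()
        for p in random.sample(players, len(players)):
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / num_samples for p, total in phi.items()}

# Hypothetical characteristic function: test-accuracy gain over a
# knowledge-free baseline when training with each subset of three rules.
# (frozenset("AB") is the set {"A", "B"}, since strings iterate by character.)
ACC = {
    frozenset(): 0.70,
    frozenset("A"): 0.78, frozenset("B"): 0.74, frozenset("C"): 0.72,
    frozenset("AB"): 0.82, frozenset("AC"): 0.80, frozenset("BC"): 0.76,
    frozenset("ABC"): 0.85,
}
v = lambda S: ACC[frozenset(S)] - ACC[frozenset()]

print(shapley_exact(list("ABC"), v))        # exact attribution
print(shapley_monte_carlo(list("ABC"), v))  # Monte-Carlo approximation
```

In the paper's setting, v(S) would come from retraining the semi-supervised model with knowledge subset S, so the exact computation costs exponentially many training runs; the Monte-Carlo estimator is what keeps the attribution tractable as the number of knowledge pieces grows.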

Fri Mar 29 2019
Artificial Intelligence
Informed Machine Learning -- A Taxonomy and Survey of Integrating Knowledge into Learning Systems
Machine learning reaches its limits when training data is insufficient. A potential remedy is to additionally integrate prior knowledge into the training process. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches.
Tue Jul 13 2021
Machine Learning
On Designing Good Representation Learning Models
The goal of representation learning differs from ultimate machine learning objectives such as decision making, which makes it difficult to establish clear and direct objectives for training representation learning models. We propose to train a model by maximizing its expressiveness while incorporating general priors such as model smoothness.
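The blurb does not state a formal objective; one plausible reading of "maximize expressiveness under a smoothness prior" is the toy trade-off sketched below, where the variance-based expressiveness score, the perturbation-based smoothness penalty, and the weight lam are illustrative assumptions rather than the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(W, X):
    # Toy nonlinear encoder: representation Z = tanh(X W).
    return np.tanh(X @ W)

def objective(W, X, lam=0.1, eps=1e-2):
    # Expressiveness: total variance of the representation across samples.
    Z = encode(W, X)
    expressiveness = Z.var(axis=0).sum()
    # Smoothness prior: penalize representation change under small input noise.
    Zp = encode(W, X + eps * rng.standard_normal(X.shape))
    smoothness_penalty = np.mean((Zp - Z) ** 2) / eps**2
    return expressiveness - lam * smoothness_penalty

X = rng.standard_normal((256, 8))  # toy unlabeled data
W = rng.standard_normal((8, 4))    # encoder weights to be trained
print(objective(W, X))             # score to maximize, e.g. by gradient ascent
```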
Mon Sep 16 2019
Artificial Intelligence
RuDaS: Synthetic Datasets for Rule Learning and Evaluation Tools
Logical rules are a popular knowledge representation language in many domains. They represent background knowledge and encode, in compact form, information that can be derived from given facts. Rule formulation is a complex process that requires deep domain expertise.
Mon Dec 21 2020
Machine Learning
Knowledge as Invariance -- History and Perspectives of Knowledge-augmented Machine Learning
According to a new white paper, research in machine learning is at a turning point. Research interest is shifting away from increasing the performance of highly parameterized models on exceedingly specific tasks and toward developing models that by themselves guarantee a certain degree of versatility and invariance.
Thu Jul 02 2020
Machine Learning
In Search of Lost Domain Generalization
The goal of domain generalization algorithms is to predict well on distributions different from those seen during training. While a myriad of domain generalization algorithms exist, inconsistencies in experimental conditions render fair and realistic comparisons difficult; the paper introduces DomainBed, a testbed for domain generalization.
Thu Sep 09 2021
Machine Learning
COLUMBUS: Automated Discovery of New Multi-Level Features for Domain Generalization via Knowledge Corruption
Machine learning models that generalize to unseen domains are essential in real-world applications. The main challenge of domain generalization (DG) is that features learned from the source domains are not necessarily present in the unseen target domains. We propose COLUMBUS, a method that enforces the discovery of new features via targeted knowledge corruption.