Published on Thu Jun 14 2018

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

Cencheng Shen, Joshua T. Vogelstein

Abstract

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence testing from the statistics community. Kernel-based tests, developed from "kernel mean embeddings", are leading methods for two-sample and independence testing from the machine learning community. A fixed-point transformation was previously proposed to connect the distance methods and kernel methods at the level of population statistics. In this paper, we propose a new bijective transformation between metrics and kernels. It simplifies the fixed-point transformation, inherits similar theoretical properties, makes distance methods exactly equivalent to kernel methods in both sample statistics and p-values, and better preserves the data structure upon transformation. Our results further advance the understanding of distance- and kernel-based tests, streamline the code base for implementing these tests, and enable the rich literatures of distance-based and kernel-based methodologies to communicate directly with each other.
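
The abstract does not spell out the form of the new transformation, so the sketch below is our own illustration, not necessarily the paper's exact construction. It assumes a simple translation-based bijection k(x, y) = max(D) - d(x, y): because a metric has a zero diagonal, the map is invertible on sample distance matrices, and double centering removes the additive constant, so the centered distance and kernel matrices agree up to sign.

import numpy as np
from scipy.spatial.distance import cdist

def distance_to_kernel(D):
    # Map a pairwise distance matrix to a similarity (kernel) matrix.
    # Assumed illustrative form: k(x, y) = max(D) - d(x, y).
    return D.max() - D

def kernel_to_distance(K):
    # Invert the map; works because d(x, x) = 0 puts max(K) on the diagonal.
    return K.max() - K

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
D = cdist(X, X)                      # Euclidean pairwise distances
K = distance_to_kernel(D)

# The transformation is bijective on sample distance matrices:
assert np.allclose(kernel_to_distance(K), D)

# Double centering kills the additive constant, so the centered kernel and
# centered distance matrices coincide up to sign. This is why a centered
# distance statistic (e.g., distance covariance) can match its kernel
# counterpart (e.g., HSIC) exactly on the same data.
n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n  # double-centering matrix
assert np.allclose(H @ K @ H, -(H @ D @ H))

Because any permutation of the sample transforms D and K identically, a permutation test built on the centered statistics yields the same p-value whether one starts from distances or from kernels, consistent with the exact sample-level equivalence claimed above.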

Related Papers

Wed May 02 2012
Machine Learning
Hypothesis testing using pairwise distances and associated kernels (with Appendix)
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing. The energy distance most commonly employed in statistics is just one member of a parametric family of kernels. We show that other choices from this family can yield more powerful tests.
Wed Jul 25 2012
Machine Learning
Equivalence of distance-based and RKHS-based statistics in hypothesis testing
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing. The energy distance most commonly employed in statistics is just one member of a parametric family of kernels. We show that other choices from this family can yield more powerful tests.
Mon Sep 30 2019
Machine Learning
A New Framework for Distance and Kernel-based Metrics in High Dimensions
The paper presents new metrics to quantify and test for (i) the equality of distributions and (ii) the independence between two high-dimensional random vectors. We show that the energy distance based on the usual Euclidean distance cannot completely characterize the homogeneity of two high-dimensional distributions.
Tue Mar 08 2011
Machine Learning
A Gentle Introduction to the Kernel Distance
The kernel distance is an L_2 distance between probability measures or various shapes embedded in a vector space. This structure enables several elegant and efficient solutions to data analysis problems. We conclude with a glimpse into the mathematical underpinnings of this measure.
Mon Jun 09 2014
Machine Learning
On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions
This paper is about two related decision theoretic problems, nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and distances between pairs of points, behave well in high-dimensional settings.
Thu Sep 19 2019
Machine Learning
Comparing distributions: geometry improves kernel two-sample testing
Kernel methods lead to many appealing properties. The resulting tests are consistent while being much faster than the state of the art. Experiments on artificial and real-world problems demonstrate an improved power/time tradeoff over the state of the art.