Published on Wed Oct 03 2018

A Nonparametric Approach to High-dimensional k-sample Comparison Problems

Subhadeep Mukhopadhyay, Kaijun Wang

High-dimensional k-sample comparison is a common applied problem. We construct a class of easy-to-implement nonparametric distribution-free tests. The method works surprisingly well under a broad range of realistic situations.

Abstract

High-dimensional k-sample comparison is a common applied problem. We construct a class of easy-to-implement nonparametric distribution-free tests based on new tools and unexplored connections with spectral graph theory. The test is shown to possess various desirable properties along with a characteristic exploratory flavor that has practical consequences. The numerical examples show that our method works surprisingly well under a broad range of realistic situations.

Wed Jan 02 2019
Machine Learning
An Introductory Guide to Fano's Inequality with Applications in Statistical Estimation
Information theory plays an indispensable role in the development of algorithm-independent impossibility results. We provide a survey of Fano's inequality and its variants in the context of statistical estimation. We present a variety of key tools and techniques used for establishing impossibility results via this approach.
Mon Jul 08 2013
Machine Learning
B-tests: Low Variance Kernel Two-Sample Tests
A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests. The choice of block size allows control over the tradeoff between test power and computation time.
Tue Dec 15 2020
Machine Learning
Spectral Methods for Data Science: A Statistical Perspective
Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data. Due to their simplicity and effectiveness, spectral methods are not only used as a stand-alone estimator, but frequently employed to initialize other more sophisticated algorithms.
Fri Jun 26 2020
Machine Learning
The huge Package for High-dimensional Undirected Graph Estimation in R
We describe an R package named huge which provides easy-to-use functions for estimating high-dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007, 2012) and Liu and Liu (2010).
Wed Nov 02 2011
Machine Learning
Revisiting k-means: New Algorithms via Bayesian Nonparametrics
Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters. For the most part, such flexibility is lacking in classical clustering methods such as k-means.
Thu May 15 2008
Artificial Intelligence
A Kernel Method for the Two-Sample Problem
We propose a framework for analyzing and comparing distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS).