Published on Fri Jul 02 2021

Near-optimal Algorithms for Explainable k-Medians and k-Means

Konstantin Makarychev, Liren Shan

Abstract

We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is $\tilde{O}(\log k)$ competitive with $k$-medians with $\ell_1$ norm and $\tilde{O}(k)$ competitive with $k$-means. This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by Dasgupta et al. (2020). We also provide a new algorithm which is $O(\log^{3/2} k)$ competitive for $k$-medians with $\ell_2$ norm. Our first algorithm is near-optimal: Dasgupta et al. (2020) showed a lower bound of $\Omega(\log k)$ for $k$-medians; in this work, we prove a lower bound of $\tilde{\Omega}(k)$ for $k$-means. We also provide a lower bound of $\Omega(\log k)$ for $k$-medians with $\ell_2$ norm.
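To make the threshold-tree notion concrete, here is a minimal illustrative sketch in Python (a toy example of the data structure, not the algorithm from the paper; the Node class and the example tree are hypothetical). Each internal node compares a single coordinate of a point with a threshold, and each leaf is one of the $k$ clusters, so every cluster is an axis-aligned cell.

```python
import numpy as np

class Node:
    """A node of a threshold decision tree: an internal node tests one
    coordinate against a threshold; a leaf carries a cluster label."""
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, cluster=None):
        self.feature = feature      # coordinate index tested at this node
        self.threshold = threshold  # split value for that coordinate
        self.left = left            # subtree for points with x[feature] <= threshold
        self.right = right          # subtree for points with x[feature] >  threshold
        self.cluster = cluster      # cluster label (set for leaves only)

def assign(node, x):
    """Route a point from the root to a leaf; the leaf is its cluster."""
    while node.cluster is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.cluster

# A hypothetical tree with k = 3 leaves: split on coordinate 0 at 0.5,
# then split the right branch on coordinate 1 at -1.0.
tree = Node(feature=0, threshold=0.5,
            left=Node(cluster=0),
            right=Node(feature=1, threshold=-1.0,
                       left=Node(cluster=1),
                       right=Node(cluster=2)))

points = np.array([[0.1, 0.0], [2.0, -3.0], [1.5, 2.0]])
print([assign(tree, x) for x in points])  # -> [0, 1, 2]
```

The objective is then computed from the leaves in the usual way: for $k$-medians, the sum of distances from each point to the median of its leaf; for $k$-means, the sum of squared $\ell_2$ distances from each point to the mean of its leaf.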

Related Papers

Fri Feb 28 2020
Machine Learning
Explainable k-Means and k-Medians Clustering
Clustering is a popular form of unsupervised learning for geometric data. Many clustering algorithms lead to cluster assignments that are hard to explain. To improve interpretability, we consider using a small decision tree to partition a data set.
Thu Jul 01 2021
Machine Learning
Almost Tight Approximation Algorithms for Explainable Clustering
We study a recent framework of explainable clustering first suggested by Dasgupta et al. We provide an $O(\log k \log\log k)$-approximation algorithm for explainable k-medians. All our algorithms run in near linear time in the number of points and the dimension.
Wed Jun 30 2021
Machine Learning
Nearly-Tight and Oblivious Algorithms for Explainable Clustering
We study the problem of explainable clustering in the setting first formalized by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020). A k-clustering is said to be explainable if it is given by a decision tree where each internal node splits data points with a threshold cut in a single dimension.
Tue Jan 05 2021
Machine Learning
On the price of explainability for some clustering problems
The price of explainability for a clustering task can be defined as the unavoidable loss if the final partition is explainable. We provide upper and lower bounds for a natural model where explainability is achieved via decision trees.
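One natural way to make this definition precise (notation below is ours, not quoted from the paper): for an instance $X$, the price of explainability is the ratio between the best cost achievable by a threshold tree with $k$ leaves and the unconstrained optimum.

```latex
\[
  \mathrm{price}(X) \;=\;
  \frac{\min_{T \in \mathcal{T}_k} \mathrm{cost}_X(T)}{\mathrm{OPT}_k(X)}
\]
```

Here $\mathcal{T}_k$ denotes the set of threshold decision trees with $k$ leaves, and $\mathrm{OPT}_k(X)$ is the optimal unconstrained $k$-medians or $k$-means cost on $X$.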
Tue Jun 29 2021
Machine Learning
Near-Optimal Explainable k-Means for All Dimensions
Many clustering algorithms are guided by certain cost functions such as the widely-used k-means cost. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian introduced explainable clustering. The central question here is: how much does the explainability constraint increase the clustering cost?
Wed Jun 03 2020
Machine Learning
ExKMC: Expanding Explainable k-Means Clustering
Despite the popularity of explainable AI, there is limited work on effective methods for unsupervised learning. We study algorithms for k-means clustering, focusing on a trade-off between explainability and accuracy.
Mon Nov 09 2020
Machine Learning
Hardness of Approximation of Euclidean k-Median
The Euclidean k-median problem is defined in the following manner: given a set of points in $\mathbb{R}^d$ and an integer $k$, find a set of $k$ centers such that the cost function is minimized. Hardness of approximation results are known for the Euclidean k-means problem, but no such results were known for the Euclidean k-median problem. In this work, we provide the first hardness of approximation result for the Euclidean k-median problem.
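For reference, the cost functions in question can be written as follows (standard definitions, notation ours): given points $x_1, \dots, x_n$ and centers $c_1, \dots, c_k$ in $\mathbb{R}^d$,

```latex
\[
  \mathrm{cost}_{k\text{-median}} = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert_2,
  \qquad
  \mathrm{cost}_{k\text{-means}} = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert_2^2 .
\]
```

For k-medians with $\ell_1$ norm, the Euclidean norm in the first sum is replaced by $\lVert \cdot \rVert_1$.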
Thu Nov 08 2018
Machine Learning
Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering
We show that the cost of the optimal solution is preserved up to a factor of $(1+\varepsilon)$ under a projection onto a random $O(\log(k/\varepsilon)/\varepsilon^2)$-dimensional subspace. For k-means, our result resolves an open problem posed by Cohen et al. (STOC 2015).
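As a small illustration of the kind of projection involved (a Gaussian sketch is one standard instantiation of the Johnson-Lindenstrauss transform; the dimensions and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 1000, 500, 40   # n points in dimension d, projected down to dimension m
X = rng.normal(size=(n, d))

# Gaussian projection with i.i.d. N(0, 1/m) entries, so squared norms
# are preserved in expectation.
G = rng.normal(scale=1.0 / np.sqrt(m), size=(d, m))
Y = X @ G

# Squared distances (and hence clustering costs built from them) are
# preserved up to a small multiplicative factor with high probability.
orig = np.linalg.norm(X[0] - X[1]) ** 2
proj = np.linalg.norm(Y[0] - Y[1]) ** 2
print(f"original: {orig:.1f}  projected: {proj:.1f}  ratio: {proj / orig:.3f}")
```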
Thu Oct 13 2011
Machine Learning
Randomized Dimensionality Reduction for k-means Clustering
We study the topic of dimensionality reduction for k-means clustering. A feature selection based algorithm selects a small subset of the input features. Two provably accurate feature extraction methods are known in the literature.