Published on Sat Apr 13 2013

Identification of relevant subtypes via preweighted sparse clustering

Sheila Gaynor, Eric Bair

Abstract

Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis to identify biologically interesting subgroups. In particular, one may wish to identify subgroups that are associated with a particular outcome of interest. Conventional clustering methods generally do not identify such subgroups, particularly when the data set contains many high-variance features. Instead, conventional methods may identify clusters associated with these high-variance features, even when one wishes to obtain secondary clusters that are more interesting biologically or more strongly associated with a particular outcome of interest. A modification of sparse clustering can be used to identify such secondary clusters or clusters associated with an outcome of interest. This method correctly identifies such clusters of interest in several simulation scenarios. The method is also applied to a large prospective cohort study of temporomandibular disorders and to a leukemia microarray data set.

Fri Jul 11 2014
Machine Learning
Biclustering Via Sparse Clustering
In many situations it is desirable to identify clusters that differ with respect to only a subset of features. Such clusters may represent homogeneous subgroups of patients with a disease. We propose a general framework for biclustering based on the sparse clustering method of Witten and
Sun Jan 29 2012
Machine Learning
A robust and sparse K-means clustering algorithm
In many situations where the interest lies in identifying clusters one might expect that not all available variables carry information about these groups. Data quality (e.g. outliers or missing entries) might present a serious and sometimes hard-to-assess problem for large and complex datasets.
Thu Sep 26 2019
Machine Learning
CS Sparse K-means: An Algorithm for Cluster-Specific Feature Selection in High-Dimensional Clustering
Feature selection is an important and challenging task in high dimensional clustering. In genomics, it is highly likely that a gene can only define one subtype against all the other subtypes. In this paper, we propose a K-means based clustering algorithm that discovers informative features.
Thu Feb 21 2008
Machine Learning
Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables
Clustering analysis is one of the most widely used statistical tools in microarray data analysis. The presence of many noise variables may mask underlying structures. This article introduces a novel approach that shrinks the variances together with means.
Thu Nov 03 2016
Machine Learning
High-dimensional regression over disease subgroups
We consider high-dimensional regression over subgroups of observations. Our approach is to treat subgroups as related problem instances and estimate subgroup-specific regression coefficients. We present algorithms for estimation and empirical results on simulated data.
Thu Feb 28 2013
Machine Learning
Bayesian Consensus Clustering
The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source.