Published on Tue Jun 07 2016

How is a data-driven approach better than random choice in label space division for multi-label classification?

Piotr Szymański, Tomasz Kajdanowicz, Kristian Kersting

We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification. We use the modularity-maximizing fastgreedy, leading eigenvector, infomap, walktrap and label propagation algorithms.

Abstract

We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd: the modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap and label propagation algorithms. We construct a label co-occurrence graph (in both weighted and unweighted versions) based on training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison and use Gini-index-based Decision Trees as the base classifier. We compare the educated approaches to label space division against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases the seven educated-guess approaches are more likely than not to outperform RAkELd on all measures but Hamming Loss. We show that the fastgreedy and walktrap community detection methods on weighted label co-occurrence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on unweighted label co-occurrence graphs is better than random partitioning on average 90% of the time in terms of Subset Accuracy and 89% of the time in terms of Jaccard similarity. Weighted fastgreedy is on average better than RAkELd in terms of Hamming Loss.
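
As a rough illustration of the approach described in the abstract, the sketch below (not the authors' implementation) builds a weighted label co-occurrence graph from a binary label indicator matrix and partitions the label set with a modularity-maximizing community detection routine. It assumes NumPy and networkx are available and uses networkx's greedy_modularity_communities as a stand-in for the fastgreedy algorithm named above; the function name label_space_partition is hypothetical.

```python
# Minimal sketch: partition a label space via community detection on a
# weighted label co-occurrence graph (assumed stand-in for the paper's
# fastgreedy-based division; not the authors' code).
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def label_space_partition(Y):
    """Y: (n_samples, n_labels) binary label indicator matrix."""
    n_labels = Y.shape[1]
    # Weighted co-occurrence: number of training samples in which both labels appear.
    co = Y.T @ Y
    G = nx.Graph()
    G.add_nodes_from(range(n_labels))
    for i in range(n_labels):
        for j in range(i + 1, n_labels):
            if co[i, j] > 0:
                G.add_edge(i, j, weight=int(co[i, j]))
    # Each detected community becomes one label subset of the partition.
    return [sorted(c) for c in greedy_modularity_communities(G, weight="weight")]

# Toy example: 6 samples, 5 labels.
Y = np.array([[1, 1, 0, 0, 0],
              [1, 1, 0, 0, 1],
              [0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 0, 1, 0, 0]])
print(label_space_partition(Y))  # e.g. [[0, 1, 4], [2, 3]]
```

In an RAkELd-style pipeline, each resulting label subset would then presumably be handled by its own classifier (for example Label Powerset over Decision Trees, as used in the paper) and the per-subset predictions combined into the final multi-label output.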

Thu Apr 27 2017
Machine Learning
A Network Perspective on Stratification of Multi-Label Data
We present a new approach to stratifying multi-label data. It is based on the iterative stratification approach proposed by Sechidis et al. The proposed approach lowers the variance of classification quality, improves label pair-oriented measures and improves the example distribution.
Sun Dec 13 2020
Machine Learning
Active Learning for Node Classification: The Additional Learning Ability from Unlabelled Nodes
Node classification on graph data is an important task on many practical occasions. It requires labels for training, which can be difficult or expensive to obtain in practice. Given a limited labelling budget, active learning aims to improve performance.
Thu Jun 18 2015
Artificial Intelligence
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
The algorithm is based on divide-and-conquer constraint-based subroutines. It first reconstructs the skeleton of a Bayesian network and then performs a greedy hill-climbing search to orient the edges. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy.
Fri Aug 26 2016
Machine Learning
Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods which are frequently used in clustering and community detection applications are sensitive to the specific graph constructions. We show that ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes.
Mon Jan 07 2019
Machine Learning
Semi-supervised learning in unbalanced and heterogeneous networks
The idea comes from the first hitting time in random walks. WIL outperforms other state-of-the-art methods in most of our simulations.
Thu Jan 11 2018
Machine Learning
Active Community Detection with Maximal Expected Model Change
We present a novel active learning algorithm for community detection on networks. Our proposed algorithm uses a Maximal Expected Model Change (MEMC) criterion for querying network nodes. MEMC detects nodes that maximally change the community assignment likelihood model.