Published on Wed Sep 09 2015

Statistical Inference, Learning and Models in Big Data

Beate Franke, Jean-François Plante, Ribana Roscher, Annie Lee, Cathal Smyth, Armin Hatefi, Fuqi Chen, Einat Gil, Alexander Schwing, Alessandro Selvitella, Michael M. Hoffman, Roger Grosse, Dieter Hendricks, Nancy Reid

The need for new methods to deal with big data is a common theme in most scientific fields. This paper gives an overview of the topics covered, describing challenges and strategies that seem common to many different areas of application.

Abstract

The need for new methods to deal with big data is a common theme in most scientific fields, although its definition tends to vary with the context. Statistical ideas are an essential part of this, and as a partial response, a thematic program on statistical inference, learning, and models in big data was held in 2015 in Canada, under the general direction of the Canadian Statistical Sciences Institute, with major funding from, and most activities located at, the Fields Institute for Research in Mathematical Sciences. This paper gives an overview of the topics covered, describing challenges and strategies that seem common to many different areas of application, and including some examples of applications to make these challenges and strategies more concrete.

Mon Oct 22 2018
Machine Learning
Model Selection Techniques -- An Overview
In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data. A crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction.
Fri Nov 09 2018
Machine Learning
A Bayesian Perspective of Statistical Machine Learning for Big Data
Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets. We argue that many SML techniques are closely connected to making inference by using the so-called Bayesian paradigm.
Mon Nov 24 2014
Machine Learning
Big Learning with Bayesian Methods
Big learning is an emerging subfield that studies machine learning algorithms, systems, and applications with Big Data. Explosive growth in data and the availability of cheap computing resources have sparked increasing interest in big learning. This article provides a survey of the recent advances in Bayesian methods.
Sat Jan 03 2015
Machine Learning
A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining
Big data comes in various ways, types, shapes, forms and sizes. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls in. Large p, small n data sets, for instance, require a different set of tools from the large n, small p variety.
Mon May 26 2014
Machine Learning
Statistique et Big Data Analytics; Volumétrie, L'Attaque des Clones
This article assumes that the reader has acquired the skills and expertise of a statistician in unsupervised (NMF, k-means, SVD) and supervised learning. After a quick overview of the different strategies available, the algorithms of some available learning methods are outlined.
Fri Mar 02 2018
Machine Learning
Impact of Biases in Big Data
Bias occurs in machine learning whenever the distributions of the training set and test set are different. We provide definitions and discussions of the most commonly appearing biases in machine learning. We also show how these biases can be quantified and corrected.