Published on Wed May 29 2019

Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model

Chang Li, Maarten de Rijke

Abstract

Non-stationarity appears in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences change abruptly at an unknown moment in time. We consider the problem of identifying the K most attractive items and propose cascading non-stationary bandits, an online learning variant of the cascading model, where a user browses a ranked list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving this non-stationary problem: CascadeDUCB and CascadeSWUCB. We analyze their performance and derive gap-dependent upper bounds on the n-step regret of these algorithms. We also establish a lower bound on the regret for cascading non-stationary bandits and show that both algorithms match the lower bound up to a logarithmic factor. Finally, we evaluate their performance on a real-world web search click dataset.
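The abstract describes the cascade model (a user scans a ranked list top to bottom and clicks the first attractive item) and a sliding-window algorithm, CascadeSWUCB, that tracks abrupt preference changes. The sketch below is a hedged illustration of these two ideas, not the paper's implementation: the window size, the UCB constant, and the class and function names (`SlidingWindowRanker`, `cascade_click`) are assumptions for demonstration only.

```python
import math
import random
from collections import deque

def cascade_click(ranked_list, attraction_probs, rng):
    """Cascade model: the user scans top to bottom and clicks the first
    attractive item; return its position, or None if nothing is clicked."""
    for pos, item in enumerate(ranked_list):
        if rng.random() < attraction_probs[item]:
            return pos
    return None

class SlidingWindowRanker:
    """Sketch of a CascadeSWUCB-style ranker (illustrative constants).
    Per-item statistics are kept only over the last `window` observations,
    so estimates can track abrupt changes in attraction probabilities."""

    def __init__(self, n_items, k, window=500):
        self.n_items, self.k, self.window = n_items, k, window
        self.history = deque()            # (item, reward) pairs in the window
        self.counts = [0] * n_items
        self.rewards = [0.0] * n_items

    def _ucb(self, item, t):
        # Untried items get an infinite index so each is explored once.
        if self.counts[item] == 0:
            return float("inf")
        mean = self.rewards[item] / self.counts[item]
        bonus = math.sqrt(2 * math.log(min(t, self.window)) / self.counts[item])
        return mean + bonus

    def rank(self, t):
        """Return the K items with the highest UCB indices."""
        order = sorted(range(self.n_items), key=lambda i: -self._ucb(i, t))
        return order[: self.k]

    def update(self, item, reward):
        """Record one observation and evict observations outside the window."""
        self.history.append((item, reward))
        self.counts[item] += 1
        self.rewards[item] += reward
        while len(self.history) > self.window:
            old_item, old_r = self.history.popleft()
            self.counts[old_item] -= 1
            self.rewards[old_item] -= old_r
```

In a simulation loop, each examined item above the click is treated as unattractive (reward 0) and the clicked item as attractive (reward 1), matching the partial feedback the cascade model provides.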

Tue Feb 10 2015
Machine Learning
Cascading Bandits: Learning to Rank in the Cascade Model
A search engine usually outputs a list of web pages. The user examines this list, from the first web page to the last, and chooses the first attractive page. This model of user behavior is known as the cascade model. In this paper, we propose cascading bandits.
Tue Mar 07 2017
Machine Learning
Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and machine learning. Many provably efficient algorithms have been recently proposed for this problem in specific click models. In this work, we propose BatchRank, the first online learning to rank algorithm for a broad class of click models.
Tue Feb 09 2016
Machine Learning
DCM Bandits: Learning to Rank with Multiple Clicks
A search engine recommends to the user a list of web pages. The user examines this list, from the first page to the last, and clicks on all attractive pages until the user is satisfied. We propose DCM bandits, an online learning variant of the click model.
Thu Sep 12 2019
Machine Learning
Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits
Cascading bandit (CB) is a popular model for web search and online advertising. The CB model may be too simple to apply to real-world problems, where user preferences may change over time. Two efficient algorithms are developed with provable regret upper bounds.
Wed Jun 06 2018
Machine Learning
TopRank: A practical algorithm for online stochastic ranking
Online learning to rank is a sequential decision-making problem. Many sample-efficient algorithms have been proposed for this problem that assume a specific click model. We propose a generalized click model that encompasses many existing models.
Sun Mar 19 2017
Machine Learning
Bernoulli Rank-1 Bandits for Click Feedback
The probability that a user will click a search result depends both on its relevance and its position on the results page. We propose the learning problem of a Bernoulli rank-1 bandit where, at each step, the learning agent chooses a pair of row and column arms, and receives the product of their Bernoulli-distributed values as a reward.