Published on Wed Jan 23 2019

Thompson Sampling for a Fatigue-aware Online Recommendation System

Yunjuan Wang, Theja Tulabandhula

In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. We propose a new Thompson sampling based algorithm with an exponential dependence on the number of items.

0
0
0
Abstract

In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the items recommended or abandon the platform due to fatigue from seeing less useful items. Assuming a parametric stochastic model of user behavior, which captures positional effects of these items as well as the abandoning behavior of users, the platform's goal is to recommend sequences of items that are competitive to the single best sequence of items in hindsight, without knowing the true user model a priori. Naively applying a stochastic bandit algorithm in this setting leads to an exponential dependence on the number of items. We propose a new Thompson sampling based algorithm with expected regret that is polynomial in the number of items in this combinatorial setting, and performs extremely well in practice.

Mon Sep 21 2020
Machine Learning
Bandits Under The Influence (Extended Version)
Recommender systems should adapt to user interests as the latter evolve. In general, when the interests are not known, online algorithms that explore the recommendation space while exploiting observed preferences are preferable.
0
0
0
Fri Jul 07 2017
Machine Learning
A Tutorial on Thompson Sampling
Thompson sampling is an algorithm for online decision problems. The algorithm addresses abroad range of problems in a computationally efficient manner. This tutorial covers the algorithm and its application.
0
0
0
Sat Aug 22 2020
Machine Learning
Fatigue-aware Bandits for Dependent Click Models
recommender systems send a massive amount of content to keep users engaged. Users may experience fatigue which is contributed by exposure to irrelevant content and boredom from seeing too many similar recommendations. We propose an online learning setting where a platform learns a policy to recommend content that takes user fatigue into account.
0
0
0
Tue Dec 01 2020
Artificial Intelligence
Non-Stationary Latent Bandits
We call this problem a non-stationary latent bandit. We propose Thompson sampling algorithms. The main strength of our approach is that it can be combined with rich offline-learned models.
0
0
0
Mon Oct 23 2017
Machine Learning
Sequential Matrix Completion
We propose a novel algorithm for sequential matrix completion in arecommender system setting. The objective of the algorithm is to provide a sequential policy for user-product pair recommendation which will yield the highest possible ratings after a finite time horizon. The algorithm uses a Gamma process factor model with two posterior-focused bandit policies.
0
0
0
Thu Jun 18 2020
Artificial Intelligence
Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect
We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting. Priming effect can beaturally modelled as a temporal constraint on the strategy space. We provide novel algorithms that achieves sublinear regret in time.
0
0
0