Published on Wed Sep 23 2020

Cache Replacement as a MAB with Delayed Feedback and Decaying Costs

Farzana Beente Yusuf, Vitalii Stebliankin, Giuseppe Vietri, Giri Narasimhan

A new variant of the well-known multi-armed bandit (MAB) is proposed. Each arm represents a distinct cache replacement policy, which advises on the page to evict. Feedback on the eviction comes in the form of a "miss" at an indeterminate time after the action is taken.

Abstract

Inspired by the cache replacement problem, we propose and solve a new variant of the well-known multi-armed bandit (MAB) problem, thus providing a solution for improving existing state-of-the-art cache management methods. Each arm (or expert) represents a distinct cache replacement policy, which advises on the page to evict from the cache when needed. Feedback on the eviction comes in the form of a "miss", but at an indeterminate time after the action is taken, and the cost of the eviction is set to be inversely proportional to the response time. The feedback is ignored if it arrives after a threshold value for the delay, which we set equal to the size of the page eviction history; for delays beyond the threshold, the cost is assumed to be zero. Consequently, we call this the MAB problem with delayed feedback and decaying costs. We introduce an adaptive reinforcement learning algorithm, EXP4-DFDC, that solves this problem. We derive an optimal learning rate for EXP4-DFDC that defines the balance between exploration and exploitation, and we prove theoretically that the expected regret of our algorithm vanishes as a function of time. As an application, we show that LeCaR, a recent top-performing machine learning algorithm for cache replacement, can be enhanced with adaptive learning using our formulation. We present an improved adaptive version of LeCaR, called OLeCaR, with its learning rate set according to the theoretical derivation presented here, which minimizes regret for EXP4-DFDC. It then follows that LeCaR and OLeCaR are theoretically guaranteed to have vanishing regret over time.
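As a concrete illustration of the mechanics described in the abstract, here is a minimal, hypothetical Python sketch of an exponential-weights update over cache replacement experts with delayed feedback and decaying costs. It is a simplified EXP3/EXP4-style scheme in the spirit of EXP4-DFDC, not the paper's actual algorithm; the class and method names (DelayedFeedbackBandit, on_miss) and the fixed learning rate are assumptions made for illustration.

```python
import math
import random

class DelayedFeedbackBandit:
    """Hypothetical sketch: each expert is a cache replacement policy (e.g., LRU, LFU)."""

    def __init__(self, num_experts, history_size, learning_rate):
        self.weights = [1.0] * num_experts
        self.history_size = history_size  # delay threshold = size of the eviction history
        self.lr = learning_rate           # balances exploration vs. exploitation

    def probabilities(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def choose_expert(self):
        # Sample a replacement policy in proportion to the current weights.
        r, acc = random.random(), 0.0
        for i, p in enumerate(self.probabilities()):
            acc += p
            if r <= acc:
                return i
        return len(self.weights) - 1

    def on_miss(self, expert, delay):
        # A miss observed `delay` (>= 1) steps after the eviction advised by `expert`.
        if delay > self.history_size:
            return  # feedback past the threshold is ignored (its cost has decayed to zero)
        cost = 1.0 / delay  # cost inversely proportional to the response time
        p = self.probabilities()[expert]
        # Importance-weighted exponential-weights update on the chosen expert.
        self.weights[expert] *= math.exp(-self.lr * cost / p)
```

A cache simulator would call choose_expert() on each eviction and on_miss() when a previously evicted page is requested again; the paper's theoretically derived optimal learning rate would take the place of the fixed learning_rate parameter here.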

Mon Sep 30 2019
Machine Learning
RLCache: Automated Cache Management Using Reinforcement Learning
Cache managers directly impact the overall performance of computer systems. An optimal cache manager will avoid unnecessary operations and minimise the size of storage needed to achieve a high hit rate. This project is the first to model cache manager decisions as a multi-task control problem.
Tue May 19 2020
Machine Learning
Reinforcement Learning for Caching with Space-Time Popularity Dynamics
Caching is envisioned to play a critical role in next-generation networks. A cache node should be able to learn what and when to cache. This chapter presents a versatile reinforcement-learning-based approach for near-optimal caching policy design.
Mon Jun 28 2021
Machine Learning
Robust Learning-Augmented Caching: An Experimental Study
Caching is crucial for the performance of modern-day computing systems. A key optimization problem arising in caching cannot be optimally solved without knowing the future. The new field of learning-augmented algorithms proposes solutions that leverage predictions to augment online caching algorithms.
Mon Jan 18 2021
Machine Learning
Online Caching with Optimal Switching Regret
A cache of limited storage capacity can hold a fixed number of files at a time. A user requests an arbitrary file from the catalog at each time slot. In the case of a cache hit, the policy receives a unit reward, and zero reward otherwise.
Mon Jun 28 2021
Machine Learning
Dynamic Planning and Learning under Recovering Rewards
We introduce a general class of multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from at most K out of N different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then gradually recovers as the arm remains idle.
Thu Feb 27 2020
Machine Learning
Online Learning for Active Cache Synchronization
Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated. This paper introduces synchronization bandits, where all arms generate costs at all times.