Lattimore T., Szepesvári C. Bandit Algorithms
Cambridge: Cambridge University Press, 2020. — 537 p.
Decision-making in the face of uncertainty is a significant challenge in machine learning, and the multi-armed bandit model is a commonly used framework to address it. This comprehensive and rigorous introduction to the multi-armed bandit problem examines all the major settings, including stochastic, adversarial, and Bayesian frameworks. A focus on both mathematical intuition and carefully worked proofs makes this an excellent reference for established researchers and a helpful resource for graduate students in computer science, engineering, statistics, applied mathematics and economics. Linear bandits receive special attention as one of the most useful models in applications, while other chapters are dedicated to combinatorial bandits, ranking, non-stationary problems, Thompson sampling and pure exploration. The book ends with a peek into the world beyond bandits with an introduction to partial monitoring and learning in Markov decision processes.
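
The stochastic setting that the blurb opens with is easy to make concrete. Below is a minimal UCB sketch for Bernoulli arms in the spirit of the book's chapters on the optimism principle; the function name, the arm means, the horizon and the sqrt(2 log t / n) bonus are illustrative choices for this listing, not code or notation taken from the book.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given arm means.

    Illustrative sketch only: `means`, `horizon` and `seed` are
    made-up parameters, not notation from Lattimore and Szepesvari.
    Returns the pseudo-regret accumulated over the run.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    totals = [0.0] * k    # summed rewards per arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play each arm once to initialise
        else:
            # Optimism in the face of uncertainty: empirical mean
            # plus an exploration bonus that shrinks with more pulls.
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    best = max(means)
    return sum((best - means[i]) * counts[i] for i in range(k))

# Example: three arms; pseudo-regret grows only logarithmically in the horizon.
print(ucb1([0.4, 0.5, 0.6], horizon=10_000))
```

Arms that have been pulled rarely keep a large exploration bonus, so play eventually concentrates on the empirically best arm; this is the optimism principle developed in the UCB chapters below.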
Bandits, Probability and Concentration.
The Language of Bandits.
Applications.
Foundations of Probability.
Probability Spaces and Random Elements.
σ-Algebras and Knowledge.
Conditional Probabilities.
Independence.
Integration and Expectation.
Conditional Expectation.
Exercises.
Stochastic Processes and Markov Chains.
Stochastic Processes.
Markov Chains.
Martingales and Stopping Times.
Exercises.
Stochastic Bandits.
Core Assumptions.
The Learning Objective.
Knowledge and Environment Classes.
The Regret.
Decomposing the Regret.
The Canonical Bandit Model.
The Canonical Bandit Model for Uncountable Action Sets.
Bibliographical Remarks.
Exercises.
Concentration of Measure.
Tail Probabilities.
The Inequalities of Markov and Chebyshev.
The Cramér-Chernoff Method and Subgaussian Random Variables.
Bibliographical Remarks.
Exercises.
Stochastic Bandits with Finitely Many Arms.
The Explore-Then-Commit Algorithm.
Algorithm and Regret Analysis.
Bibliographical Remarks.
Exercises.
The Upper Confidence Bound Algorithm.
The Optimism Principle.
Bibliographical Remarks.
Exercises.
The Upper Confidence Bound Algorithm: Asymptotic Optimality.
Asymptotically Optimal UCB.
Exercises.
The Upper Confidence Bound Algorithm: Minimax Optimality.
The MOSS Algorithm.
Two Problems.
Exercises.
The Upper Confidence Bound Algorithm: Bernoulli Noise.
Concentration for Sums of Bernoulli Random Variables.
The KL-UCB Algorithm.
Exercises.
Adversarial Bandits with Finitely Many Arms.
The Exp3 Algorithm.
Adversarial Bandit Environments.
Importance-Weighted Estimators.
The Exp3 Algorithm.
Regret Analysis.
Exercises.
The Exp3-IX Algorithm.
The Exp3-IX Algorithm.
Regret Analysis.
Exercises.
Lower Bounds for Bandits with Finitely Many Arms.
Lower Bounds: Basic Ideas.
Main Ideas Underlying Minimax Lower Bounds.
Exercises.
Foundations of Information Theory.
Entropy and Optimal Coding.
Relative Entropy.
Exercises.
Minimax Lower Bounds.
Relative Entropy Between Bandits.
Minimax Lower Bounds.
Exercises.
Instance-Dependent Lower Bounds.
Asymptotic Bounds.
Finite-Time Bounds.
Exercises.
High-Probability Lower Bounds.
Stochastic Bandits.
Adversarial Bandits.
Exercises.
Contextual and Linear Bandits.
Contextual Bandits.
Contextual Bandits: One Bandit per Context.
Bandits with Expert Advice.
Exp4.
Regret Analysis.
Exercises.
Stochastic Linear Bandits.
Stochastic Contextual Bandits.
Stochastic Linear Bandits.
Regret Analysis.
Exercises.
Confidence Bounds for Least Squares Estimators.
Martingales and the Method of Mixtures.
Exercises.
Optimal Design for Least Squares Estimators.
The Kiefer-Wolfowitz Theorem.
Exercises.
Stochastic Linear Bandits with Finitely Many Arms.
Exercises.
Stochastic Linear Bandits with Sparsity.
Sparse Linear Stochastic Bandits.
Elimination on the Hypercube.
Online to Confidence Set Conversion.
Sparse Online Linear Prediction.
Bibliographical Remarks.
Exercises.
Minimax Lower Bounds for Stochastic Linear Bandits.
Hypercube.
Unit Ball.
Sparse Parameter Vectors.
Misspecified Models.
Exercises.
Asymptotic Lower Bounds for Stochastic Linear Bandits.
An Asymptotic Lower Bound for Fixed Action Sets.
Clouds Looming for Optimism.
Exercises.
Adversarial Linear Bandits.
Foundations of Convex Analysis.
Convex Sets and Functions.
Jensen’s Inequality.
Bregman Divergence.
Legendre Functions.
Optimization.
Projections.
Exercises.
Exp3 for Adversarial Linear Bandits.
Exponential Weights for Linear Bandits.
Regret Analysis.
Continuous Exponential Weights.
Exercises.
Follow-the-Regularised-Leader and Mirror Descent.
Online Linear Optimization.
Regret Analysis.
Application to Linear Bandits.
Linear Bandits on the Unit Ball.
Exercises.
The Relation between Adversarial and Stochastic Linear Bandits.
Unified View.
Reducing Stochastic Linear Bandits to Adversarial Linear Bandits.
Stochastic Linear Bandits with Parameter Noise.
Contextual Linear Bandits.
Exercises.
Other Topics.
Combinatorial Bandits.
Notation and Assumptions.
Applications.
Bandit Feedback.
Semi-bandit Feedback and Mirror Descent.
Follow-the-Perturbed-Leader.
Exercises.
Non-stationary Bandits.
Adversarial Bandits.
Stochastic Bandits.
Exercises.
Ranking.
Click Models.
Policy.
Regret Analysis.
Exercises.
Pure Exploration.
Simple Regret.
Best-Arm Identification with a Fixed Confidence.
Best-Arm Identification with a Budget.
Bibliographical Remarks.
Exercises.
Foundations of Bayesian Learning.
Statistical Decision Theory and Bayesian Learning.
Bayesian Learning and the Posterior Distribution.
Conjugate Pairs, Conjugate Priors and the Exponential Family.
The Bayesian Bandit Environment.
Posterior Distributions in Bandits.
Bayesian Regret.
Exercises.
Bayesian Bandits.
Bayesian Optimal Regret for k-Armed Stochastic Bandits.
Optimal Stopping.
One-Armed Bandits.
Gittins Index.
Computing the Gittins Index.
Bibliographical Remarks.
Exercises.
Thompson Sampling.
Finite-Armed Bandits.
Frequentist Analysis.
Linear Bandits.
Information Theoretic Analysis.
Exercises.
Beyond Bandits.
Partial Monitoring.
Finite Adversarial Partial Monitoring Problems.
The Structure of Partial Monitoring.
Classification of Finite Adversarial Partial Monitoring.
Lower Bounds.
Policy and Upper Bounds.
Proof of Theorem.
Proof of Theorem.
Proof of the Classification Theorem.
Bibliographical Remarks.
Exercises.
Markov Decision Processes.
Problem Set-Up.
Optimal Policies and the Bellman Optimality Equation.
Finding an Optimal Policy.
Learning in Markov Decision Processes.
Upper Confidence Bounds for Reinforcement Learning.
Proof of Upper Bound.
Proof of Lower Bound.
Bibliographical Remarks.
Exercises.