On the Suboptimality of Thompson Sampling in High Dimensions

Dec 7, 2021·
Raymond Zhang
Raymond Zhang
,
Richard Combes
· 0 min read
Abstract
In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly for some well chosen exemples that may not be that uncommon.
Type
Publication
In Neural Information Processing Systems 2021