On the Suboptimality of Thompson Sampling in High Dimensions
Abstract
In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly for some well chosen exemples that may not be that uncommon.
Type
Publication
In Neural Information Processing Systems 2021