Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox
We propose the first known Thompson Sampling for combinatorial bandits whose finite-time regret does not scale exponentially with the dimension of the problem. Suprisingly, considering any subgaussian distribution as a gaussian can produce exponentialy better result.
Dec 9, 2024