Comference Paper

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

We propose the first known Thompson Sampling for combinatorial bandits whose finite-time regret does not scale exponentially with the dimension of the problem. Suprisingly, considering any subgaussian distribution as a gaussian can produce exponentialy better result.

Dec 9, 2024