Bayesian multiple testing with spike and slab priors

Bayesian posterior distributions that allow for variable selection are often used in practice to address multiple testing questions. Beyond their empirical success, they have been advocated, by Bradley Efron among others, for use in combination with empirical Bayes estimators of unknown prior parameters.

We consider three popular multiple testing procedures based on spike and slab priors. The first simply selects coordinates with a low posterior probability of coming from the null distribution, the so-called ell-values (also known as 'local FDR' values). The second procedure is based on cumulative ell-values, and the third on thresholding of so-called q-values (Storey, 2003). While simple decision-theoretic arguments show that these procedures have optimality properties in the Bayesian setting, assuming the prior is correct, it is natural to wonder whether their excellent behaviour in practice can be backed up by theoretical guarantees when the true parameter is a fixed, sparse (but otherwise arbitrary) vector.
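To fix ideas, in the two-group normal means model each observation is X_i = theta_i + noise, with theta_i = 0 under the null (probability 1 - w) and theta_i drawn from a slab otherwise. The sketch below computes ell-values and q-values under the illustrative assumption of a Gaussian N(0, tau^2) slab (the heavier-tailed slabs of the actual analysis, e.g. Laplace or Cauchy, would only change the marginal densities); the weight w, the value tau = 2.0, and the toy data are hypothetical choices, not taken from the talk.

```python
import math

def norm_pdf(x, sd=1.0):
    # Density of N(0, sd^2) at x.
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def norm_tail(x, sd=1.0):
    # Two-sided tail P(|Z| >= |x|) for Z ~ N(0, sd^2).
    return math.erfc(abs(x) / (sd * math.sqrt(2)))

def ell_value(x, w, tau=2.0):
    # Posterior probability that theta_i = 0 given X_i = x ('local FDR').
    # Under a N(0, tau^2) slab, the marginal under the alternative is
    # the convolution N(0, 1 + tau^2).
    null = (1 - w) * norm_pdf(x)
    alt = w * norm_pdf(x, sd=math.sqrt(1 + tau ** 2))
    return null / (null + alt)

def q_value(x, w, tau=2.0):
    # P(theta_i = 0 | |X_i| >= |x|): the Bayesian FDR of the two-sided
    # rejection region {|X| >= |x|} in the two-group model.
    null = (1 - w) * norm_tail(x)
    total = null + w * norm_tail(x, sd=math.sqrt(1 + tau ** 2))
    return null / total

# q-value procedure: reject coordinates with q-value below a nominal level.
data = [0.3, -0.8, 2.5, 4.1, -3.7, 0.1]
alpha, w = 0.05, 0.1
rejected = [i for i, x in enumerate(data) if q_value(x, w) <= alpha]
```

Note that q_value(x, w) <= ell_value(x, w) is not required in general; the two procedures threshold different posterior quantities, which is why their frequentist FDR behaves differently.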

In a sparse normal means setting, we demonstrate that the procedures behave optimally in several respects when the spike-and-slab weight is calibrated by marginal maximum likelihood, in an empirical Bayes fashion. On the one hand, we prove that the frequentist FDR (False Discovery Rate) of these procedures is uniformly controlled: it goes to zero slowly for the ell-value procedure, and stays close to a user-specified nominal level for the q-value procedure. On the other hand, we study the power through the FNR (False Negative Rate). We investigate multiple testing minimax rates and prove that sharp adaptive minimaxity for the multiple testing risk is achieved by empirical Bayes-calibrated ell-value procedures.
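The calibration step above can be sketched as follows: the weight w is fitted by maximising the marginal likelihood of the data, each observation having mixture density (1 - w) N(0, 1) + w times the slab convolution. This is a minimal illustration assuming a Gaussian N(0, tau^2) slab and a simple grid search in place of an exact maximiser; the restriction w >= 1/n mirrors the usual constraint in the empirical Bayes literature, but the data and tau = 2.0 are hypothetical.

```python
import math

def norm_pdf(x, sd=1.0):
    # Density of N(0, sd^2) at x.
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def log_marginal(w, data, tau=2.0):
    # Log marginal likelihood of the spike-and-slab weight w:
    # each X_i has density (1 - w) N(0, 1) + w N(0, 1 + tau^2).
    sd = math.sqrt(1 + tau ** 2)
    return sum(math.log((1 - w) * norm_pdf(x) + w * norm_pdf(x, sd=sd))
               for x in data)

def mmle_weight(data, tau=2.0, grid_size=1000):
    # Marginal maximum likelihood estimate of w, restricted to
    # w >= 1/n, here approximated by a grid search.
    n = len(data)
    grid = [1 / n + k * (1 - 1 / n) / grid_size for k in range(grid_size)]
    return max(grid, key=lambda w: log_marginal(w, data, tau))

# Mostly-null data with two clear signals: the fitted weight stays
# well below 1, reflecting the sparsity of the signal.
data = [0.2, -0.5, 0.9, -0.1, 0.4, 5.0, -4.6, 0.3, -0.7, 0.05]
w_hat = mmle_weight(data)
```

The fitted w_hat is then plugged into the ell-value or q-value formulas; the talk's results concern exactly this plug-in procedure, showing it inherits uniform FDR control and sharp adaptive minimaxity over sparse vectors.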

This talk is based on joint work with Etienne Roquain (Sorbonne) and Kweku Abraham (Paris-Saclay).