Bootstrapping (statistics)
A resampling method for estimating the sampling distribution of a statistic. Instead of deriving it analytically, we simulate it: draw many resamples from the data (with replacement) and compute the statistic on each. The spread of those values approximates the true sampling distribution, giving us confidence intervals and standard errors without closed-form formulas.
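A minimal sketch of this procedure in NumPy. The data, the resample count, and the function name `bootstrap_ci` are all illustrative choices, not part of the original note:

```python
import numpy as np

# Hypothetical toy data: 200 draws from an exponential with mean 2.
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=200)

def bootstrap_ci(x, stat, n_resamples=5000, alpha=0.05, rng=rng):
    """Percentile bootstrap: resample with replacement, recompute the statistic."""
    n = len(x)
    stats = np.array([stat(rng.choice(x, size=n, replace=True))
                      for _ in range(n_resamples)])
    se = stats.std(ddof=1)  # bootstrap estimate of the standard error
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi, se

lo, hi, se = bootstrap_ci(data, np.mean)
```

The percentile interval shown here is the simplest variant; refinements (basic, BCa) adjust it for bias and skewness.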
Why it works: the sample we have is the best available approximation of the population. Resampling from it mimics drawing new samples from the population.
Why with replacement: drawing n points without replacement from a dataset of size n just returns the original dataset in a different order, so every resampled statistic would be identical. With replacement, each resample is a different perturbation—some observations appear multiple times, others are left out—which is what generates variability in the statistic.
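This point can be seen directly in a quick simulation (toy data; the counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=50)
n = len(data)

# Full-size resampling WITHOUT replacement only permutes the data:
# every "resample" has essentially the same mean (up to float rounding).
means_no_rep = [rng.choice(data, n, replace=False).mean() for _ in range(1000)]

# WITH replacement, some points repeat and others drop out
# (each point is left out with probability (1 - 1/n)^n, about 37%),
# so the resampled means actually vary.
means_rep = [rng.choice(data, n, replace=True).mean() for _ in range(1000)]

print(np.std(means_no_rep))  # ~0: no variability at all
print(np.std(means_rep))     # positive: the bootstrap spread
```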
Related: Statistical inference, Monte Carlo method, Markov chain Monte Carlo
Why the bootstrap variance converges
The bootstrap principle: the relationship between the bootstrap distribution and the sample mirrors the relationship between the sampling distribution and the population.
For the sample mean, bootstrap resamples $X_1^*, \dots, X_n^*$ are drawn i.i.d. from the empirical distribution $\hat F_n$ (each observation with probability $1/n$). Then:
- By independence: $\operatorname{Var}^*(\bar X_n^*) = \frac{1}{n}\operatorname{Var}^*(X_1^*) = \frac{\hat\sigma_n^2}{n}$, where $\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2$ and $\operatorname{Var}^*$ denotes variance conditional on the data.
Since $\hat\sigma_n^2 \to \sigma^2$ almost surely by the law of large numbers, we get $n \operatorname{Var}^*(\bar X_n^*) \to \sigma^2$ almost surely.
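The variance argument above can be checked numerically. A sketch with arbitrary toy data; the key point is that the plug-in variance uses `ddof=0`, matching the $1/n$ weights of the empirical distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5, scale=3, size=100)
n = len(data)

# Plug-in variance of the empirical distribution (ddof=0 for 1/n weights)
sigma2_hat = data.var(ddof=0)

# Monte Carlo estimate of the bootstrap variance of the mean;
# the identity above says it should be close to sigma2_hat / n.
boot_means = np.array([rng.choice(data, n, replace=True).mean()
                       for _ in range(20000)])
mc_var = boot_means.var(ddof=1)
```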
For full distributional convergence, $\sup_x \bigl| P^*\bigl(\sqrt{n}(\bar X_n^* - \bar X_n) \le x\bigr) - P\bigl(\sqrt{n}(\bar X_n - \mu) \le x\bigr) \bigr| \to 0$ in probability; this follows from applying the Lindeberg CLT conditionally on the data. The general result (bootstrap consistency for smooth functionals) was proved by Bickel & Freedman (1981) and Singh (1981).
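Distributional consistency can also be illustrated by simulation: compare the bootstrap distribution of the centered, scaled mean with its true sampling distribution. A sketch under an assumed exponential population (all constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, n, B = 1.0, 200, 5000
data = rng.exponential(scale=mu, size=n)  # one observed sample

# Bootstrap distribution of sqrt(n) * (mean* - mean), conditional on the data
boot = np.array([np.sqrt(n) * (rng.choice(data, n, replace=True).mean()
                               - data.mean())
                 for _ in range(B)])

# True sampling distribution of sqrt(n) * (mean - mu), via fresh samples
true = np.array([np.sqrt(n) * (rng.exponential(scale=mu, size=n).mean() - mu)
                 for _ in range(B)])

# Both should approximate the same N(0, sigma^2) limit; compare quantiles
print(np.quantile(boot, [0.05, 0.5, 0.95]))
print(np.quantile(true, [0.05, 0.5, 0.95]))
```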
Resources
“While statistics offers no magic pill for quantitative scientific investigations, the bootstrap is the best statistical pain reliever ever produced.” — Xiao-Li Meng