|Title:||Speeding up MCMC by Efficient Data Subsampling|
|Abstract:||We propose Subsampling MCMC, a Markov chain Monte Carlo (MCMC) framework where the likelihood function for n observations is estimated from a random subset of m observations. We introduce a general and highly efficient unbiased estimator of the log-likelihood based on control variates obtained from clustering the data. The cost of computing the log-likelihood estimator is much smaller than that of the full log-likelihood used by standard MCMC. The likelihood estimate is bias-corrected and used in two correlated pseudo-marginal algorithms to sample from a perturbed posterior, for which we derive the asymptotic error with respect to n and m. A practical estimator of the error is proposed and we show that the error is negligible even for a very small m in our applications. We demonstrate that Subsampling MCMC is substantially more efficient than standard MCMC in terms of sampling efficiency for a given computational budget, and that it outperforms other subsampling methods for MCMC proposed in the literature.|
|Type of Publication:||Pre-print|
|Appears in Collections:||Working Papers - Business Analytics|
|BAWP-2017-01.pdf||751.48 kB||Adobe PDF|
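The subsampled log-likelihood estimator described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions made here for illustration: a toy N(theta, 1) model, fixed-width binning as a stand-in for clustering, the log-density at each bin centroid as the control variate, and uniform sampling with replacement. All function names are hypothetical. The bias correction of the likelihood estimate and the pseudo-marginal samplers are not shown.

```python
import math
import random
from collections import defaultdict


def log_density(y, theta):
    """Log-density of one observation under the toy model N(theta, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2


def build_clusters(data, width=0.1):
    """Bin observations by value (an assumed, simple stand-in for clustering)."""
    members = defaultdict(list)
    labels = []
    for y in data:
        key = math.floor(y / width)
        members[key].append(y)
        labels.append(key)
    centroids = {k: sum(v) / len(v) for k, v in members.items()}
    counts = {k: len(v) for k, v in members.items()}
    return labels, centroids, counts


def estimate_loglik(data, labels, centroids, counts, theta, m, rng):
    """Difference estimator: sum of control variates plus a subsampled correction."""
    n = len(data)
    # Control variate per cluster: log-density evaluated at the centroid.
    q = {k: log_density(c, theta) for k, c in centroids.items()}
    # Sum of control variates over all n points costs one evaluation per cluster.
    q_total = sum(counts[k] * q[k] for k in q)
    # Subsample m indices uniformly with replacement and average the residuals.
    idx = [rng.randrange(n) for _ in range(m)]
    diff_mean = sum(log_density(data[i], theta) - q[labels[i]] for i in idx) / m
    return q_total + n * diff_mean


def full_loglik(data, theta):
    """Exact log-likelihood over all n observations, for comparison."""
    return sum(log_density(y, theta) for y in data)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(1.0, 1.0) for _ in range(5000)]
    labels, centroids, counts = build_clusters(data)
    print("full     :", full_loglik(data, 0.8))
    print("estimate :", estimate_loglik(data, labels, centroids, counts, 0.8, 50, rng))
```

Because each residual has expectation equal to the average residual over the full data set, the estimate is unbiased for the full log-likelihood under these assumptions, and its variance shrinks as the control variates track the individual log-density terms more closely.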