Speeding up MCMC by Efficient Data Subsampling

Quiroz, Matias; Villani, Mattias; Kohn, Robert; Tran, Minh-Ngoc

Access status: Open Access

Field	Value	Language
dc.contributor.author	Quiroz, Matias
dc.contributor.author	Villani, Mattias
dc.contributor.author	Kohn, Robert
dc.contributor.author	Tran, Minh-Ngoc
dc.date.accessioned	2017-01-19
dc.date.available	2017-01-19
dc.date.issued	2016-01-01
dc.identifier.uri	http://hdl.handle.net/2123/16205
dc.description.abstract	We propose Subsampling MCMC, a Markov chain Monte Carlo (MCMC) framework in which the likelihood function for n observations is estimated from a random subset of m observations. We introduce a general and highly efficient unbiased estimator of the log-likelihood based on control variates obtained from clustering the data. The cost of computing the log-likelihood estimator is much smaller than that of the full log-likelihood used by standard MCMC. The likelihood estimate is bias-corrected and used in two correlated pseudo-marginal algorithms to sample from a perturbed posterior, for which we derive the asymptotic error with respect to n and m, respectively. A practical estimator of the error is proposed, and we show that the error is negligible even for a very small m in our applications. We demonstrate that Subsampling MCMC achieves substantially higher sampling efficiency than standard MCMC for a given computational budget, and that it outperforms other subsampling methods for MCMC proposed in the literature.	en_AU
dc.relation.ispartofseries	BAWP-2016-07	en_AU
dc.subject	Bayesian inference	en_AU
dc.subject	Correlated pseudo-marginal	en_AU
dc.subject	Estimated likelihood	en_AU
dc.subject	Block pseudo-marginal	en_AU
dc.subject	Big Data	en_AU
dc.subject	Survey sampling	en_AU
dc.title	Speeding up MCMC by Efficient Data Subsampling	en_AU
dc.type.pubtype	Pre-print	en_AU
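
Illustrative sketch (not part of the record): the abstract describes an unbiased log-likelihood estimator that combines a random subsample of m observations with control variates obtained from clustering the data. The Python sketch below shows one way such an estimator could look. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy N(theta, 1) model, the equal-width binning used as a stand-in for clustering, the zeroth-order control variate (the log-density evaluated at each observation's bin centroid), and the helper names log_density, bin_centroids, and difference_estimator. It covers only the log-likelihood estimate, not the bias correction or the pseudo-marginal samplers.

```python
import numpy as np


def log_density(y, theta):
    """Per-observation log-density of N(theta, 1); a toy model assumed for illustration."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (y - theta) ** 2


def bin_centroids(y, n_bins):
    """Crude 1-D stand-in for clustering: equal-width bins, centroid = bin mean."""
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    # Interior edges map every point to a label in 0 .. n_bins - 1.
    labels = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    counts = np.bincount(labels, minlength=n_bins)
    sums = np.bincount(labels, weights=y, minlength=n_bins)
    # Empty bins keep centroid 0 and count 0, so they contribute nothing later.
    centroids = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    return labels, centroids, counts


def difference_estimator(y, theta, labels, centroids, counts, m, rng):
    """Estimate sum_i log p(y_i | theta) from m subsampled observations.

    Control variate q_i: the log-density at the centroid of observation i's bin
    (an assumed, deliberately simple choice). The sum of q_i over all n
    observations costs one density evaluation per bin; the subsample only has
    to estimate the sum of the residuals l_i - q_i.
    """
    n = y.size
    q_centroid = log_density(centroids, theta)
    q_total = np.sum(counts * q_centroid)
    idx = rng.integers(0, n, size=m)  # simple random sampling with replacement
    residuals = log_density(y[idx], theta) - q_centroid[labels[idx]]
    return q_total + n * residuals.mean()


rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
labels, centroids, counts = bin_centroids(y, n_bins=50)

theta = 0.3
full = log_density(y, theta).sum()  # cost O(n)
estimate = difference_estimator(y, theta, labels, centroids, counts, m=100, rng=rng)
print(f"full log-likelihood: {full:.1f}, subsampled estimate: {estimate:.1f}")
```

Because the indices are drawn uniformly with replacement, the expected value of n times the mean residual equals the sum of all residuals, so the estimate is unbiased for the full log-likelihood under these assumptions. Its variance depends on how closely the control variates track the individual log-density terms, which is why the quality of the clustering matters.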