Population Health Sciences

Nonparametric targeted Bayesian estimation of class proportions in unlabeled data.

Submitted by chz4003 on July 22, 2020 - 2:28pm

Title	Nonparametric targeted Bayesian estimation of class proportions in unlabeled data.
Publication Type	Journal Article
Year of Publication	2020
Authors	Díaz I, Savenkov O, Kamel H
Journal	Biostatistics
Date Published	2020 Jun 11
ISSN	1468-4357
Abstract	We introduce a novel Bayesian estimator for the class proportion in an unlabeled dataset, based on the targeted learning framework. The procedure requires the specification of a prior (and outputs a posterior) only for the target of inference, and yields a tightly concentrated posterior. When the scientific question can be characterized by a low-dimensional parameter functional, this focus on target prior and posterior distributions perfectly aligns with Bayesian subjectivism. We prove a Bernstein-von Mises-type result for our proposed Bayesian procedure, which guarantees that the posterior distribution converges to the distribution of an efficient, asymptotically linear estimator. In particular, the posterior is Gaussian, doubly robust, and efficient in the limit, under the only assumption that certain nuisance parameters are estimated at slower-than-parametric rates. We perform numerical studies illustrating the frequentist properties of the method. We also illustrate its use in a motivating application to estimate the proportion of embolic strokes of undetermined source arising from occult cardiac sources or large-artery atherosclerotic lesions. Though we focus on the motivating example of the proportion of cases in an unlabeled dataset, the procedure is general and can be adapted to estimate any pathwise differentiable parameter in a non-parametric model.
DOI	10.1093/biostatistics/kxaa022
Alternate Journal	Biostatistics
PubMed ID	32529244

Division: Biostatistics

Category: Faculty Publication