Supervised Pretraining through Contrastive Categorical Positive Samplings to Improve COVID-19 Mortality Prediction.

Title: Supervised Pretraining through Contrastive Categorical Positive Samplings to Improve COVID-19 Mortality Prediction.
Publication Type: Journal Article
Year of Publication: 2022
Authors: Wanyan T, Lin M, Klang E, Menon KM, Gulamali FF, Azad A, Zhang Y, Ding Y, Wang Z, Wang F, Glicksberg B, Peng Y
Journal: ACM BCB
Volume: 2022
Date Published: 2022 Aug
Abstract

Clinical EHR data is naturally heterogeneous and contains abundant sub-phenotypes. Such diversity creates challenges for outcome prediction with machine learning models because it leads to high intra-class variance. To address this issue, we propose a supervised pre-training model with a unique embedded k-nearest-neighbor positive sampling strategy. We demonstrate the value of this framework theoretically and show that it yields highly competitive experimental results in predicting patient mortality in real-world COVID-19 EHR data from over 7,000 patients admitted to a large, urban health system. Our method achieves an AUROC of 0.872, outperforming the alternative pre-training models and traditional machine learning methods. Additionally, our method performs much better when the training data size is small (345 training instances).
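
The abstract does not spell out the sampling procedure, so the following is only a rough sketch of what "k-nearest-neighbor positive sampling" could look like in a supervised contrastive setting. It assumes that, for each anchor patient, the positives are the k closest samples in embedding space that share the anchor's outcome label, and that closeness is measured by Euclidean distance. The function name `knn_positive_indices`, the distance metric, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def knn_positive_indices(embeddings, labels, k):
    """For each anchor, return indices of its k nearest same-label samples.

    Hypothetical illustration: Euclidean distance is an assumption, not
    something stated in the abstract. Slots with no valid positive are -1.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances, shape (n, n).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)

    # Exclude samples with a different label, and the anchor itself.
    invalid = labels[:, None] != labels[None, :]
    np.fill_diagonal(invalid, True)
    dist = np.where(invalid, np.inf, dist)

    # Take the k closest remaining candidates per anchor.
    order = np.argsort(dist, axis=1)[:, :k]
    picked = np.take_along_axis(dist, order, axis=1)
    return np.where(np.isfinite(picked), order, -1)

# Toy embeddings: sample 2 shares label 1 with samples 0 and 1,
# but lies close to the label-0 sample 4.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2], [5.0, 5.1]]
toy_labels = [1, 1, 1, 0, 0]
positives = knn_positive_indices(toy_embeddings, toy_labels, k=1)
```

In this toy case the overall nearest neighbor of sample 2 is sample 4, which has a different label, so the sketch instead selects sample 1. Restricting positives to nearby same-class samples, rather than to every same-class sample, is one plausible way to account for the sub-phenotype heterogeneity that the abstract describes.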

DOI: 10.1145/3535508.3545541
Alternate Journal: ACM BCB
PubMed ID: 35960866
PubMed Central ID: PMC9365529
Grant List: R00 LM013001 / LM / NLM NIH HHS / United States
Division: Institute of Artificial Intelligence for Digital Health
Category: Faculty Publication