|
Applications to Genetics and Health Data
Topic Summary
In my AI4Science and biomedical data work, I develop statistical methods that make large, heterogeneous datasets, such as genetics/proteomics data and multi-center EHRs, more actionable for science. The goal is to draw conclusions that remain reliable under real-world complications: imperfect instruments in genetics, strong cross-hospital heterogeneity, and privacy constraints that limit data sharing.
Concretely, one line of work proposes MR-SPI, a robust Mendelian randomization approach that automatically selects valid genetic instruments and then performs post-selection inference, making causal biomarker discovery less sensitive to invalid SNPs. Applied to UK Biobank proteomics data (912 proteins), it identifies proteins associated with Alzheimer’s disease, with follow-up structural analysis via AlphaFold3. Another line of work develops SurvMaximin, a robust federated/transfer learning method for survival risk prediction. It borrows strength across centers through a one-time sharing of summary statistics (no patient-level data) and is designed to remain stable even when site-specific models are highly heterogeneous, with the largest performance gains at smaller target sites.
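To make the idea of instrument selection concrete, here is a minimal sketch of a plurality-style rule: each SNP yields a ratio estimate of the causal effect, and the largest group of SNPs whose estimates agree is treated as the valid set. This is a simplified illustration, not the MR-SPI procedure itself. It omits the relevance screening and the post-selection inference step, and the function names, the tolerance `tol`, and the toy summary statistics are all assumptions made for the example.

```python
def ratio_estimates(snp_exposure, snp_outcome):
    """Per-SNP ratio estimates: SNP-outcome effect divided by SNP-exposure effect."""
    return [out / exp for exp, out in zip(snp_exposure, snp_outcome)]


def plurality_select(ratios, tol):
    """Return indices of the largest group of SNPs whose ratios agree within tol.

    Each SNP's ratio is tried as a candidate center; the first largest group wins.
    """
    best = []
    for center in ratios:
        group = [j for j, r in enumerate(ratios) if abs(r - center) <= tol]
        if len(group) > len(best):
            best = group
    return best


# Hypothetical summary statistics for five SNPs (illustrative values only).
snp_exposure = [0.20, 0.25, 0.30, 0.22, 0.28]   # SNP -> exposure associations
snp_outcome = [0.10, 0.125, 0.15, 0.33, 0.00]   # SNP -> outcome associations

ratios = ratio_estimates(snp_exposure, snp_outcome)
selected = plurality_select(ratios, tol=0.05)

# Simple average of the agreeing ratios as a point estimate of the effect.
effect = sum(ratios[j] for j in selected) / len(selected)
print(selected, effect)
```

In this toy example the first three SNPs agree on a common ratio, while the last two deviate and are excluded. A real analysis would also need to account for estimation uncertainty in each ratio and for the fact that the instruments were selected from the same data.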
-
Xiong, X., Guo, Z., Zhu, H., Hong, C., Smoller, J. W., Cai, T., and Liu, M.✉ (2026).
Adversarial Drift-Aware Predictive Transfer: Toward Durable Clinical AI.
Technical Report
-
Yao, M., Miller, G., Vardarajan, B., Baccarelli, A., Guo, Z.✉, and Liu, Z.✉ (2024).
Deciphering proteins in Alzheimer’s disease: A new Mendelian randomization method integrated with AlphaFold3 for 3D structure prediction.
Cell Genomics, 4(12), 100700.
-
Wang, X., Zhou, H., · · ·, 4CE, Avillach, P.✉, Guo, Z.✉, and Cai, T.✉ (2022).
SurvMaximin: Robust Federated Approach to Transporting Survival Risk Prediction Models.
Journal of Biomedical Informatics, 134, 104176.
Underline indicates supervised students; # indicates equal contribution; * indicates alphabetical ordering; ✉ indicates corresponding authorship.
|