High-dimensional Statistics

Summary (LLM read my papers; human bias-correction applied)

In my high-dimensional statistics work, I focus on a simple applied problem: in modern studies (genomics, EHR, imaging, text), we often measure far more variables than samples, so standard regression output can look confident while actually being unstable. My goal is to make high-dimensional modeling trustworthy for scientific conclusions: not only building predictors, but also answering inferential questions such as "Which effects are real?", "How large are they?", and "How uncertain are we?" in regimes where classical low-dimensional formulas are no longer valid.

Methodologically, I develop inference-first toolkits (confidence intervals, hypothesis tests, and multiple testing procedures) that remain valid in high dimensions and in realistic extensions beyond simple linear models. This includes rigorous inference for high-dimensional GLMs (notably logistic models for binary outcomes and Poisson models for counts), large-scale multiple testing for high-dimensional regression, group and hierarchical inference, and semi-supervised inference for prediction-related targets. I also study robustness to complications that commonly arise in practice, such as hidden confounding (e.g., the doubly debiased Lasso) and endogeneity testing with many covariates. I connect these ideas to concrete scientific tasks such as inference for genetic relatedness and mediation analysis. Several of these methods come with reusable implementations (e.g., the SIHR R package) so that the theory is directly usable.

Rakshit, P., and Guo, Z. (2024).
Statistical Inference in High-dimensional Poisson Regression with Applications to Mediation Analysis.
Technical Report

*Fan, Q., Guo, Z., Mei, Z., and Zhang, C. (2023).
Uniform Inference for Nonlinear Endogenous Treatment Effects with High-Dimensional Covariates.
Technical Report

*Guo, Z., Yuan, W. and Zhang, C. (2019).
Local Inference in Additive Models with Decorrelated Local Linear Estimator.
Technical Report [Codes]

*Fan, Q., Guo, Z., and Mei, Z. (2024+).
A Heteroskedasticity-Robust Overidentifying Restriction Test with High-Dimensional Covariates.
Journal of Business & Economic Statistics, to appear.

Scheidegger, C., Guo, Z., and Bühlmann, P. (2024+).
Spectral deconfounding for high-dimensional sparse additive models.
ACM/IMS Journal of Data Science, to appear.

Lin, Y., Guo, Z., Sun, B., and Lin, Z. (2024+).
Testing High-Dimensional Mediation Effect with Arbitrary Exposure-Mediator Coefficients.
Test, to appear.

Ma, R., Guo, Z., Cai, T. T., and Li, H. (2024).
Statistical Inference of Genetic Relatedness using High-Dimensional Logistic Regression.
Statistica Sinica, 34, 1023-1043.

Rakshit, P., Wang, Z., Cai, T. T., and Guo, Z. (2024).
SIHR: An R Package for Statistical Inference in High-dimensional Linear and Logistic Regression Models.
R Journal, to appear.

*Cai, T., Guo, Z. and Xia, Y. (2023).
Statistical Inference and Large-scale Multiple Testing for High-dimensional Regression Models.
Test (with discussion), 32(4), 1135-1171. [Codes]

Hou, J., Guo, Z., and Cai, T. (2023).
Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction.
Journal of Machine Learning Research, 24(265), 1-58. [Codes]

*Cai, T. T., Guo, Z., and Ma, R. (2023).
Statistical Inference for High-Dimensional Generalized Linear Models with Binary Outcomes.
Journal of the American Statistical Association, 118(542), 1319-1332. [Codes]

Guo, Z., Cevid, D., and Bühlmann, P. (2022).
Doubly Debiased Lasso: High-Dimensional Inference under Hidden Confounding.
Annals of Statistics, 50(3), 1320-1347. [Codes]

Guo, Z. and Zhang, C. (2022).
Extreme Nonlinear Correlation for Multiple Random Variables and Stochastic Processes with Applications to Additive Models.
Stochastic Processes and Their Applications, 150, 1037-1058.

*Cai, T., Cai, T. T. and Guo, Z. (2021).
Optimal Statistical Inference for Individualized Treatment Effects in High-dimensional Models.
Journal of the Royal Statistical Society: Series B, 83(4), 669-719. [Codes]

Guo, Z., Rakshit, P., Herman, D., and Chen, J. (2021).
Inference for Case Probability in High-dimensional Logistic Regression.
Journal of Machine Learning Research, 22(254), 1-54. [Codes]

Guo, Z., Renaux, C., Bühlmann, P., and Cai, T. T. (2021).
Group Inference in High Dimensions with Applications to Hierarchical Testing.
Electronic Journal of Statistics, 15(2), 6633-6676. [Codes]

*Cai, T. T. and Guo, Z. (2020).
Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications.
Journal of the Royal Statistical Society: Series B, 82(2), 391-419. [Codes]

Guo, Z., Wang, W., Cai, T. T. and Li, H. (2019).
Optimal Estimation of Genetic Relatedness in High-dimensional Linear Models.
Journal of the American Statistical Association, 114(525), 358-369. [Codes]

Guo, Z., Kang, H., Cai, T. T. and Small, D. S. (2018).
Testing Endogeneity with High Dimensional Covariates.
Journal of Econometrics, 207(1), 175-187. [Codes]

Guo, Z., Kang, H., Cai, T. T. and Small, D. S. (2018).
Confidence Intervals for Causal Effects with Invalid Instruments using Two-Stage Hard Thresholding with Voting.
Journal of the Royal Statistical Society: Series B, 80(4), 793-815. [Codes]

Underline indicates supervised students; # indicates equal contribution; * indicates alphabetical ordering; ✉ indicates corresponding authorship.