
Zijian Guo

High-dimensional Statistics

Summary (LLM read my papers; human bias-correction applied)

In my high-dimensional statistics work, I focus on a simple applied problem: in modern studies (genomics, electronic health records, imaging, text), we often measure far more variables than samples, so standard regression output can look confident while actually being unstable. My goal is to make high-dimensional modeling trustworthy enough to support scientific conclusions. That means not only building predictors, but also answering inferential questions such as "Which effects are real?", "How large are they?", and "How uncertain are we?" in regimes where classical low-dimensional formulas are no longer valid.

Methodologically, my contributions develop inference-first toolkits (confidence intervals, hypothesis tests, and multiple testing procedures) that remain valid in high dimensions and in realistic extensions beyond simple linear models. This includes rigorous inference for high-dimensional GLMs (notably logistic models for binary outcomes and Poisson models for counts), large-scale multiple testing for high-dimensional regression, group and hierarchical inference, and semi-supervised inference for prediction-related targets. I also study complications that commonly arise in practice, such as hidden confounding (e.g., the doubly debiased Lasso) and testing for endogeneity with many covariates, and I connect these ideas to concrete scientific tasks such as genetic relatedness and mediation analysis. Several of these methods are accompanied by reusable implementations (e.g., the SIHR R package) to make the theory directly usable.

Underline indicates supervised students; # indicates equal contribution; * indicates alphabetical ordering; ✉ indicates corresponding authorship.