Grant

Flexible Modeling for High-Dimensional Complex Data: Theory, Methodology, and Computation

Abstract

In high-dimensional data analysis, the relationships among predictors can be highly nonlinear and non-additive. Taking such complex structures into account may significantly improve a model's predictive power and provide crucial insight into the underlying data-generating mechanism. The goal of this project is to develop and study new statistical and data mining methodologies for detecting nonlinear and non-additive patterns in high-dimensional sparse models. When the data dimension is ultra-high, interaction selection is extremely challenging, both numerically and theoretically, due to the curse of dimensionality. Very few tools are available in practice, and theory is scant. In this project, the investigators give a comprehensive treatment of the problem of high-dimensional interaction selection. They propose and study novel selection and modeling techniques for a variety of regression and classification models, and derive fast and robust large-scale computational algorithms. In addition, the investigators are committed to establishing high-dimensional theory for interaction selection and providing a solid foundation for the new methods. The investigators also propose and study a unified theoretical and computational framework to identify nonlinear effects for a broad class of nonparametric regression models. Special effort is spent on addressing computational issues such as multiple parameter tuning, regularization solution path/surface algorithms, and the development of user-friendly statistical software packages.
Big and high-dimensional data offer fascinating and unprecedented opportunities to gain extraordinary insight from data. On the other hand, the scale and volume of such data create tremendous challenges for standard analysis tools trying to extract useful information. The goal of this project is to develop innovative statistical and data mining methods, solid mathematical theory, and powerful computational tools and software to capture hidden and possibly complex patterns when the data dimension is high. One challenging problem to be tackled in this project is high-dimensional interaction selection. In genome-wide association studies (GWAS), there is growing evidence that gene-gene and gene-environment interactions can provide key insight into the complex biological pathways that underpin human diseases. However, very few effective, well-grounded, and computationally attractive tools are available in practice to identify interactions in high-dimensional data. The investigators aim to fill this gap by conducting a thorough investigation of the problem. The results of this research can significantly advance theory as well as contribute new statistical tools for practical use. The proposed methods have a wide range of scientific applications in fields such as biology, biomedicine, and environmental studies. This project integrates research, education, and interdisciplinary collaboration by developing new graduate and undergraduate courses and involving students in the research activities.

People

Hao Zhang
Principal Investigator (PI)
Mathematics﹒Professor
Ning Hao
Co-Investigator (COI)
Mathematics﹒Associate Professor

Sponsored by National Science Foundation
