Statistics Seminars Spring 2017

Friday, April 7, 2017 at 11:00am-12:00noon, Room Fretwell 315

Title: Functional and very high dimension reduction
By Yanyuan Ma, Ph.D, Professor
Department of Statistics
Penn State University
Hosted by Prof. Yanqing Sun, UNC Charlotte

The talk has two components. In the first component, to study the relation between a univariate response and multiple functional covariates, we propose a functional single index model that is semiparametric. The parametric part of the model integrates the linear regression modeling for functional data and the sufficient dimension reduction structure. The nonparametric part of the model further allows the response-index dependence or the link function to be unspecified. We use B-splines to approximate the coefficient function in the functional linear regression model part and reduce the problem to a familiar dimension folding model. We develop a new method to handle the subsequent dimension folding model by using kernel regression in combination with semiparametric treatment. The new method does not impose any special requirement on the inner product between the covariate function and the B-spline bases, and allows efficient estimation of both the index vector and the B-spline coefficients. The estimation method is general and applicable to both continuous and discrete response variables. We further derive asymptotic properties of the class of methods for both the index vector and the coefficient function. We establish the semiparametric optimality, which has not been done before in a semiparametric model where both kernel and B-spline estimation are involved.
In the second component, we study large genetic data sets, which are now easily available thanks to technological advances. In comparison with the data collection procedure, however, statistical analysis is still much cheaper. Thus, secondary analysis of SNP data, that is, re-analyzing existing data in an effort to extract more information, is attractive and cost effective. We study the relation between gene expression and SNPs through a combination of factor analysis and dimension reduction estimation (FADRE). To take advantage of the flexibility in traditional factor models where the latent factors are not required to be normal, we recommend using semiparametric sufficient dimension reduction methods in the joint estimation of the combined model. The resulting estimator is flexible and has superior performance. We further quantify the asymptotic performance of the parameter estimation and perform inference. The new results enable us to identify statistically significant SNPs concerning gene-SNP relations in lung tissues for the first time from GTEx data.

Statistics Seminars Fall 2016

Friday, October 28, 2016 at 11:00am-12:00noon, Room Fretwell 315

Title: Dimensional Analysis and Its Applications in Statistics
By Dennis K. J. Lin, Ph.D, University Distinguished Professor of Statistics
Department of Statistics
The Pennsylvania State University
Hosted by Prof. Jiancheng Jiang, UNC Charlotte

Dimensional Analysis (DA) is a fundamental method in the engineering and physical sciences for analytically reducing the number of experimental variables prior to experimentation. The principal use of dimensional analysis is to deduce, from a study of the dimensions of the variables, limitations on the form of any possible relationship between those variables. The method is of great generality. In this talk, an overview/introduction of DA will first be given. A basic guideline for applying DA will be proposed, using examples for illustration. Some initial ideas on using DA for Data Analysis and Data Collection will be discussed. Future research issues will be proposed.
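
As a small illustration of how DA reduces the number of variables, consider the standard textbook example of a simple pendulum. This example is not taken from the talk; the choice of variables (period T, length \ell, mass m, gravitational acceleration g) and the restriction to small oscillations are assumptions made here for illustration.

```latex
% Assumed variables: period T, length \ell, mass m, gravitational acceleration g.
% Their dimensions in terms of time (T), length (L) and mass (M):
\[
[T] = \mathsf{T}, \quad [\ell] = \mathsf{L}, \quad [m] = \mathsf{M}, \quad [g] = \mathsf{L}\,\mathsf{T}^{-2}.
\]
% Posit T = C \ell^{a} m^{b} g^{c} with C dimensionless and match exponents:
\[
\mathsf{T}^{1} = \mathsf{L}^{a+c}\,\mathsf{M}^{b}\,\mathsf{T}^{-2c}
\;\Longrightarrow\; b = 0, \quad c = -\tfrac{1}{2}, \quad a = \tfrac{1}{2},
\]
% so the three candidate variables collapse to a single unknown constant:
\[
T = C \sqrt{\ell / g}.
\]
```

Under these assumptions an experiment only needs to determine the single constant C, rather than explore length, mass and gravity separately.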

Friday, November 4, 2016 at 11:00am-12:00noon, Room Fretwell 315

Title: Recurrent Event Data Analysis With Intermittently Observed Time-Varying Covariates
By Chiung-Yu Huang, Ph.D, Associate Professor
Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center
Johns Hopkins University
Hosted by Prof. Yanqing Sun, UNC Charlotte

Although recurrent event data analysis is a rapidly evolving area of research, rigorous studies on modeling and estimation of the effects of time-varying covariates on the risk of recurrent events have been lacking. Existing methods for analyzing recurrent event data usually require that the covariate processes are observed throughout the entire follow-up period. However, covariates are often observed periodically rather than continuously. We propose a novel semiparametric estimator for the regression parameters in the popular proportional rate model. The proposed estimator is based on an estimated score function where we kernel smooth the mean covariate process. We show that the proposed semiparametric estimator is asymptotically unbiased and normally distributed, and derive its asymptotic variance. Simulation studies are conducted to compare the performance of the proposed estimator with that of simple methods that carry forward the last observed covariates. The different methods are applied to an observational study designed to assess the effect of Group A streptococcus (GAS) on pharyngitis among school children in India.
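
For readers unfamiliar with the simple comparison method mentioned above, carrying forward the last observed covariate can be sketched as follows. This is a minimal illustration, not the speaker's code; the function name `carry_forward` and its arguments are hypothetical, and observation times are assumed to be sorted in increasing order.

```python
from bisect import bisect_right


def carry_forward(obs_times, obs_values, t):
    """Return the covariate value most recently observed at or before time t.

    obs_times must be sorted in increasing order and aligned with obs_values.
    Returns None when no observation is available at or before t.
    """
    # Number of observation times that are <= t.
    i = bisect_right(obs_times, t)
    if i == 0:
        return None  # nothing has been observed yet
    return obs_values[i - 1]
```

For example, with observations at times 0, 3 and 7, a query at time 5 returns the value recorded at time 3.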

Friday, November 18, 2016 at 11:00am-12:00noon, Room Fretwell 315

Title: Gene selection with genetical genomics data incorporating network structures
By Yuehua Cui, Ph.D, Professor
Department of Statistics and Probability
Michigan State University
Hosted by Prof. Shaoyu Li, UNC Charlotte

Genetical genomics data provide rich information about gene regulation and offer promising resources for integrative analysis of omics data. Lin et al. (2015) recently proposed an instrumental variables (IV) regression framework to select important genes with high-dimensional genetical genomics data. The IV regression addresses the endogeneity issue caused by potential correlation between gene expressions and the error terms, and hence improves the performance of gene selection. As genes function in networks to achieve joint tasks, incorporating network or graph structures in a regression model can further improve gene selection performance. In this work, we propose a graph-constrained penalized IV regression framework to solve the endogeneity issue and to improve the selection performance by considering gene network structures. We propose a two-step estimation procedure by adopting a network-constrained regularization method to obtain better variable selection and estimation, and further establish the selection consistency. Simulation and real data analysis are conducted to show the utility of the method. (This is joint work with Bin Gao and Xu Liu.)

Statistics Seminars Fall 2015

Friday, October 2, 2015 at 11:00am-12:00noon, Room Fretwell 116

Title: Analysis of High Dimensional Longitudinal Data with Measurement Error and Missing Observations
By Grace Yi, Ph.D, Professor
Department of Statistics and Actuarial Science
University of Waterloo, Canada
Hosted by Prof. Yanqing Sun, UNC Charlotte

Longitudinal studies have proven to be useful in studying changes of response over time, and have been widely conducted in practice. It is common that longitudinal studies collect a large number of covariates, some of which are unimportant in explaining the response. Including such covariates in modelling and inferential procedures would greatly degrade the quality of the results. Moreover, longitudinal data analysis is challenged by the presence of measurement error and missing observations. In this talk, I will discuss the issues induced by these features, and describe simultaneous variable selection and estimation procedures that handle high dimensional longitudinal data with missingness and measurement error.

Friday, November 6, 2015 at 11:00am-12:00noon, Room Fretwell 116

Title: Variable selection via measurement error model selection likelihoods
By Yichao Wu, Ph.D, Associate Professor
Department of Statistics
North Carolina State University
Hosted by Prof. Shaoyu Li, UNC Charlotte

The measurement error model selection likelihood was proposed in Stefanski, Wu and White (2014) to conduct variable selection. It provides a new perspective on variable selection. The first part of my talk will be a review of the measurement error model selection likelihoods. In the second part, I will present an extension to nonparametric variable selection in kernel regression.

Friday, November 20, 2015 at 11:00am-12:00noon, Room Fretwell 116

Title: Nonparametric goodness-of-fit tests for uniform stochastic ordering
By Dewei Wang, Ph.D, Assistant Professor
Department of Statistics
University of South Carolina
Hosted by Prof. Yang Li, UNC Charlotte

In this talk, I will introduce a new nonparametric procedure for testing against uniform stochastic ordering in a two-population setting. Uniform stochastic ordering is stronger than ordinary stochastic ordering but weaker than likelihood ratio ordering. Uniform stochastic ordering is satisfied when the ordinal dominance curve associated with the two distributions is star-shaped. To develop a goodness-of-fit test for this property, we construct test statistics by examining the discrepancy between the empirical ordinal dominance curve and its least star-shaped majorant. We derive the limiting distribution of these statistics when uniform stochastic ordering is satisfied or not, and further we establish the least favorable distribution that can be used to determine the critical values. We illustrate the performance of our testing procedure through simulation and by applying it to a caffeine study involving premature infants conducted by Palmetto Health Richland in Columbia, SC.

Statistics Seminars Spring 2015

Friday, January 23, 2015 at 11:00am-12:00noon, Room Friday 132

Title: A Mutual Information Estimator With Exponentially Decaying Bias
By Lukun Zheng
Ph.D candidate, Department of Mathematics and Statistics, UNC Charlotte

A non-parametric estimator of mutual information is proposed and is shown to have asymptotic normality and efficiency, and a bias decaying exponentially in sample size. The asymptotic normality and the rapidly decaying bias together offer a viable inferential tool for assessing mutual information between two random elements on finite alphabets where the maximum likelihood estimator of mutual information greatly inflates the probability of type I error. The proposed estimator is illustrated by three examples in which the association between a pair of genes is assessed based on their expression levels. Several simulation results are also provided.
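
For context, the maximum likelihood (plug-in) estimator that the abstract uses as its point of comparison can be sketched as below. This is the baseline, not the estimator proposed in the talk; the function name `plugin_mutual_information` is hypothetical, and the result is reported in nats.

```python
from collections import Counter
from math import log


def plugin_mutual_information(pairs):
    """Plug-in (maximum likelihood) estimate of I(X; Y) in nats.

    pairs is an iterable of (x, y) observations on finite alphabets.
    Empirical frequencies replace the unknown probabilities.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one observation")

    joint = Counter(pairs)                  # counts of each (x, y) pair
    count_x = Counter(x for x, _ in pairs)  # marginal counts of x
    count_y = Counter(y for _, y in pairs)  # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with counts.
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi
```

On a sample where every combination of two binary symbols appears equally often the estimate is zero, and on a perfectly dependent balanced binary sample it equals log 2.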

Friday, March 20, 2015 at 11:00am-12:00noon, Room Friday 132

Title: Sparse Regression Incorporating Graphical Structure Among Predictors
By Yufeng Liu, Ph.D, Professor of Statistics and Professor of Biostatistics
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill
Hosted by Prof. Jiancheng Jiang, UNC Charlotte

Abstract: With the abundance of high dimensional data in various disciplines, sparse regularized techniques are very popular these days. In this talk, we use the structure information among predictors to improve sparse regression models. Typically, such structure information can be modeled by the connectivity of an undirected graph. Most existing methods use this graph edge-by-edge to encourage the regression coefficients of corresponding connected predictors to be similar. However, such methods may require expensive computation when the predictor graph has many edges. Furthermore, they do not directly utilize the neighborhood information. In this work, we incorporate the graph information node-by-node instead of edge-by-edge. Our proposed method is quite general and it includes adaptive Lasso, group Lasso and ridge regression as special cases. Both theoretical study and numerical study demonstrate the effectiveness of the proposed method for simultaneous estimation, prediction and model selection. Applications to Alzheimer's disease data and cancer data will be discussed as well.

Friday, March 27, 2015 at 11:00am-12:00noon, Room Friday 132

Title: Robust Hybrid Learning For Estimating Personalized Treatment Regimes
By Donglin Zeng, Ph.D, Professor
Department of Biostatistics, University of North Carolina at Chapel Hill
Hosted by Prof. Yanqing Sun, UNC Charlotte

Abstract: Dynamic treatment regimes (DTRs) are sequential decision rules tailored at each stage by potentially time-varying patient features and intermediate outcomes observed in previous stages. The complexity, patient heterogeneity and chronicity of many diseases and disorders call for learning optimal DTRs which best dynamically tailor treatment to each individual's response over time. Proliferation of personalized data (e.g., genetic and imaging data) provides opportunities for deep tailoring as well as new challenges for statistical methodology. In this work, we propose a robust and hybrid learning method, namely Augmented Multistage Outcome-Weighted Learning (AMOL), to identify optimal DTRs from the Sequential Multiple Assignment Randomization Trials (SMARTs). For multiple-stage SMART studies, we develop a sequentially backward learning method to infer DTRs, making use of the robustness of single-stage outcome weighted learning and the imputation ability of regression model-based Q-learning at each stage. The proposed AMOL remains valid even if the imputation model assumed in the Q-learning is misspecified. We establish theoretical properties of AMOL, including double robustness and efficiency of the imputation step, as well as consistency of estimated rules and rates of convergence to the optimal value function. The comparative advantage of AMOL over existing methods is demonstrated in extensive simulation studies and applications to two SMART data sets: a two-stage trial for attention deficit and hyperactive disorder (ADHD) and the STAR*D trial for major depressive disorder (MDD).

Friday, April 10, 2015 at 11:00am-12:00noon, Room Friday 132

Title: Nonhomogeneous Poisson models for panel count data and interval-censored failure time data
By Lianming Wang, Ph.D, Professor
Department of Statistics, University of South Carolina
Hosted by Prof. Yang Li, UNC Charlotte

Abstract: In many epidemiological and medical studies, subjects are examined at regular or irregular follow-up visits. Panel count data arise when the response of interest is the count of some repeated events between consecutive examination times, while interval-censored data arise when the response of interest is the time to some particular event and only the status of the event is known at each examination time. Poisson processes have been popular for modeling panel count data in the literature. In this talk, we propose a gamma-frailty nonhomogeneous Poisson process model for analyzing panel count data to account for the within-subject correlation and develop an easy estimation method using an EM algorithm. We also propose a computationally efficient method for analyzing general interval-censored data under the proportional hazards (PH) model using an EM algorithm. We develop a novel data augmentation by introducing a latent nonhomogeneous Poisson process to expand the observed likelihood. Both approaches have shown excellent performance in terms of estimation accuracy and computational advantages, such as being robust to initial values, converging fast, and providing variance estimates in closed form. A joint modeling of panel count responses and the interval-censored failure time of a terminal event is discussed as a generalization of the proposed approaches.

Friday, May 1, 2015 at 11:00am-12:00noon, Room Friday 132

Title: Spatial Temporal Modeling of Gene Expression Dynamics During Human Brain Development
By Hongyu Zhao, Ira V. Hiscock Professor of Public Health (Biostatistics) and Professor of Genetics and of Statistics
School of Public Health, Yale University
Hosted by Prof. Shaoyu Li, UNC Charlotte

Abstract: Human neurodevelopment is a highly regulated biological process, and recent technological advances allow scientists to study the dynamic changes of neurodevelopment at the molecular level through the analysis of gene expression data from human brains. In this talk, we focus on the analysis of data sampled from 16 brain regions in 15 time periods of neurodevelopment. We will introduce a two-step statistical inferential procedure to identify expressed and unexpressed genes and to detect differentially expressed genes between adjacent time periods. Markov Random Field (MRF) models are used to efficiently utilize the information embedded in brain region similarity and temporal dependency in our approach. We develop and implement a Monte Carlo expectation-maximization (MCEM) algorithm to estimate the model parameters. Simulation studies suggest that our approach achieves lower misclassification error and potential gain in power compared with models not incorporating spatial similarity and temporal dependency. We will also describe our methods to infer dynamic co-expression networks from these data. This is joint work with Zhixiang Lin, Stephan Sanders, Mingfeng Li, Nenad Sestan, and Matthew State.

Statistics Seminars Fall 2014

Friday, September 26, 2014 at 11:00am-12:00noon, Room Fretwell 120

Title: Estimation of stratified mark-specific proportional hazards models with missing marks
By Prof. Yanqing Sun, UNC Charlotte

An objective of randomized placebo-controlled preventive HIV vaccine efficacy trials is to assess the relationship between the vaccine effect to prevent infection and the genetic distance of the exposing HIV to the HIV strain represented in the vaccine construct. Motivated by this objective, recently a mark-specific proportional hazards model with a continuum of competing risks has been studied, where the genetic distance of the transmitting strain is the continuous 'mark' defined and observable only in failures. A high percentage of genetic marks of interest may be missing for a variety of reasons, predominantly due to rapid evolution of HIV sequences after transmission before a blood sample is drawn from which HIV sequences are measured. This research investigates the stratified mark-specific proportional hazards model with missing marks where the baseline functions may vary with strata. We develop two consistent estimation approaches, the first based on the inverse probability weighted complete-case (IPW) technique, and the second based on augmenting the IPW estimator by incorporating auxiliary information predictive of the mark. We investigate the asymptotic properties and finite-sample performance of the two estimators, and show that the augmented IPW estimator, which satisfies a double robustness property, is more efficient.

Friday, October 17, 2014 at 11:00am-12:00noon, Room Fretwell 120

Title: Consistent Cross-Validation for Tuning Parameter Selection in High-Dimensional Variable Selection
By Yang Feng, Ph.D, Assistant Professor
Department of Statistics, Columbia University
Hosted by Prof. Jiancheng Jiang, UNC Charlotte

Asymptotic behavior of the tuning parameter selection in the standard cross-validation methods is investigated for the high-dimensional variable selection problem. It is shown that the shrinkage effect of the Lasso penalty is not always the true reason for the over-selection phenomenon in the cross-validation based tuning parameter selection. After identifying the potential problems with the standard cross-validation methods, we propose a new procedure, Consistent Cross-Validation (CCV), for selecting the optimal tuning parameter. CCV is shown to enjoy the tuning parameter selection consistency property under certain technical conditions. Extensive simulations and real data analysis support the theoretical results and demonstrate that CCV also works well in terms of prediction.

Friday, November 7, 2014 at 11:00am-12:00noon, Room Fretwell 120

Title: Random Field Modelling of Genetic Association for Sequencing Data
By Ming Li, Ph.D, Assistant Professor
Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences
Hosted by Prof. Shaoyu Li, UNC Charlotte

With the advance of high-throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high-dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high-dimensional sequencing data analyses. In this talk, we will introduce a generalized genetic random field (GGRF) method for association analyses of sequencing data in case-control studies. We will then further extend the GGRF method to a family-based GGRF (FB-GGRF) method for family-based association studies. Both GGRF and FB-GGRF methods are compared with other existing methods through simulation studies and real data applications for investigating the genetic etiology of complex diseases/traits.

Friday, November 14, 2014 at 11:00am-12:00noon, Room Fretwell 120

Title: Simultaneous Modeling of Propensity for Disease, Rater Bias and Rater Diagnostic Skill in Dichotomous Subjective Rating Experiments
By Xiaoyan Lin, Ph.D, Assistant Professor
Department of Statistics, University of South Carolina
Hosted by Prof. Yang Li, UNC Charlotte

Many disease diagnoses involve subjective judgments. For example, through the inspection of a mammogram, MRI, radiograph, ultrasound image, etc., the clinician himself becomes part of the measuring instrument. Variability among raters examining the same item injects variability into the entire diagnostic process and thus adversely affects the utility of the diagnostic process itself. To reduce diagnostic errors and improve the quality of diagnosis, it is very important to quantify inter-rater variability, to investigate factors affecting the diagnostic accuracy, and to reduce the inter-rater variability over time. This paper focuses on a subjective binary decision process. A hierarchical model linking data on rater opinions with patient disease-development outcomes is proposed. The model allows for the quantification of patient-specific disease severity and rater-specific bias and diagnostic ability. The model can be used in an ongoing setting in a variety of ways, including calibration of rater opinions (estimation of the probability of disease development given opinions) and quantification of rater-specific sensitivities and specificities. A Bayesian computational algorithm is developed. An extensive simulation study is conducted to evaluate the proposed method, and the proposed method is illustrated by a mammogram data set.

Friday, November 21, 2014 at 11:00am-12:00noon, Room Fretwell 120

Title: Ignatov's Theorem
By Professor Isaac Sonin
Department of Mathematics and Statistics, University of North Carolina at Charlotte

Ignatov’s Theorem is one of the most remarkable theorems in Probability and Statistics with numerous applications. In this talk I am going to present an elementary proof accessible to anyone familiar with the concept of conditional probability.