Dimension Reduction, Non-Gaussian Component Analysis, and Data Integration
Benjamin Risk, Ph.D.
Independent component analysis (ICA) is popular in many applications, particularly in neuroimaging, where it is used to estimate resting-state networks and MRI artifacts. In this talk, we will first discuss a related method, called non-Gaussian component analysis, and then introduce a novel extension to integrate multiple datasets. In the analysis of an imaging modality, principal component analysis is typically used for dimension reduction prior to ICA, a step that can remove important information. We develop linear non-Gaussian component analysis, in which dimension reduction and latent variable estimation are achieved simultaneously. In the second part of this talk, we introduce a method for extracting information shared by multiple datasets (e.g., imaging modalities) collected on the same subjects. Classical approaches to data integration utilize transformations that maximize covariance or correlation. We introduce joint and individual non-Gaussian component analysis (JIN). We focus on information shared in subject score subspaces estimated by maximizing non-Gaussianity, and we also examine information unique to each dataset. We apply our methods to data from the Human Connectome Project.
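Maximizing non-Gaussianity is the engine behind both ICA and non-Gaussian component analysis. As a minimal illustration (a toy sketch, not the authors' estimator), the code below whitens a two-dimensional mixture and grid-searches the unit circle for the projection with the largest absolute excess kurtosis, which recovers the non-Gaussian source:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Two unit-variance latent sources: one non-Gaussian (uniform), one Gaussian.
s_ng = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)   # excess kurtosis -1.2
s_g = rng.standard_normal(n)
A = np.array([[0.8, 0.6], [-0.6, 0.8]])              # mixing matrix
X = np.column_stack([s_ng, s_g]) @ A.T

# Whiten (the PCA step, but keeping both dimensions).
X = X - X.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(X.T))
Z = X @ eigvec / np.sqrt(eigval)

def excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

# Grid-search the unit circle for the most non-Gaussian projection.
angles = np.linspace(0.0, np.pi, 360, endpoint=False)
scores = [abs(excess_kurtosis(Z @ [np.cos(a), np.sin(a)])) for a in angles]
best = angles[int(np.argmax(scores))]
est = Z @ [np.cos(best), np.sin(best)]
print(abs(np.corrcoef(est, s_ng)[0, 1]))             # should be close to 1
```

The Gaussian direction carries no kurtosis signal, so the search locks onto the uniform source; real NGCA estimators replace the grid search with principled optimization over higher-dimensional subspaces.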
Thursday, November 29, 2018
3:30 p.m. - 4:45 p.m.
Helen Wood Hall, Room 1W-501
Registration for Exponential Family Functional Data
Jeff Goldsmith, Ph.D.
We introduce a novel method for separating amplitude and phase variability in exponential family functional data. Our method alternates between two steps: the first uses generalized functional principal components analysis (GFPCA) to calculate template functions, and the second estimates smooth warping functions that map observed curves to templates. Existing approaches to registration have primarily focused on continuous functional observations, and the few approaches for discrete functional data require a pre-smoothing step; these methods are frequently computationally intensive. In contrast, we focus on the likelihood of the observed data and avoid the need for preprocessing, and we implement both steps of our algorithm in a computationally efficient way. Our motivation comes from the Baltimore Longitudinal Study of Aging, in which accelerometer data provide valuable insights into the timing of sedentary behavior. We analyze binary functional data with observations each minute over 24 hours for 592 participants, where values represent activity and inactivity. Diurnal patterns of activity are obscured due to misalignment in the original data but are clear after curves are aligned. In simulations designed to mimic the application, our method outperforms competing approaches in terms of estimation accuracy and computational efficiency. Code for our method and simulations is publicly available.
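The alternating template/warping idea can be caricatured in a few lines. The sketch below (an illustration only, not the authors' GFPCA-based algorithm) alternates between averaging the currently aligned binary curves to form a template and choosing, for each subject, the cyclic shift that maximizes the Bernoulli log-likelihood against that template; integer shifts stand in for the smooth warping functions:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 144, 40                       # 10-minute bins over 24h, 40 subjects
grid = np.arange(T)

# Template probability of activity: a single diurnal bump.
p_true = 0.05 + 0.85 * np.exp(-0.5 * ((grid - 70) / 15) ** 2)
shifts = rng.integers(-20, 21, n)    # subject-specific phase shifts
Y = np.array([rng.binomial(1, np.roll(p_true, s)) for s in shifts])

def align(Y, n_iter=5, max_shift=30):
    """Alternate: (1) template = mean of currently aligned curves;
    (2) per-subject shift maximizing the Bernoulli log-likelihood."""
    est = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        aligned = np.array([np.roll(y, -s) for y, s in zip(Y, est)])
        p = np.clip(aligned.mean(axis=0), 1e-3, 1 - 1e-3)
        def ll(y):
            return y @ np.log(p) + (1 - y) @ np.log(1 - p)
        est = np.array([max(range(-max_shift, max_shift + 1),
                            key=lambda s: ll(np.roll(y, -s)))
                        for y in Y])
    return est

est = align(Y)
# Shifts are identifiable only up to a common offset; remove it.
err = (est - est.mean()) - (shifts - shifts.mean())
print(np.mean(np.abs(err)))          # small: curves are well aligned
```

Working with the Bernoulli likelihood directly, as here, is what lets the real method skip the pre-smoothing step that other discrete-data registration approaches require.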
Thursday, November 15, 2018
A Bayesian Nonparametric Approach for Causal Inference with Semi-Competing Risks
Mike Daniels, Sc.D.
University of Florida
We develop a Bayesian nonparametric (BNP) model to assess the treatment effect in semi-competing risks, where a nonterminal event may be censored by a terminal event, but not vice versa. Semi-competing risks are common in brain cancer trials, with cerebellar progression potentially censored by death. We propose a flexible BNP approach to model the joint distribution of progression and death events, thereby effectively inferring the marginal distributions of progression time and death time, characterizing the within-subject dependence structure, predicting the progression and death times given a patient’s covariates, and quantifying uncertainties of all estimates. More importantly, we define a causal effect of treatment, which can be estimated from the data and has a natural causal interpretation. We perform extensive simulation studies to evaluate the proposed BNP model. The simulations show that the proposed model can accurately estimate the treatment effect in the semi-competing risks setting. We also apply the proposed BNP model to data from a brain cancer Phase II trial. Joint work with Yanxun Xu and Dan Scharfstein (Johns Hopkins) and Peter Mueller (UT-Austin).
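To make the data structure concrete, the sketch below simulates semi-competing risks with a simple shared gamma frailty (an illustration of the observed-data structure, not the talk's BNP model, and ignoring administrative censoring): the terminal event censors the nonterminal one, but never the reverse.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Shared gamma frailty w induces within-subject dependence between the
# progression and death hazards.
w = rng.gamma(shape=2.0, scale=0.5, size=n)
t_prog = rng.exponential(1.0 / (0.5 * w))    # nonterminal event time
t_death = rng.exponential(1.0 / (0.3 * w))   # terminal event time

# Semi-competing structure: death censors progression, never vice versa.
obs_prog = np.minimum(t_prog, t_death)       # progression or censoring time
prog_seen = t_prog < t_death                 # progression indicator
obs_death = t_death                          # death is always fully observed
print(prog_seen.mean())                      # ~ 0.5 / (0.5 + 0.3) = 0.625
```

With conditionally exponential times the fraction of subjects whose progression is observed is the hazard ratio 0.5/(0.5 + 0.3), whatever the frailty value, which the simulation confirms.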
Thursday, November 1, 2018
Testing Sparsity-Inducing Penalties
Maryclare Griffin, Ph.D.
Many penalized maximum likelihood estimators correspond to posterior mode estimators under specific prior distributions. The appropriateness of a particular class of penalty functions can therefore be interpreted as the appropriateness of a prior for the parameters. For example, the appropriateness of a lasso penalty for regression coefficients depends on the extent to which the empirical distribution of the regression coefficients resembles a Laplace distribution. We provide a procedure for testing whether a Laplace prior is appropriate and, accordingly, whether using a lasso penalized estimate is appropriate. This testing procedure is designed to have power against exponential power priors, which correspond to l_q penalties. Via simulations, we show that this testing procedure achieves the desired level and has enough power to detect violations of the Laplace assumption when the numbers of observations and unknown regression coefficients are large. We then introduce an adaptive procedure that chooses a more appropriate prior and corresponding penalty from the class of exponential power priors when the null hypothesis is rejected. We show that this can improve estimation of the regression coefficients both when they are drawn from an exponential power distribution and when they are drawn from a spike-and-slab distribution.
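The tail behavior distinguishing members of the exponential power family can be illustrated with a simple moment diagnostic (a toy check, not the speaker's testing procedure): the Laplace distribution underlying the lasso (q = 1) has excess kurtosis 3, while the Gaussian underlying ridge (q = 2) has 0.

```python
import numpy as np

rng = np.random.default_rng(3)

def excess_kurtosis(b):
    b = (b - b.mean()) / b.std()
    return np.mean(b ** 4) - 3.0

# Coefficients drawn from the "wrong" member of the exponential power
# family are detectably too light- or heavy-tailed for a Laplace prior.
beta_laplace = rng.laplace(0.0, 1.0, 50_000)
beta_normal = rng.standard_normal(50_000)
print(excess_kurtosis(beta_laplace))   # near 3
print(excess_kurtosis(beta_normal))    # near 0
```

A formal test must also handle the fact that coefficients are estimated rather than observed, which is part of what the proposed procedure addresses.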
Thursday, October 18, 2018
Toward Automated Efficient Estimation in Nonparametric and Semiparametric Models
Marco Carone, Ph.D.
University of Washington
Drawing efficient inference in the context of nonparametric and semiparametric models can be challenging. General constructive approaches exist, but most of these build upon knowledge of the efficient influence function, an object whose analytic derivation is not in the skillset of most practitioners. This may have constituted a barrier to a broader use of these models in practice. In this talk, a novel approach to deriving the efficient influence function will be discussed. This proposal allows the use of computational tools as a substitute for some of the theoretical effort typically required. As such, it may facilitate the automation of efficient estimation in nonparametric and semiparametric models.
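One computational substitute for analytic derivation is a numerical Gateaux derivative of the target functional along the path (1 - eps) * P_n + eps * delta_x. The sketch below illustrates the general idea (not the speaker's specific proposal) by recovering the known influence function of the variance functional via finite differences:

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.standard_normal(5_000)

def variance_functional(points, weights):
    """The target parameter T(P) = Var_P(X), for a discrete P."""
    mu = weights @ points
    return weights @ (points - mu) ** 2

def numerical_if(functional, data, x, eps=1e-6):
    """Finite-difference Gateaux derivative of T along the path
    (1 - eps) * P_n + eps * delta_x, evaluated at the point x."""
    pts = np.append(data, x)
    w0 = np.append(np.full(len(data), 1.0 / len(data)), 0.0)
    w1 = (1 - eps) * w0
    w1[-1] += eps
    return (functional(pts, w1) - functional(pts, w0)) / eps

x = 2.0
approx = numerical_if(variance_functional, data, x)
exact = (x - data.mean()) ** 2 - data.var()   # known IF of the variance
print(approx, exact)                          # the two agree closely
```

The appeal is that only the ability to evaluate the functional on weighted samples is needed; no influence function calculus is done by hand.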
Thursday, October 4, 2018
Statistics for Science's Sake
Amy Herring, Ph.D.
2018 Andrei Yakovlev Colloquium
Monday, September 17, 2018
3:30 p.m. - 5:00 p.m.
Helen Wood Hall - Room 1W510
Multi-State Models in Medical Research
Per Kragh Andersen, Ph.D.
University of Copenhagen
2018 Charles L. Odoroff Memorial Lecture
Thursday, May 10, 2018
Revisiting the Genome-Wide Threshold of 5 × 10^-8 in 2018
Bhramar Mukherjee, Ph.D.
University of Michigan
During the past two years, there has been much discussion and debate around the perverse use of the P-value threshold of 0.05 to declare statistical significance for single null hypothesis testing. A recent recommendation by many eminent statisticians is to redefine statistical significance at P < 0.005 [Benjamin et al., Nature Human Behaviour, 2017]. This new threshold is motivated by the use of Bayes factors and true control of the false positive report probability. In genome-wide association studies, a much smaller threshold of 5 × 10^-8 has been used with notable success in yielding reproducible results while testing millions of genetic variants. I will first discuss the historic rationale for using this threshold. We will then investigate whether this threshold, proposed about a decade ago, needs to be revisited in light of current genome-wide data: newer sequencing platforms, imputation strategies, testing of rare versus common variants, the knowledge we have accumulated regarding true association signals, and the control of metrics associated with multiple hypothesis testing beyond the family-wise error rate. I will discuss notions of Bayesian error rates for multiple testing and use connections between the Bayes factor and the frequentist factor (the ratio of power and Type 1 error) for declaring new discoveries. Empirical studies using data from the Global Lipids Consortium will be used to evaluate, had we applied various thresholds/decision rules in 2008 or 2009, how many of the most recent GWAS results (from 2013) we would have detected and what our “true” false discovery rate would have been. This is joint work with Zhongsheng Chen and Michael Boehnke at the University of Michigan.
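The "frequentist factor" idea can be made concrete through the false positive report probability, FPRP = alpha*pi0 / (alpha*pi0 + power*pi1), in which power/alpha plays the role of a Bayes factor against the prior odds. Under a hypothetical prior (the 1-in-10,000 figure below is an assumption for illustration, not a number from the talk), the genome-wide threshold keeps FPRP tiny while 0.05 does not:

```python
# False positive report probability at significance threshold alpha:
#   FPRP = alpha*pi0 / (alpha*pi0 + power*pi1),
# where power/alpha is the "frequentist factor" weighed against the
# prior odds pi0/pi1 of a tested variant being null.
def fprp(alpha, power, prior_prob_alt):
    pi1 = prior_prob_alt
    pi0 = 1.0 - pi1
    return alpha * pi0 / (alpha * pi0 + power * pi1)

# Hypothetical prior: 1 in 10,000 tested variants is truly associated.
for alpha in (0.05, 5e-8):
    print(alpha, fprp(alpha, power=0.8, prior_prob_alt=1e-4))
```

At alpha = 0.05 almost every reported association would be a false positive under this prior, whereas at 5 × 10^-8 the FPRP falls below one in a thousand, which is one rationale for the stringent genome-wide threshold.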
Friday, April 20, 2018
Does DNA-Methylation Mediate the Effect of Maternal Smoking on Birth Weight? Exposure Misclassification in Mediation Analyses for Environmental Epigenetic Studies
Linda Valeri, Ph.D.
McLean Hospital, Psychiatric Biostatistics Laboratory
Assessing whether epigenetic alterations mediate associations between environmental exposures and health outcomes is increasingly popular. We investigate the impact of exposure misclassification in such studies. We quantify the bias and false positive rates due to exposure misclassification in mediation analysis and assess the performance of the SIMEX correction approach. Further, we evaluate whether DNA methylation mediates the relationship between maternal smoking and birth weight in the MoBa birth cohort. Ignoring exposure misclassification increases the Type I error rate in mediation analysis. The direct effect is underestimated and, when the mediator is a biomarker of the exposure, as is true for smoking, the indirect effect is overestimated. Misclassification correction plus cautious interpretation are recommended for mediation analyses in the presence of exposure misclassification.
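The SIMEX idea can be sketched for a misclassified binary exposure: deliberately add further misclassification at increasing levels, refit the naive estimator at each level, and extrapolate the fitted trend back to the level corresponding to no misclassification (lam = -1). The code below is a simplified MC-SIMEX toy with a quadratic extrapolant and a simple regression slope, not the paper's mediation analysis:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
x = rng.binomial(1, 0.3, n)                  # true binary exposure
y = 1.0 * x + rng.standard_normal(n)         # outcome; true slope = 1
pflip = 0.15                                 # assumed known misclassification rate
x_obs = np.where(rng.random(n) < pflip, 1 - x, x)

def slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# MC-SIMEX idea: add *extra* misclassification at levels lam >= 0, chosen
# so the total flip probability is (1 - (1 - 2*pflip)**(1 + lam)) / 2,
# which vanishes at lam = -1; then extrapolate the slope back to lam = -1.
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
est = []
for lam in lams:
    q = (1 - (1 - 2 * pflip) ** lam) / 2     # extra flip probability
    xs = np.where(rng.random(n) < q, 1 - x_obs, x_obs)
    est.append(slope(xs, y))
simex = np.polyval(np.polyfit(lams, est, 2), -1.0)   # quadratic extrapolant
naive = slope(x_obs, y)
print(naive, simex)                          # naive is attenuated; SIMEX less so
```

The quadratic extrapolant only approximates the true bias curve, so SIMEX reduces rather than eliminates the attenuation; this partial correction is why the abstract pairs it with cautious interpretation.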
Thursday, April 12, 2018
A Model for the Regulation of Follicular Dendritic Cells Predicts Invariant Reciprocal-Time Decay of Post-Vaccine Antibody Response
Anthony Almudevar, Ph.D.
University of Rochester
Follicular dendritic cells (FDCs) play a crucial role in the regulation of humoral immunity. They are believed to be responsible for the long-term persistence of antibody, due to their role in antibody response induction and their ability to retain antigen in immunogenic form for long periods. In this talk, a regulatory control model is described which links persistence of humoral immunity with cellular processes associated with FDCs (Almudevar 2017, Immunology and Cell Biology). The argument comprises three elements. The first is a review of population-level studies of post-vaccination antibody persistence. It is found that reciprocal-time (i.e., 1/t) decay of antibody levels is widely reported, over a range of ages, observation times and vaccine types. The second element is a mathematical control model for cell population decay for which reciprocal-time decay is a stable attractor. Additionally, control effectors are easily identified, leading to models of homeostatic control of the reciprocal-time decay rate. The final element is a review of known FDC functionality. This reveals a striking concordance between the cell properties required by the model and those widely observed in FDCs, some of which are unique to this cell type. The proposed model is able to unify a wide range of disparate observations of FDC function under one regulatory principle, and to characterize precisely the forms of FDC regulation and dysregulation. Many infectious and immunological diseases are increasingly being linked to FDC regulation; a precise understanding of the underlying mechanisms would therefore be of significant benefit for the development of new therapies.
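The mathematical core, reciprocal-time decay as a stable attractor, already appears in the simplest second-order decay model: dN/dt = -k*N^2 has solution N(t) = N0/(1 + k*N0*t), which behaves like 1/(k*t) for large t regardless of the starting level. The sketch below (a toy illustration, not the talk's full regulatory control model) verifies this invariance numerically:

```python
# Second-order self-limiting decay dN/dt = -k * N**2 has the solution
# N(t) = N0 / (1 + k*N0*t), i.e. ~1/(k*t) for large t *whatever* the
# initial level N0: reciprocal-time decay acts as a stable attractor.
k, dt, steps = 0.2, 1e-3, 200_000
for N0 in (10.0, 1000.0):
    N, t = N0, 0.0
    for _ in range(steps):
        N += dt * (-k * N * N)    # forward Euler step
        t += dt
    print(N0, t * N)              # t*N approaches 1/k = 5 for every N0
```

A hundred-fold change in the initial population leaves the long-run product t*N essentially unchanged, mirroring the invariant decay rate reported across vaccine types and ages.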
Thursday, March 22, 2018
Clustering Three-Way Data Using Mixture Models
Paul McNicholas, Ph.D.
Clustering is the process of finding underlying group structures in data. Although mixture model-based clustering is firmly established in the multivariate case, there is a relative paucity of work on matrix variate distributions. Several mixtures of matrix variate distributions are discussed, along with some details on parameter estimation. Real and simulated data are used for illustration, and some suggestions for future work are discussed.
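A key ingredient in such mixtures is the matrix variate normal, whose Kronecker-structured covariance makes it equivalent to a multivariate normal on vectorized matrices. A minimal sketch (density evaluation only, not the mixture/EM machinery discussed in the talk):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 3, 4                      # each observation is an n x p matrix

M = rng.standard_normal((n, p))                               # mean matrix
A = rng.standard_normal((n, n)); U = A @ A.T + n * np.eye(n)  # row covariance
B = rng.standard_normal((p, p)); V = B @ B.T + p * np.eye(p)  # column covariance

def matrix_normal_logpdf(X, M, U, V):
    """Log-density of the matrix variate normal MN(M, U, V)."""
    n, p = X.shape
    quad = np.trace(np.linalg.solve(V, (X - M).T) @ np.linalg.solve(U, X - M))
    _, ldU = np.linalg.slogdet(U)
    _, ldV = np.linalg.slogdet(V)
    return -0.5 * (n * p * np.log(2 * np.pi) + p * ldU + n * ldV + quad)

# Sanity check: MN(M, U, V) equals N(vec(M), V kron U) on vectorized matrices.
X = rng.standard_normal((n, p))
d = (X - M).flatten(order="F")                # column-major vec()
S = np.kron(V, U)
mvn = -0.5 * (n * p * np.log(2 * np.pi) + np.linalg.slogdet(S)[1]
              + d @ np.linalg.solve(S, d))
print(matrix_normal_logpdf(X, M, U, V), mvn)  # identical up to rounding
```

The Kronecker factorization is what makes the matrix variate case tractable: U and V together have far fewer parameters than an unstructured np x np covariance, which matters for the parameter estimation details the talk covers.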
Thursday, March 1, 2018