Testing Sparsity-Inducing Penalties
Maryclare Griffin, PhD
Many penalized maximum likelihood estimators correspond to posterior mode estimators under specific prior distributions. The appropriateness of a particular class of penalty functions can therefore be interpreted as the appropriateness of a prior for the parameters. For example, the appropriateness of a lasso penalty for regression coefficients depends on the extent to which the empirical distribution of the regression coefficients resembles a Laplace distribution. We give a procedure for testing whether a Laplace prior is appropriate and, accordingly, whether a lasso penalized estimate is appropriate. This testing procedure is designed to have power against exponential power priors, which correspond to l_q penalties. Via simulations, we show that this testing procedure achieves the desired level and has enough power to detect violations of the Laplace assumption when the numbers of observations and unknown regression coefficients are large. We then introduce an adaptive procedure that chooses a more appropriate prior, and corresponding penalty, from the class of exponential power priors when the null hypothesis is rejected. We show that this can improve estimation of the regression coefficients both when they are drawn from an exponential power distribution and when they are drawn from a spike-and-slab distribution.
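As a rough illustration of the correspondence the abstract describes, the l_q penalized objective below is (up to constants) the negative log-posterior under Gaussian errors and an exponential power prior on the coefficients; q = 1 recovers the lasso (Laplace prior) and q = 2 recovers ridge (Gaussian prior). The function name and toy data are illustrative, not from the talk:

```python
import numpy as np

def lq_penalized_loss(beta, X, y, lam, q):
    """Negative log-posterior (up to constants) for Gaussian errors with an
    exponential power prior on the coefficients, i.e. an l_q penalty.
    q = 1 is the lasso (Laplace prior); q = 2 is ridge (Gaussian prior)."""
    rss = 0.5 * np.sum((y - X @ beta) ** 2)    # Gaussian log-likelihood term
    penalty = lam * np.sum(np.abs(beta) ** q)  # exponential power log-prior term
    return rss + penalty

# toy illustration: same coefficients, different implied priors
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta_true + rng.standard_normal(50)

loss_laplace = lq_penalized_loss(beta_true, X, y, lam=1.0, q=1)   # lasso
loss_gaussian = lq_penalized_loss(beta_true, X, y, lam=1.0, q=2)  # ridge
```

Minimizing this objective over beta gives the posterior mode; the test in the talk asks whether q = 1 is the right member of this family.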
Thursday, October 18, 2018
3:30 p.m. - 4:45 p.m.
Helen Wood Hall, Room 1W-501
Toward Automated Efficient Estimation in Nonparametric and Semiparametric Models
Marco Carone, PhD
University of Washington
Drawing efficient inference in the context of nonparametric and semiparametric models can be challenging. General constructive approaches exist, but most of these build upon knowledge of the efficient influence function, an object whose analytic derivation is beyond the skillset of most practitioners. This may have constituted a barrier to broader use of these models in practice. In this talk, a novel approach to deriving the efficient influence function will be discussed. The proposal allows computational tools to substitute for some of the theoretical effort typically required, and as such may facilitate the automation of efficient estimation in nonparametric and semiparametric models.
Thursday, October 4, 2018
Statistics for Science's Sake
Amy Herring, PhD
2018 Andrei Yakovlev Colloquium
Monday, September 17, 2018
3:30 p.m. - 5:00 p.m.
Helen Wood Hall - Room 1W510
Multi-State Models in Medical Research
Per Kragh Andersen, PhD
University of Copenhagen
2018 Charles L. Odoroff Memorial Lecture
Thursday, May 10, 2018
Revisiting the Genome Wide Threshold of 5 × 10^-8 in 2018
Bhramar Mukherjee, Ph.D.
University of Michigan
During the past two years, there has been much discussion and debate around the perverse use of the P-value threshold of 0.05 to declare statistical significance in single null hypothesis testing. A recent recommendation by many eminent statisticians is to redefine statistical significance at P < 0.005 [Benjamin et al., Nature Human Behaviour, 2017]. This new threshold is motivated by the use of Bayes factors and true control of the false positive report probability. In genome-wide association studies, a much smaller threshold of 5 × 10^-8 has been used with notable success in yielding reproducible results while testing millions of genetic variants. I will first discuss the historical rationale for using this threshold. We will then investigate whether this threshold, proposed about a decade ago, needs to be revisited in light of the genome-wide data we now have: newer sequencing platforms, imputation strategies, testing rare versus common variants, the existing knowledge we have gathered regarding true association signals, and the control of metrics beyond the family-wise error rate in multiple hypothesis testing. I will discuss notions of Bayesian error rates for multiple testing and use connections between the Bayes factor and the Frequentist Factor (the ratio of power to Type I error) for declaring new discoveries. Empirical studies using data from the Global Lipids Consortium will be used to evaluate, had various thresholds/decision rules been applied in 2008 or 2009, how many of the most recent GWAS results (from 2013) would have been detected and what the "true" false discovery rate would have been. This is joint work with Zhongsheng Chen and Michael Boehnke at the University of Michigan.
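The abstract defines the Frequentist Factor as the ratio of power to Type I error. A minimal standard-library sketch of that quantity for a two-sided z-test follows; the noncentrality value used below is an arbitrary illustration of a strong signal, not a figure from the talk:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power_z(alpha, ncp):
    """Approximate power of a two-sided z-test at level alpha when the test
    statistic has mean ncp (the negligible opposite tail is ignored)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return 1 - Z.cdf(z_crit - ncp)

def frequentist_factor(alpha, ncp):
    """The 'Frequentist Factor' of the abstract: power / Type I error."""
    return power_z(alpha, ncp) / alpha

# illustrative comparison of the two thresholds for an assumed ncp of 6
ff_gwas = frequentist_factor(5e-8, 6.0)   # genome-wide threshold
ff_005 = frequentist_factor(0.005, 6.0)   # Benjamin et al. threshold
```

For a fixed strong signal, the stricter genome-wide threshold yields a far larger power-to-error ratio, which is one way to motivate its use for discovery claims.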
Friday, April 20, 2018
Does DNA-Methylation Mediate the Effect of Maternal Smoking on Birth Weight? Exposure Misclassification in Mediation Analyses for Environmental Epigenetic Studies
Linda Valeri, Ph.D.
McLean's Psychiatric Biostatistics Laboratory
Assessing whether epigenetic alterations mediate associations between environmental exposures and health outcomes is increasingly popular. We investigate the impact of exposure misclassification in such studies. We quantify the bias and false positive rates due to exposure misclassification in mediation analysis and assess the performance of the SIMEX correction approach. Further, we evaluate whether DNA methylation mediates the smoking–birth weight relationship in the MoBa birth cohort. Ignoring exposure misclassification inflates the Type I error rate in mediation analysis. The direct effect is underestimated and, when the mediator is a biomarker of the exposure, as is true for smoking, the indirect effect is overestimated. Misclassification correction, together with cautious interpretation, is recommended for mediation analyses in the presence of exposure misclassification.
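SIMEX works by deliberately adding further measurement error at increasing multiples of the known error variance, refitting the naive estimator at each level, and extrapolating the trend back to the zero-error case. A minimal sketch for a simple-regression slope, under the assumption of additive error with known variance (this is a generic illustration, not the speaker's implementation or settings):

```python
import numpy as np

def simex_slope(x_obs, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200, seed=0):
    """SIMEX-corrected slope for y regressed on an error-prone x_obs,
    assuming additive measurement error with known standard deviation sigma_u."""
    rng = np.random.default_rng(seed)
    # lambda = 0 corresponds to the naive fit on the observed data
    lam_grid = [0.0]
    est = [np.polyfit(x_obs, y, 1)[0]]
    for lam in lambdas:
        slopes = []
        for _ in range(B):
            # simulate extra error so total error variance is (1 + lam) * sigma_u^2
            x_sim = x_obs + np.sqrt(lam) * sigma_u * rng.standard_normal(len(x_obs))
            slopes.append(np.polyfit(x_sim, y, 1)[0])
        lam_grid.append(lam)
        est.append(np.mean(slopes))
    # quadratic extrapolation back to lambda = -1, i.e. zero measurement error
    coefs = np.polyfit(lam_grid, est, 2)
    return np.polyval(coefs, -1.0)
```

The naive slope is attenuated toward zero by the measurement error; the extrapolated estimate recovers most of that attenuation.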
Thursday, April 12, 2018
A Model for the Regulation of Follicular Dendritic Cells Predicts Invariant Reciprocal-Time Decay of Post-Vaccine Antibody Response
Anthony Almudevar, Ph.D.
University of Rochester
Follicular dendritic cells (FDC) play a crucial role in the regulation of humoral immunity. They are believed to be responsible for long-term persistence of antibody, due to their role in antibody response induction and their ability to retain antigen in immunogenic form for long periods. In this talk, a regulatory control model is described which links persistence of humoral immunity with cellular processes associated with FDCs (Almudevar 2017, Immunology and Cell Biology). The argument comprises three elements. The first is a review of population-level studies of post-vaccination antibody persistence. It is found that reciprocal-time (i.e., proportional to 1/t) decay of antibody levels is widely reported, over a range of ages, observation times and vaccine types. The second element is a mathematical control model for cell population decay for which reciprocal-time decay is a stable attractor. Additionally, control effectors are easily identified, leading to models of homeostatic control of the reciprocal-time decay rate. The final element is a review of known FDC functionality. This reveals a striking concordance between cell properties required by the model and those widely observed in FDCs, some of which are unique to this cell type. The proposed model is able to unify a wide range of disparate observations of FDC function under one regulatory principle, and to characterize precisely forms of FDC regulation and dysregulation. Many infectious and immunological diseases are increasingly being linked to FDC regulation; a precise understanding of the underlying mechanisms would therefore be of significant benefit for the development of new therapies.
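To see how reciprocal-time decay can arise as an attractor, consider the simplest quadratic-loss dynamic da/dt = -k a^2, whose closed-form solution decays like 1/(k t) for large t regardless of the initial level. This toy model is my own illustration of the qualitative behavior, not the regulatory model of the paper:

```python
import numpy as np

def antibody_decay(a0, k, t):
    """Closed-form solution of da/dt = -k * a**2:
    a(t) = a0 / (1 + a0 * k * t), which behaves like 1/(k*t) for large t,
    i.e. reciprocal-time decay with rate set by k alone."""
    return a0 / (1.0 + a0 * k * np.asarray(t, dtype=float))

# t * a(t) approaches the constant 1/k, whatever the starting level a0
t = np.array([100.0, 1000.0, 10000.0])
late_product = antibody_decay(5.0, 0.1, t) * t  # approaches 1/0.1 = 10
```

The attractor property is visible in the last line: the product t * a(t) converges to 1/k, so the long-run decay profile forgets the initial antibody level, which is the kind of invariance the talk's title refers to.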
Thursday, March 22, 2018
Clustering Three-Way Data Using Mixture Models
Paul McNicholas, Ph.D.
Clustering is the process of finding underlying group structures in data. Although mixture model-based clustering is firmly established in the multivariate case, there is a relative paucity of work on matrix variate distributions. Several mixtures of matrix variate distributions are discussed, along with some details on parameter estimation. Real and simulated data are used for illustration, and some suggestions for future work are discussed.
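Since the talk concerns mixtures of matrix variate distributions, a minimal sketch of sampling the basic building block, the matrix variate normal with row covariance U and column covariance V, may help fix ideas (the function name and parameterization are my own, not the speaker's):

```python
import numpy as np

def rmatnorm(M, U, V, rng):
    """Draw one sample X ~ MN(M, U, V): X = M + A Z B', where
    U = A A' and V = B B' are Cholesky factorizations and Z has
    i.i.d. standard normal entries of the same shape as the mean M."""
    A = np.linalg.cholesky(U)
    B = np.linalg.cholesky(V)
    Z = rng.standard_normal(M.shape)
    return M + A @ Z @ B.T
```

A mixture model for three-way data would combine several such components, each with its own mean matrix and row/column covariances, with parameters typically estimated by an EM-type algorithm.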
Thursday, March 1, 2018