Spring 2018 Colloquia
Revisiting the Genome Wide Threshold of 5 × 10⁻⁸ in 2018
Bhramar Mukherjee, Ph.D.
University of Michigan
During the past two years, there has been much discussion and debate around the perverse use of the P-value threshold of 0.05 to declare statistical significance for single null hypothesis testing. A recent recommendation by many eminent statisticians is to redefine statistical significance at P < 0.005 [Benjamin et al., Nature Human Behaviour, 2017]. This new threshold is motivated by the use of Bayes Factors and true control of the false positive report probability. In genome-wide association studies (GWAS), a much smaller threshold of 5 × 10⁻⁸ has been used with notable success in yielding reproducible results while testing millions of genetic variants. I will first discuss the historic rationale for using this threshold. We will then investigate whether this threshold, proposed about a decade ago, needs to be revisited in light of current genome-wide data: the newer sequencing platforms, imputation strategies, testing of rare versus common variants, the existing knowledge we have gathered regarding true association signals, and the control of metrics for multiple hypothesis testing beyond the family-wise error rate. I will discuss notions of Bayesian error rates for multiple testing and use connections between the Bayes Factor and the Frequentist Factor (the ratio of power to Type 1 error) for declaring new discoveries. Empirical studies using data from the Global Lipids Consortium will be used to evaluate, had we applied various thresholds and decision rules in 2008 or 2009, how many of the most recent GWAS results (in 2013) we would have detected and what our “true” false discovery rate would have been. This is joint work with Zhongsheng Chen and Michael Boehnke at the University of Michigan.
Friday, April 20, 2018
3:30 p.m. - 5:00 p.m.
River Campus - Wegmans Hall Room 1400
Clustering Three-Way Data Using Mixture Models
Paul McNicholas, Ph.D.
Clustering is the process of finding underlying group structures in data. Although mixture model-based clustering is firmly established in the multivariate case, there is a relative paucity of work on matrix variate distributions. Several mixtures of matrix variate distributions are discussed, along with some details on parameter estimation. Real and simulated data are used for illustration, and some suggestions for future work are discussed.
Thursday, March 1, 2018
3:30 p.m. - 4:45 p.m.
Helen Wood Hall, Auditorium 1W-304