2019 Colloquia

Should We Model X in High-Dimensional Inference?

Lucas Janson, Ph.D.
Harvard University

Many important scientific questions are about the relationship between a response variable Y and a set of explanatory variables X. For instance, Y might be a disease state and the X's might be a person's SNPs, and the question is which of these SNPs are related to the disease. To answer such questions, most statistical methods focus their assumptions on the conditional distribution of Y given X (or Y | X for short). I will describe some benefits of shifting those assumptions from the conditional distribution Y | X to the joint distribution of X, especially for high-dimensional data. First, modeling X can lead to assumptions that are more realistic and verifiable. Second, there are substantial methodological payoffs: the analyst gains much greater flexibility in the tools they can bring to bear on their data, while still being guaranteed exact (non-asymptotic) inference. I will briefly mention some of my recent and ongoing work on methods for high-dimensional inference that model X instead of Y, as well as some challenges and interesting directions for the future.

Thursday, February 28, 2019
3:30 p.m. - 4:45 p.m.
Helen Wood Hall, Room 1W-501

Parallel Markov Chain Monte Carlo Methods for Bayesian Analysis of Big Data

Erin Conlon, Ph.D.
University of Massachusetts

Recently, new parallel Markov chain Monte Carlo (MCMC) methods have been developed for massive data sets that are too large for traditional statistical analysis. These methods partition big data sets into subsets and run Bayesian MCMC computation independently, in parallel, on each subset. The posterior MCMC samples from the subsets are then combined to approximate the full-data posterior distributions. Current strategies for combining the subset samples include averaging, weighted averaging and kernel smoothing approaches. Here, I will discuss our new method for combining subset MCMC samples, which directly multiplies the subset densities.

While our method is applicable to both Gaussian and non-Gaussian posteriors, we show in simulation studies that it outperforms existing methods when the posteriors are non-Gaussian. I will also discuss computational tools we have developed for carrying out parallel MCMC computing in Bayesian analysis of big data.
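
As a rough illustration of the partition-and-combine idea described above, the following Python sketch applies weighted averaging, one of the existing combination strategies the abstract names, to a toy normal-mean model. It is not the speaker's density-product method. Everything in it is an assumption made for illustration: the model (known variance, normal prior), the function names, the number of subsets, and the use of exact conjugate draws as a stand-in for MCMC output on each subset.

```python
import random
import statistics


def subset_posterior_draws(ys, n_subsets, n_draws, rng, sigma2=1.0, tau2=100.0):
    """Draw from one subset's posterior for a normal mean with known variance.

    The N(0, tau2) prior is raised to the power 1/n_subsets, i.e. replaced by
    N(0, n_subsets * tau2), so the product of the subset posteriors equals the
    full-data posterior. Exact conjugate draws stand in for MCMC samples here.
    """
    precision = 1.0 / (n_subsets * tau2) + len(ys) / sigma2
    mean = (sum(ys) / sigma2) / precision
    sd = precision ** -0.5
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def combine_weighted_average(draws_by_subset):
    """Combine draws across subsets, weighting each subset by 1 / sample variance."""
    weights = [1.0 / statistics.variance(d) for d in draws_by_subset]
    total = sum(weights)
    n_draws = len(draws_by_subset[0])
    return [
        sum(w * d[k] for w, d in zip(weights, draws_by_subset)) / total
        for k in range(n_draws)
    ]


def full_posterior(ys, sigma2=1.0, tau2=100.0):
    """Exact full-data posterior mean and variance, used only as a reference."""
    precision = 1.0 / tau2 + len(ys) / sigma2
    return (sum(ys) / sigma2) / precision, 1.0 / precision


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(10_000)]  # simulated observations

n_subsets = 4
subsets = [data[i::n_subsets] for i in range(n_subsets)]  # partition the data
draws = [subset_posterior_draws(s, n_subsets, 5_000, rng) for s in subsets]
combined = combine_weighted_average(draws)

full_mean, full_var = full_posterior(data)
print("combined mean:", statistics.mean(combined), "full-data mean:", full_mean)
print("combined var: ", statistics.variance(combined), "full-data var: ", full_var)
```

In this Gaussian toy case the weighted average recovers the full-data posterior up to Monte Carlo error, which is consistent with the abstract's point that the differences between combination methods show up when the posteriors are non-Gaussian.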

Thursday, February 14, 2019
3:30 p.m. - 4:45 p.m.
Helen Wood Hall, Room 1W-501