2023 Colloquia

Transportability of Prediction Models

Jon A Steingrimsson, Ph.D.
Brown University

Prediction models are often used and/or interpreted in the context of a target population that differs from the study population from which the data used to develop the model were drawn (e.g., a different health-care system or a different geographic region). In this talk, we discuss how to evaluate a prediction model in the context of the target population when outcome and covariate information are available from the study data, but only covariate information (no outcome data) is available on a sample from the target population. We provide identifiability results, estimators, and large-sample properties of the estimators for loss-based measures of model performance, and briefly discuss extensions to censoring and to estimation of the area under the ROC curve. Finite-sample performance is evaluated using simulations and data from a lung cancer screening trial.
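
The following is a minimal sketch of a loss-based performance measure in the target population. It assumes an inverse-odds weighting estimator: study-sample losses are weighted by an estimate of Pr(S=0 | X)/Pr(S=1 | X), where S=1 marks the study sample and S=0 the target sample, and the weighted sum is normalized by the target sample size. Several details are hypothetical: the single discrete covariate (which lets the odds be estimated from stratum counts), the squared-error loss, the function name `transported_mse`, and the data. It illustrates the general idea and is not the speaker's estimator or software.

```python
from collections import Counter


def transported_mse(study, target_x, predict):
    """Inverse-odds-weighted estimate of a model's MSE in the target population.

    study    -- list of (x, y) pairs from the study sample (S = 1); x is discrete
    target_x -- covariate values from the target sample (S = 0); no outcomes
    predict  -- the prediction model being evaluated, a function x -> y_hat

    Assumes positivity: every covariate value seen in the target sample also
    appears in the study sample.
    """
    study_counts = Counter(x for x, _ in study)
    target_counts = Counter(target_x)

    def inverse_odds(x):
        # Pr(S=0 | X=x) / Pr(S=1 | X=x), estimated by the ratio of sample
        # counts within the stratum X = x of the discrete covariate.
        return target_counts[x] / study_counts[x]

    # Weight each study-sample loss by the inverse odds of study participation,
    # then normalize by the size of the target sample.
    weighted_loss = sum(inverse_odds(x) * (y - predict(x)) ** 2 for x, y in study)
    return weighted_loss / len(target_x)


def constant_model(x):
    # Hypothetical prediction model that ignores the covariate.
    return 2.0


# Hypothetical data: the model fits stratum x=1 poorly, and the target
# sample contains proportionally more of that stratum.
study = [(0, 1.0), (0, 3.0), (1, 5.0), (1, 7.0)]
target_x = [1, 1, 1, 0]
print(transported_mse(study, target_x, constant_model))  # 13.0
```

In this toy example the naive study-sample MSE is 9.0, but the transported estimate is 13.0. The difference arises because the target sample over-represents the stratum where the model predicts poorly.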

Thursday, March 30, 2023
= = = = = =

Latent Factor Model for Multivariate Functional Data

Luo Xiao, Ph.D.
North Carolina State University

For multivariate functional data, a class of functional latent factor models (FLFMs) is proposed, extending the traditional latent factor model for multivariate data. The proposed model uses unobserved stochastic processes to induce dependence among the different functions; for a large number of functions, it may therefore provide a more parsimonious and interpretable characterization of the otherwise complex dependencies between the functions. Sufficient conditions are provided to establish the identifiability of the proposed model. We use an application to electroencephalography (EEG) data to illustrate the unsupervised FLFM. We also apply a special case of the FLFM, the multivariate functional mixed model (MFMM), to jointly model multiple longitudinal biomarkers and time-to-event data in an AD study. Finally, we discuss a few future extensions.
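
The sketch below illustrates only the model structure. It simulates data in which J observed functions share K < J latent processes, X_j(t) = Σ_k λ_jk Z_k(t) + ε_j(t). Several choices are assumptions made for illustration: the Fourier form of the latent processes, the Gaussian noise, the loadings, and the function name `simulate_flfm`. The sketch does not fit the FLFM and does not reflect the identifiability conditions from the talk.

```python
import math
import random


def simulate_flfm(loadings, grid, noise_sd=0.1, seed=0):
    """Draw one subject's multivariate functional data from a toy latent factor model.

    loadings -- J x K nested list; loadings[j][k] is the loading of function j on factor k
    grid     -- time points in [0, 1] at which every function is observed
    Returns (factors, curves): K latent curves and J observed curves on the grid.
    """
    rng = random.Random(seed)
    n_factors = len(loadings[0])
    # Each latent process Z_k(t) is a random combination of two Fourier basis
    # functions: a simple stand-in for a smooth, zero-mean stochastic process.
    factors = []
    for _ in range(n_factors):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        factors.append([a * math.sin(2 * math.pi * t) + b * math.cos(2 * math.pi * t)
                        for t in grid])
    # Observed function j: X_j(t) = sum_k loadings[j][k] * Z_k(t) + measurement noise.
    curves = []
    for row in loadings:
        curves.append([sum(weight * z[i] for weight, z in zip(row, factors))
                       + rng.gauss(0, noise_sd)
                       for i in range(len(grid))])
    return factors, curves


# Hypothetical example: three observed functions driven by two latent processes.
loadings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
grid = [i / 10 for i in range(11)]
factors, curves = simulate_flfm(loadings, grid, noise_sd=0.0)
```

With the noise switched off, the third curve is exactly the average of the two latent processes. All dependence between the observed functions therefore flows through the shared factors.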

Thursday, April 13, 2023
= = = = = =

Bayesian Mortality Rate Estimation: Calibration and Standardization for Public Reporting

Ed George, Ph.D.
The Wharton School of the University of Pennsylvania

Visit Charles L. Odoroff Memorial Lecture for details

Thursday, April 27, 2023
= = = = = =

Design and Inference for Enrichment Trials with a Continuous Biomarker

William F. Rosenberger, Ph.D.
George Mason University

Visit Andrei Yakovlev Colloquium for details

Thursday, September 7, 2023
= = = = = =

Stochastic Volatility Models with Informative Missingness for the Analysis of EMA Data

Robert T. Krafty, Ph.D.
Emory University

Models that treat the variance of a time series as a stochastic process, known as stochastic volatility models, have proven to be an important tool for analyzing dynamic variability. Current methods for fitting and conducting inference on stochastic volatility models, which have been extensively studied within the context of financial time series, are limited by the assumption that any missing data are missing at random. With the recent explosion of technology facilitating the collection of dynamic psychobehavioral data, for which the mechanisms underlying missing data are inherently scientifically informative, this limitation in statistical methodology also limits scientific advancement. In this talk, we discuss the first statistical methodology for modeling, fitting and conducting inference on stochastic volatility with data that are missing not at random. The approach is based on a novel imputation method derived using Tukey's representation, which exploits the Markovian nature of stochastic volatility models to overcome the unidentifiable components often faced when modeling informative missingness in other settings. This imputation method is combined with a new conditional particle filtering with ancestral sampling (CPF-AS) procedure that accounts for variability in imputation, yielding a complete particle Gibbs sampling scheme. The method is motivated by the analysis of ecological momentary assessment (EMA) data, and its use is illustrated through an analysis of mood and affect reported through a cell phone application by an individual being monitored after surviving a suicide attempt.
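
A toy generative model can make "informative missingness" concrete. The sketch below assumes a standard AR(1) log-variance stochastic volatility model. It pairs this with a hypothetical logistic missingness mechanism that depends on the latent log-variance h_t. The resulting data are missing not at random, because whether y_t is observed depends on the unobserved volatility. The parameter values and the function name `simulate_sv_mnar` are illustrative assumptions. The code does not implement the Tukey-representation imputation or the CPF-AS particle Gibbs sampler described above.

```python
import math
import random


def simulate_sv_mnar(n, mu=-1.0, phi=0.95, sigma=0.3, alpha=-2.0, beta=1.5, seed=1):
    """Simulate a basic stochastic volatility series with volatility-dependent missingness.

    Latent log-variance: h_t = mu + phi * (h_{t-1} - mu) + sigma * eta_t
    Observation:         y_t = exp(h_t / 2) * eps_t, with eta_t, eps_t ~ N(0, 1)
    Missingness:         Pr(y_t missing | h_t) = logistic(alpha + beta * h_t)
    Missing observations are returned as None.
    """
    rng = random.Random(seed)
    # Start from the stationary distribution of the AR(1) log-variance.
    h = rng.gauss(mu, sigma / math.sqrt(1 - phi ** 2))
    log_vol, obs = [], []
    for _ in range(n):
        h = mu + phi * (h - mu) + sigma * rng.gauss(0, 1)
        y = math.exp(h / 2) * rng.gauss(0, 1)
        p_missing = 1 / (1 + math.exp(-(alpha + beta * h)))
        log_vol.append(h)
        obs.append(None if rng.random() < p_missing else y)
    return log_vol, obs


log_vol, obs = simulate_sv_mnar(5000)
missing_h = [h for h, y in zip(log_vol, obs) if y is None]
observed_h = [h for h, y in zip(log_vol, obs) if y is not None]
# With beta > 0, missing time points tend to have higher latent log-variance,
# so an analysis that simply drops them tends to understate volatility.
```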

Thursday, September 21, 2023
= = = = = =

Accounting for Differential Data Quality in Analyses of Data from Electronic Health Records

Rebecca Hubbard, Ph.D.
University of Pennsylvania

Using data generated as a by-product of digital interactions to improve health and healthcare is a rapidly expanding frontier for clinical research in the 21st century. Big healthcare databases, including electronic health records and health insurance claims data, offer the opportunity to study interventions and populations that have previously been difficult or impossible to access using traditional study designs. The massive size of these databases suggests that they can be used to generate powerful clinical evidence. Indeed, such data sources have provided key information advancing our understanding of vaccine and treatment effectiveness for COVID-19. However, naïve analytic methods applied in big-data settings, which feature errors in outcome ascertainment and biased sampling mechanisms, may produce highly precise but biased estimates, leading to erroneous inference. This is particularly problematic when data quality differs across subpopulations, leading to poorer-quality evidence and potentially exacerbating health inequities for vulnerable populations. In this talk, I will discuss the implications of differential data quality for biased inference and algorithmic unfairness, as well as statistical approaches to address these challenges. Employing approaches that ensure the validity and fairness of research results based on big data is an ethical imperative, key to safeguarding health equity and the scientific evidence base derived from these data sources.
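
The sketch below makes differential outcome misclassification concrete using the classical Rogan–Gladen correction. This is a textbook adjustment and is not necessarily among the approaches presented in the talk. It assumes the sensitivity and specificity of outcome ascertainment are known for each subgroup. The subgroups and all numbers are hypothetical.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Prevalence observed when the outcome is recorded with the given error rates."""
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)


def rogan_gladen(observed_prev, sensitivity, specificity):
    """Invert the misclassification model to recover the true prevalence."""
    return (observed_prev + specificity - 1) / (sensitivity + specificity - 1)


# Hypothetical example: two subgroups with the same true prevalence (0.20)
# but different sensitivity of outcome ascertainment in the data source.
groups = {"group A": (0.90, 0.98), "group B": (0.60, 0.98)}
for name, (se, sp) in groups.items():
    obs = apparent_prevalence(0.20, se, sp)
    print(f"{name}: observed {obs:.3f}, corrected {rogan_gladen(obs, se, sp):.3f}")
```

Both hypothetical groups have a true prevalence of 0.20. The naive estimates nonetheless differ (0.196 versus 0.136) solely because of differing sensitivity. The correction recovers 0.20 in each group only if the assumed error rates are correct.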

Thursday, September 28, 2023
= = = = = =

A Spatial Inference Approach to Account for Mobility in Air Pollution Studies with Continuous Treatments

Joseph Antonelli, Ph.D.
University of Florida

We develop new methodology to improve our understanding of the causal effects of multivariate air pollution exposures on public health. Typically, an individual's exposure to air pollution is measured in their home geographic region, even though people travel to other regions with potentially different levels of air pollution. To account for this, we incorporate estimates of individuals' mobility from cell phone mobility data to obtain an improved estimate of their exposure to air pollution. We treat this as an interference problem, in which individuals in one geographic region can be affected by exposures in other regions because they travel into those areas. We propose policy-relevant estimands and derive expressions showing the extent of bias one would incur by ignoring this mobility. We additionally highlight the benefits of the proposed interference framework relative to a measurement error framework for accounting for mobility. We develop novel strategies, based on flexible Bayesian methodology, to estimate causal effects that account for this spatial spillover. Empirically, we find that this leads to improved estimation of the causal effects of air pollution exposures compared with analyses that ignore spatial spillover caused by mobility.
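
A simple calculation shows how mobility changes exposure assignment. Suppose a hypothetical row-stochastic matrix M gives the fraction of time residents of region i spend in region j, for example as summarized from cell phone data. A mobility-weighted exposure for region i is then Σ_j M_ij P_j, where P_j is the pollution level in region j. This is an illustrative calculation under that assumption, not the Bayesian interference estimator from the talk. The function name and numbers are made up.

```python
def mobility_weighted_exposure(mobility, pollution):
    """Exposure for each home region as a mobility-weighted average of regional levels.

    mobility  -- square nested list; mobility[i][j] is the share of time that
                 residents of region i spend in region j (each row sums to 1)
    pollution -- pollution level measured in each region
    """
    for row in mobility:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the mobility matrix must sum to 1")
    return [sum(m * p for m, p in zip(row, pollution)) for row in mobility]


# Hypothetical three-region example: residents of region 0 spend 30% of
# their time in the more polluted region 2.
mobility = [[0.6, 0.1, 0.3],
            [0.1, 0.9, 0.0],
            [0.0, 0.2, 0.8]]
pollution = [8.0, 10.0, 20.0]
home_only = pollution  # exposure assigned by home region alone
with_mobility = mobility_weighted_exposure(mobility, pollution)
```

Here region 0's assigned exposure rises from 8.0 (home region only) to about 11.8 once time spent in region 2 is counted. An analysis based on home location alone would miss this kind of spillover.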

Thursday, October 12, 2023
= = = = = =

Estimating RMST Difference within Biomarker Groups with Misclassification

Anne Eaton, Ph.D.
University of Minnesota

In clinical trials where interest lies in assessing the effect of treatment on a time-to-event outcome, and whether the treatment effect differs by biomarker status, an imperfect biomarker test can lead to bias. In particular, because participants who test positive (or negative) for the biomarker will actually be a mix of truly positive and truly negative patients (according to the gold standard), differences in the treatment effect between biomarker groups will be diluted, reducing power to detect a predictive biomarker. A further complication is that even if the proportional hazards assumption is satisfied for the treatment, biomarker and interaction effects, it will be violated when biomarker groups are defined by an imperfect test. We focus on summarizing effects using the difference in restricted mean survival time (RMST) and extend an existing method for handling biomarker misclassification to the RMST setting. Our proposed estimators are unbiased and can be used to test for treatment, biomarker and interaction effects. We provide software to implement the estimators and to estimate the required sample size for trials in which an imperfect biomarker test will be used.
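
RMST up to a horizon tau is the area under the survival curve, so it can be estimated as the area under the Kaplan-Meier curve. RMST is also linear in the survival function. Suppose each test-defined group's survival curve is a mixture of the true-positive and true-negative curves, weighted by the test's positive and negative predictive values (PPV, NPV). Then the observed group RMSTs can be de-mixed by solving a 2×2 linear system. The sketch below makes three assumptions: PPV and NPV are known, they are equal across arms, and misclassification is nondifferential with respect to survival given true status. It is a simplified illustration with hypothetical function names, not the speaker's estimators or software.

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time up to tau: the area under the Kaplan-Meier curve.

    times  -- follow-up times
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    surv, area, last_t = 1.0, 0.0, 0.0
    at_risk, i = len(data), 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        # Count events and all subjects (events + censored) leaving the risk set at t.
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += surv * (t - last_t)   # flat piece of the step function on [last_t, t)
        surv *= 1 - deaths / at_risk  # Kaplan-Meier product-limit update
        at_risk -= leaving
        last_t = t
    return area + surv * (tau - last_t)


def correct_rmst_for_misclassification(rmst_test_pos, rmst_test_neg, ppv, npv):
    """De-mix RMSTs of test-defined groups into RMSTs of truly positive/negative groups.

    Assumes (hypothetically) that within an arm
      rmst_test_pos = ppv * r_pos + (1 - ppv) * r_neg
      rmst_test_neg = (1 - npv) * r_pos + npv * r_neg
    which holds when each test-defined survival curve is that mixture of the
    true-status curves, since RMST is linear in the survival function.
    """
    det = ppv + npv - 1  # determinant of [[ppv, 1 - ppv], [1 - npv, npv]]
    if abs(det) < 1e-12:
        raise ValueError("test is uninformative (ppv + npv == 1); cannot de-mix")
    r_pos = (npv * rmst_test_pos - (1 - ppv) * rmst_test_neg) / det
    r_neg = (ppv * rmst_test_neg - (1 - npv) * rmst_test_pos) / det
    return r_pos, r_neg


# Hypothetical observed RMSTs (same tau) in each arm, with PPV = 0.8 and NPV = 0.9.
trt_pos, trt_neg = correct_rmst_for_misclassification(9.2, 6.4, 0.8, 0.9)
ctl_pos, ctl_neg = correct_rmst_for_misclassification(7.0, 6.0, 0.8, 0.9)
print(round(trt_pos - ctl_pos, 2))  # corrected RMST difference among true positives
```

In this made-up example, the corrected RMST difference among truly biomarker-positive patients is about 2.71. This is larger than the naive difference among test-positive patients (9.2 - 7.0 = 2.2), mirroring the dilution described in the abstract. In practice, `km_rmst` would supply the observed group RMSTs from raw follow-up data.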

Thursday, November 9, 2023
3:30 pm - 5:00 pm
Helen Wood Hall, Auditorium 1W-304