Andrei Yakovlev Colloquium
Our first colloquium of each academic year is named for the late Dr. Andrei Yakovlev (Chair 2002-2008) in honor of his many major contributions to the Department. The public is invited to attend.
2018 Andrei Yakovlev Colloquium
Amy Herring, PhD
Sara and Charles Ayres Professor
Department of Statistical Science and Global Health
Monday, September 17, 2018
3:30 p.m. - 5:00 p.m.
Helen Wood Hall - Room 1W510
Statistics for Science's Sake
In this talk in honor of the career of Dr. Andrei Yakovlev, we will consider two case studies of scientific problems that pose interesting statistical challenges and call for new methodological development. We will address the motivating scientific problems (in maternal and child health and adolescent development, respectively), drawbacks of existing or standard analysis approaches, subsequent methodological developments, and the process of collaboration in multiple disciplines, with a focus on strategies for generating ideas for research beyond graduate school and throughout one’s career.
2017 Andrei Yakovlev Colloquium
Martin Wells, PhD
Professor and Chairman
Department of Statistical Science
A Scalable Empirical Bayes Approach to Variable Selection in Generalized Linear Models
A new empirical Bayes approach to variable selection in the context of generalized linear models is developed. The proposed algorithm scales to situations in which the number of putative explanatory variables is very large, possibly much larger than the number of responses. The coefficients in the linear predictor are modeled as a three-component mixture allowing the explanatory variables to have a random positive effect on the response, a random negative effect, or no effect. A key assumption is that only a small (but unknown) fraction of the candidate variables have a non-zero effect. This assumption, in addition to treating the coefficients as random effects, facilitates an approach that is computationally efficient. In particular, the number of parameters that have to be estimated is small, and remains constant regardless of the number of explanatory variables. The model parameters are estimated using a modified form of the EM algorithm, which is scalable and leads to significantly faster convergence compared with simulation-based fully Bayesian methods.
Thursday, September 14, 2017
2016 Andrei Yakovlev Colloquium
Richard J. Cook, PhD
Professor of Statistics
University of Waterloo
Augmented composite likelihood for the analysis of family data under biased sampling schemes
The heritability of chronic diseases can be effectively studied by examining the nature and extent of within-family associations in disease onset times. In such studies families are typically recruited through a biased sampling scheme in which affected individuals in a disease registry are sampled, and their relatives are contacted to provide right-censored or current status data on their disease onset times. We develop likelihood and composite likelihood methods for modeling the within-family association in these times through copula models in which dependencies are characterized by Kendall's tau. Auxiliary data from independent individuals are exploited by augmenting composite likelihoods to increase precision of marginal parameter estimates and consequently increase efficiency in dependence parameter estimation. An application to a motivating family study in psoriatic arthritis illustrates the method and provides some evidence of excessive paternal transmission of risk.
Thursday, September 8, 2016
2015 Andrei Yakovlev Colloquium
Heping Zhang, PhD
Susan Dwight Bliss Professor of Public Health (Biostatistics)
Yale School of Public Health
Decision Trees for Precision Medicine
Double-blind, randomized clinical trials are the preferred approach to demonstrating the effectiveness of one treatment against another. The comparison is, however, made on the average group effects. While patients and clinicians have always struggled to understand why patients respond differently to the same treatment, and while much hope has been held for the nascent field of predictive biomarkers (e.g. genetic markers), there is still much utility in exploring whether it is possible to estimate treatment efficacy based on demographic and baseline variables including biomarkers. To address this issue, we focused on a concept of the relative effectiveness of treatments that is of particular importance in precision medicine. The method can identify groups of patients that are more likely to respond to one treatment than the other, in contrast to the traditional approach that searches for a superior treatment in a larger population. We developed an automated algorithm to construct decision trees and performed extensive simulation to evaluate our algorithm. We analyzed data from clinical trials to illustrate the practical potential of our method.
Thursday, September 24, 2015
2014 Andrei Yakovlev Colloquium
Daniel Scharfstein, ScD
Professor of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Global Sensitivity Analysis for Repeated Measures Studies with Informative Dropout: A Semi-Parametric Approach
In 2010, the National Research Council issued the report: "The Prevention and Treatment of Missing Data in Clinical Trials." This report, commissioned by the FDA, provides 18 recommendations. Since inference in the presence of missing data requires untestable assumptions, Recommendation 15 states: “Sensitivity analyses should be part of the primary reporting of findings from clinical trials. Examining sensitivity to the assumptions about the missing data mechanism should be a mandatory component of reporting.” Broadly speaking, there are three main types of sensitivity analysis. Ad-hoc sensitivity analysis involves analyzing the data using a few different methods and evaluating whether the inferences are consistent. Local sensitivity analysis evaluates how inferences vary in a small neighborhood of a benchmark identification assumption, such as missing at random. Chapter 5 of the report emphasizes global sensitivity analysis, which considers how inferences vary over a much larger neighborhood of identification assumptions. In this talk, we present a global sensitivity analysis methodology for drawing inference about the mean at the final scheduled visit in a repeated measures study with dropout. We discuss a recently developed semi-parametric approach, the software for which is freely available at www.missingdatamatters.org. We present a detailed case study to illustrate the methodology.
Thursday, September 18, 2014
2013 Andrei Yakovlev Colloquium
David Ruppert, PhD
School of Operations Research and Information Engineering
Fast Covariance Estimation for High-Dimensional Functional Data
High-dimensional functional data are becoming increasingly common, for example, in medical imaging. For such data, we propose fast methods for smooth estimation of the covariance function. These methods scale up linearly with J, the number of observations per function. Most available methods and software cannot smooth covariance matrices of dimension J greater than 500; the recently introduced sandwich smoother is an exception, but it is not adapted to smooth covariance matrices of large dimensions, such as J = 10,000. We introduce two new methods that circumvent this problem: 1) an extremely fast implementation of the sandwich smoother for covariance smoothing; and 2) a two-step procedure that first obtains the singular value decomposition of the data matrix and then smooths the eigenvectors. In high dimensions, these new approaches are at least an order of magnitude faster than standard methods and drastically reduce memory requirements. The new approaches provide instantaneous (a few seconds) smoothing for matrices of dimension J = 10,000 and very fast (< 10 minutes) smoothing for J = 100,000.
This is joint work with Luo Xiao, Ciprian Crainiceanu, and Vadim Zipunnikov.
Thursday, September 19, 2013
2012 Andrei Yakovlev Colloquium
Ying Kuen K. Cheung, PhD
Mailman School of Public Health
On the Efficiency of Nonparametric Variance Estimation in Sequential Dose Finding
Phase I clinical trials are experiments in which a drug is administered to humans to determine the maximum tolerated dose, defined as the maximum test dose that causes a toxicity with a target probability. As such, phase I dose-finding is often formulated as a quantile estimation problem. In this talk, I will focus on clinical scenarios where toxicity is defined by dichotomizing a continuous outcome, for which a correct specification of the variance function of the outcomes is important. This is especially true for sequential studies, where the variance assumption is directly involved in the generation of the design points, and hence sensitivity analysis may not be feasible after the data are collected. In this light, there is a strong reason for avoiding parametric assumptions on the variance function, although this may incur efficiency loss. This talk will show how much information one may retrieve by making additional parametric assumptions on the variance in the context of a sequential least squares recursion. By asymptotic comparison and simulation study, we demonstrate that assuming homoscedasticity achieves only a modest efficiency gain when compared to nonparametric variance estimation: when homoscedasticity in truth holds, the latter is at worst 88% as efficient as the former in the limiting case, and often achieves well over 90% efficiency for most practical situations.
Thursday, September 6, 2012
2011 Andrei Yakovlev Colloquium
Xihong Lin, PhD
Department of Biostatistics
Harvard School of Public Health
Statistical Issues and Challenges in Analyzing High-throughput 'Omics Data in Population-Based Studies
With the advance of biotechnology, massive "omics" data, such as genomic and proteomic data, are rapidly becoming available in population-based studies to study the interplay of genes and environment in causing human diseases. An increasing challenge is how to design such studies, manage the data, analyze such high-throughput "omics" data, interpret the results, and make the findings reproducible. We discuss several statistical issues in analysis of high-dimensional "omics" data in population-based "omics" studies. We present statistical methods for analysis of several types of "omics" data, including incorporation of biological structures in analysis of data from genome-wide association studies, and next generation sequencing data for rare variants. Data examples are presented to illustrate the methods. Strategies for interdisciplinary training in statistical genetics, computational biology and genetic epidemiology will also be discussed.
Thursday, September 29, 2011
2010 Andrei Yakovlev Colloquium
Michael R. Kosorok, PhD
University of North Carolina at Chapel Hill
Reinforcement Learning, Clinical Trials and Personalized Medicine
In this talk, we discuss using reinforcement learning to discover optimal dynamic treatment regimes for treating cancer and other life-threatening diseases. The approach we propose is to use a specially designed sequence of two randomized clinical trials that enables discovery and validation of these optimal regimens. Because these regimens are optimized over patient characteristics, including biomarkers, they are a form of personalized medicine. We discuss applications in non-small cell lung cancer, colorectal cancer and cystic fibrosis. We will also discuss briefly several open technical questions.
Thursday, September 9, 2010
2009 Andrei Yakovlev Colloquium
Dean Follmann, PhD
Biostatistics Research Branch
National Institute of Allergy and Infectious Diseases
Crossover Trials for Survival and Recurrent Event Endpoints
The crossover is a popular and efficient trial design used in the context of patient heterogeneity to assess the effect of treatments that act relatively quickly and whose benefit disappears with discontinuation. Each patient can serve as her own control as within-individual treatment and placebo responses are compared. Conventional wisdom is that these designs are not appropriate for absorbing binary endpoints, such as death or HIV infection. We explore the use of crossover designs in the context of these non-repeatable binary endpoints and show that they can be more efficient than the standard parallel group design when there is heterogeneity in individuals’ risks. We also introduce a new two-period design where first period “survivors” are re-randomized for the second period. This design combines the crossover design with the parallel design and achieves some of the efficiency advantages of the crossover design while ensuring that the second period groups are comparable by randomization.
We discuss the validity of the new designs and evaluate mixture model and semi-parametric methods of inference. We extend our results to cross-over trials with recurrent events. Simulations are used to compare the different designs and examples are provided to explore practical issues in implementation.
Thursday, September 17, 2009
2008 Andrei Yakovlev Colloquium
Yi Li, PhD
Department of Biostatistics
Harvard University, Dana-Farber Cancer Institute
Detecting Disparities in Long-term Cancer Survivals: Challenges and Possible Solutions
This talk deals with long-term disease-specific survivals among the prostate cancer patients in the NIH Surveillance, Epidemiology, and End Results (SEER) program, wherein the main endpoint (e.g. deaths from prostate cancer) and the censoring causes (e.g. deaths from heart diseases) may be dependent. While a number of authors have studied the mixture survival model to analyze survival data with non-negligible long-term survival fractions, none has studied the mixture model in the presence of dependent censoring. To account for such dependence, we propose a more general long-term survival model that allows for dependent censoring. We derive the models from the perspective of competing risks and model the dependence between the censoring time and the survival time using a class of Archimedean copula models. Within this framework, we consider the parameter estimation, the long-term survival detection, and the two-sample comparison of latency distributions in the presence of dependent censoring when a proportion of patients is deemed to be long-term survivors. Large sample results using the martingale theory are obtained. We examine the finite sample performance of the proposed methods via simulation and apply them to analyze the SEER prostate cancer data.
Thursday, September 18, 2008