Our first colloquium of each academic year is named for the late Dr. Andrei Yakovlev (Chair 2002-2008) in honor of his many major contributions to the Department.
Fast Covariance Estimation for High-Dimensional Functional Data
High-dimensional functional data are becoming increasingly common, for example, in medical imaging. For such data, we propose fast methods for smooth estimation of the covariance function. These methods scale linearly with J, the number of observations per function. Most available methods and software cannot smooth covariance matrices of dimension J greater than 500; the recently introduced sandwich smoother is an exception, but it is not adapted to smooth covariance matrices of large dimensions, such as J = 10,000. We introduce two new methods that circumvent this problem: 1) an extremely fast implementation of the sandwich smoother for covariance smoothing; and 2) a two-step procedure that first obtains the singular value decomposition of the data matrix and then smooths the eigenvectors. In high dimensions, these new approaches are at least an order of magnitude faster than standard methods and drastically reduce memory requirements. The new approaches provide near-instantaneous (a few seconds) smoothing for matrices of dimension J = 10,000 and very fast (< 10 minutes) smoothing for J = 100,000.
This is joint work with Luo Xiao, Ciprian Crainiceanu, and Vadim Zipunnikov.
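As a rough illustration of the two-step idea described in the abstract (take the singular value decomposition of the data matrix, then smooth the eigenvectors), a minimal Python sketch follows. It is not the speakers' implementation: the function names, the second-difference penalty smoother, and the parameter `lam` are assumptions chosen for brevity, and the dense penalty matrix used here is only practical for small J.

```python
import numpy as np

def smooth_vector(v, lam=10.0):
    """Penalized smoother (assumed for illustration): minimizes
    ||v - s||^2 + lam * ||D s||^2, where D takes second differences."""
    J = v.shape[0]
    D = np.diff(np.eye(J), n=2, axis=0)  # (J-2) x J second-difference matrix
    return np.linalg.solve(np.eye(J) + lam * D.T @ D, v)

def two_step_covariance(Y, n_components=2, lam=10.0):
    """Y is n x J, one curve per row. Returns smoothed, orthonormalized
    eigenvectors (J x K) and the leading sample eigenvalues (K,)."""
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)                      # center each grid point
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    K = n_components
    V = np.column_stack([smooth_vector(Vt[k], lam) for k in range(K)])
    Q, _ = np.linalg.qr(V)                       # restore orthonormality after smoothing
    evals = s[:K] ** 2 / (n - 1)                 # eigenvalues of the sample covariance
    return Q, evals

# Example: 30 noisy curves on a grid of J = 50 points.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
Y = (np.outer(3.0 * rng.normal(size=30), np.sin(2 * np.pi * t))
     + np.outer(rng.normal(size=30), np.cos(2 * np.pi * t))
     + 0.1 * rng.normal(size=(30, 50)))
Q, evals = two_step_covariance(Y)
```

The J x J covariance matrix is never formed in this sketch; the estimate is kept in low-rank form as `Q @ np.diag(evals) @ Q.T`, which is what keeps memory use small when J is large.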
September 19, 2013
Helen Wood Hall - 1W502 Classroom
Ying Kuen K. Cheung, PhD
Mailman School of Public Health
On the Efficiency of Nonparametric Variance Estimation in Sequential Dose Finding
Phase I clinical trials are experiments in which a drug is administered to humans to determine the maximum tolerated dose, defined as the maximum test dose that causes a toxicity with a target probability. As such, phase I dose-finding is often formulated as a quantile estimation problem. In this talk, I will focus on clinical scenarios where toxicity is defined by dichotomizing a continuous outcome, for which a correct specification of the variance function of the outcomes is important. This is especially true for sequential studies, where the variance assumption is directly involved in the generation of the design points and hence sensitivity analysis may not be feasible after the data are collected. In this light, there is a strong reason for avoiding parametric assumptions on the variance function, although this may incur efficiency loss. This talk will show how much information one may retrieve by making additional parametric assumptions on the variance in the context of a sequential least squares recursion. By asymptotic comparison and simulation study, we demonstrate that assuming homoscedasticity achieves only a modest efficiency gain when compared to nonparametric variance estimation: when homoscedasticity in fact holds, the latter is at worst 88% as efficient as the former in the limiting case, and achieves well over 90% efficiency in most practical situations.
Thursday, September 6, 2012
Xihong Lin, PhD
Department of Biostatistics
Harvard School of Public Health
Statistical Issues and Challenges in Analyzing High-throughput 'Omics Data in Population-Based Studies
With the advance of biotechnology, massive "omics" data, such as genomic and proteomic data, are rapidly becoming available in population-based studies to study the interplay of genes and environment in causing human diseases. An increasing challenge is how to design such studies, manage the data, analyze such high-throughput "omics" data, interpret the results, and make the findings reproducible. We discuss several statistical issues in the analysis of high-dimensional "omics" data in population-based "omics" studies. We present statistical methods for analysis of several types of "omics" data, including incorporation of biological structures in analysis of data from genome-wide association studies, and next-generation sequencing data for rare variants. Data examples are presented to illustrate the methods. Strategies for interdisciplinary training in statistical genetics, computational biology and genetic epidemiology will also be discussed.
Thursday, September 29, 2011
Michael R. Kosorok, PhD
University of North Carolina at Chapel Hill
Reinforcement Learning, Clinical Trials and Personalized Medicine
In this talk, we discuss using reinforcement learning to discover optimal dynamic treatment regimes for treating cancer and other life-threatening diseases. The approach we propose is to use a specially designed sequence of two randomized clinical trials that enables discovery and validation of these optimal regimens. Because these regimens are optimized over patient characteristics, including biomarkers, they are a form of personalized medicine. We discuss applications in non-small cell lung cancer, colorectal cancer and cystic fibrosis. We will also discuss briefly several open technical questions.
Thursday, September 9, 2010
Dean Follmann, PhD
Biostatistics Research Branch
National Institute of Allergy and Infectious Diseases
Crossover Trials for Survival and Recurrent Event Endpoints
The crossover is a popular and efficient trial design used in the context of patient heterogeneity to assess the effect of treatments that act relatively quickly and whose benefit disappears with discontinuation. Each patient can serve as her own control as within-individual treatment and placebo responses are compared. Conventional wisdom is that these designs are not appropriate for absorbing binary endpoints, such as death or HIV infection. We explore the use of crossover designs in the context of these non-repeatable binary endpoints and show that they can be more efficient than the standard parallel group design when there is heterogeneity in individuals’ risks. We also introduce a new two-period design where first period “survivors” are re-randomized for the second period. This design combines the crossover design with the parallel design and achieves some of the efficiency advantages of the crossover design while ensuring that the second period groups are comparable by randomization.
We discuss the validity of the new designs and evaluate mixture model and semi-parametric methods of inference. We extend our results to crossover trials with recurrent events. Simulations are used to compare the different designs and examples are provided to explore practical issues in implementation.
Thursday, September 17, 2009
Yi Li, PhD
Department of Biostatistics
Harvard University, Dana-Farber Cancer Institute
Detecting Disparities in Long-term Cancer Survivals: Challenges and Possible Solutions
This talk deals with long-term disease-specific survivals among the prostate cancer patients in the NIH Surveillance, Epidemiology, and End Results (SEER) program, wherein the main endpoint (e.g., deaths from prostate cancer) and the censoring causes (e.g., deaths from heart diseases) may be dependent. While a number of authors have studied the mixture survival model to analyze survival data with non-negligible long-term survival fractions, none has studied the mixture model in the presence of dependent censoring. To account for such dependence, we propose a more general long-term survival model that allows for dependent censoring. We derive the models from the perspective of competing risks and model the dependence between the censoring time and the survival time using a class of Archimedean copula models. Within this framework, we consider parameter estimation, long-term survival detection, and the two-sample comparison of latency distributions in the presence of dependent censoring when a proportion of patients is deemed to be long-term survivors. Large sample results using martingale theory are obtained. We examine the finite sample performance of the proposed methods via simulation and apply them to analyze the SEER prostate cancer data.
Thursday, September 18, 2008