10/31/07: Estimating ROC curves in the presence of verification bias

Michael Sachs

ROC curves are typically used to assess the accuracy of a screening test
that is measured on a continuous scale. In some cases, the gold standard
diagnosis is too expensive or unpleasant to be performed on all the subjects in a study. For example, the gold standard diagnosis for Alzheimer's disease requires an autopsy, which cannot be performed on all subjects in a particular study. When the probability of getting a gold standard diagnosis depends on the screening test or other covariates, estimates of the accuracy of the test are subject to verification bias. In my talk I will review some ROC curve methodology and explore how it extends to the case of verification bias.
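
As an illustration of the problem only (the abstract does not say which corrections the talk covers), one common adjustment is inverse-probability weighting: each verified subject is weighted by one over the probability of being verified. The sketch below computes a single ROC point this way. The function name, argument names, and the assumption that the verification probabilities are known are all hypothetical.

```python
def ipw_roc_point(scores, verified, disease, pi, cutoff):
    """Return (FPR, TPR) at one cutoff using inverse-probability weights.

    scores   : screening test value for every subject
    verified : 1 if the gold standard was obtained, else 0
    disease  : gold standard status (1/0); ignored when verified == 0
    pi       : verification probability per subject (assumed known here)
    """
    tp = pos = fp = neg = 0.0
    for t, v, d, p in zip(scores, verified, disease, pi):
        if not v:
            continue  # unverified subjects are represented through the weights
        w = 1.0 / p
        if d:
            pos += w
            tp += w * (t >= cutoff)
        else:
            neg += w
            fp += w * (t >= cutoff)
    return fp / neg, tp / pos
```

Sweeping the cutoff over the observed scores traces out the weighted ROC curve. In practice the verification probabilities would usually have to be estimated, for example by regressing the verification indicator on the test result and covariates.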

10/24/07: A wavelet-based method for detecting chromosomal aberrations with array Comparative Genomic Hybridization (array-CGH) data

Xuesong Yu

Genomic instability, such as chromosome deletions and duplications, occurs
in many genetic diseases. Array-based comparative genomic hybridization
(array CGH) is a powerful technology for detecting genetic aberrations. We
propose to segment array CGH data using the 2-scale product of wavelet
transforms. We design a test statistic to assess the significance of the
change points, where deletions or duplications occur, by controlling the
family-wise error rate (FWER). In addition, an adjusted p-value is provided for each marker location. Simulation results and a real data example will be presented to illustrate the method.

10/17/07: Surrogate variables — intuition, estimation and application

Jeffrey Leek

The goal of most high-dimensional molecular biology experiments is to
rank features (e.g., gene or protein expression levels) according to
some signal of interest. However, given the number of factors that
influence expression regulation, it is not surprising that often more
than one strong signal is present in any given high-dimensional data
set. Unmodeled sources of signal in gene expression experiments cause
dependence between genes. Leek and Storey (2007) recently proposed
surrogate variables as a flexible model for certain types of
heterogeneity and large-scale dependence in gene expression studies.
In this talk I will briefly review the theoretical/statistical
framework for surrogate variable analysis and draw a connection
between an EM algorithm under normality assumptions and the surrogate
variable estimation algorithm proposed by Leek and Storey. I will
also present some preliminary results from applying the Leek and
Storey algorithm to data from a large clinical gene expression study
of trauma.

10/10/07: Methods to estimate the distribution of the failure time under outcome misclassification for current status data

Giancarlo Sal y Rosas

A common study design in epidemiology and clinical
research is the follow-up study in which a fixed number of participants are
followed for a period of time in order to observe some event such as death,
disease, development of a tumor, etc. In some cases (e.g., tumor
development, asymptomatic disease), the exact time of the event may not be
observable. Instead, the participant is tested once at some pre-determined
time and the outcome is observed to have occurred or not occurred. Such data
are referred to as current status data or type I interval-censored data.

Groeneboom and Wellner (1992) proposed two methods to estimate the
cumulative distribution function of the failure times in the absence of
covariate effects. The first is based on the EM algorithm, which arises
naturally because we can consider current status data as an example of a
missing data problem. The second method is based on isotonic regression.
Huang (1996) extended the idea to the Cox proportional hazards model with
current status data.
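
The isotonic regression approach can be sketched with the pool-adjacent-violators algorithm: sort subjects by monitoring time and fit the closest nondecreasing sequence to the event indicators. This is a minimal illustration, not the speaker's implementation. The function name is hypothetical, and the sketch assumes distinct monitoring times and no covariates.

```python
def current_status_npmle(times, deltas):
    """Isotonic regression (PAVA) of status indicators on sorted monitoring times.

    times  : monitoring time for each subject (assumed distinct)
    deltas : 1 if the event had occurred by the monitoring time, else 0
    Returns the sorted times and the fitted nondecreasing estimate of F at them.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    blocks = []  # each block holds [sum of indicators, number of subjects]
    for i in order:
        blocks.append([float(deltas[i]), 1])
        # Pool adjacent blocks while the earlier block's mean exceeds the later one's.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return [times[i] for i in order], fitted


# Small made-up example: the violation at times 1 and 2 is pooled to 0.5.
sorted_times, F_hat = current_status_npmle([3, 1, 2, 4], [1, 1, 0, 1])
```

Each pooled block is replaced by its mean, so the fitted values are nondecreasing in time, as a distribution function must be.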

We extend these methods to the situation where the outcome is based on an
imperfect test (sensitivity and specificity less than one), so that we
expect some false-positive and false-negative outcomes in our data. In
particular, we discuss the estimation of the NPMLE using the EM algorithm
and isotonic techniques in the case of no covariate effect. We also discuss
the case with covariate effects (proportional hazards model).
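
To see why an imperfect test changes the estimation problem, it may help to write down the probability of a positive result at monitoring time c. The notation below is assumed for illustration and is not taken from the talk: F is the failure time distribution, alpha is the sensitivity, beta is the specificity, and both are assumed known and constant over time.

```latex
\begin{align*}
P(\text{test positive} \mid C = c)
  &= \alpha\, F(c) + (1 - \beta)\,\bigl(1 - F(c)\bigr) \\
  &= (1 - \beta) + (\alpha + \beta - 1)\, F(c).
\end{align*}
```

Under these assumptions the observed positive rate is a linear function of F(c), confined to the interval from 1 - beta to alpha, and it is nondecreasing in c whenever alpha + beta > 1. With a perfect test (alpha = beta = 1) it reduces to F(c), the usual current status setting.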

10/03/07: Talking Stats with Firemen: Highlights from StatCom’s first year

David Lockhart and Julian Wolfson

Statistics in the Community (StatCom) is a student-run initiative at the University of Washington that provides free statistical consulting to non-profit community and governmental organizations. Started at the UW in 2005 based on a concept developed at Purdue University, StatCom draws on the expertise of students in Statistics, Biostatistics, Genome Sciences, and beyond. In this presentation, we will talk about how StatCom works and describe some of the projects StatCom is currently involved in. Students interested in learning more about StatCom (we’re always looking for more members!) can visit http://www.stat.washington.edu/statcom before or after the seminar.