Jeffrey Leek
The goal of most high-dimensional molecular biology experiments is to
rank features (e.g., gene or protein expression levels) according to
some signal of interest. However, because many factors influence
expression regulation, a high-dimensional data set often contains more
than one strong signal. Unmodeled sources of signal in gene expression
experiments cause
dependence between genes. Leek and Storey (2007) recently proposed
surrogate variables as a flexible model for certain types of
heterogeneity and large-scale dependence in gene expression studies.
In this talk, I will briefly review the theoretical/statistical
framework for surrogate variable analysis and draw a connection
between an EM algorithm under normality assumptions and the surrogate
variable estimation algorithm proposed by Leek and Storey. I will
also present some preliminary results from applying the Leek and
Storey algorithm to data from a large clinical gene expression study
of trauma.