Mike Sachs
Early detection and prediction of disease are important clinical goals. New and cheaper technologies such as MRI and SNP chips provide a vast array of data that can be used to classify disease status. It is therefore of interest to estimate the linear combination of several biomarkers that best discriminates between diseased and healthy subjects. The linear combination that maximizes the area under the ROC curve (AUC) has been shown to be more robust and more efficient than the one that maximizes the logistic likelihood. If there are many biomarkers of interest, and especially if there are more markers than subjects in the sample, then some form of model selection and regularization is needed. In this talk I will motivate and present an estimator that maximizes the penalized area under the curve, and I will present some preliminary results of a simulation study.
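To make the objective concrete, here is a minimal sketch of how a penalized empirical AUC could be evaluated for a candidate coefficient vector. It is an illustration under assumptions, not the estimator from the talk: the abstract does not specify the penalty, so the L1 form, the tuning parameter `lam`, and the function names `empirical_auc` and `penalized_auc` are all hypothetical. Because the AUC is unchanged when the coefficients are rescaled by a positive constant, a practical method would also need a scale constraint or a smooth surrogate; the sketch only evaluates the objective.

```python
import numpy as np


def empirical_auc(scores_diseased, scores_healthy):
    """Mann-Whitney estimate of the AUC: the fraction of (diseased, healthy)
    pairs in which the diseased subject scores higher, counting ties as 1/2."""
    diff = scores_diseased[:, None] - scores_healthy[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def penalized_auc(beta, X_diseased, X_healthy, lam):
    """Empirical AUC of the linear combination X @ beta minus an L1 penalty.
    The L1 penalty and lam are assumptions made for illustration only."""
    beta = np.asarray(beta, dtype=float)
    auc = empirical_auc(X_diseased @ beta, X_healthy @ beta)
    return auc - lam * np.sum(np.abs(beta))


# Toy data (made up): two biomarkers, two diseased and two healthy subjects.
X_diseased = np.array([[2.0, 0.0], [3.0, 1.0]])
X_healthy = np.array([[1.0, 0.0], [0.0, 1.0]])

# Using only the first marker separates the groups perfectly.
auc_first = empirical_auc(X_diseased @ np.array([1.0, 0.0]),
                          X_healthy @ np.array([1.0, 0.0]))
# Using only the second marker does no better than chance on these data.
auc_second = empirical_auc(X_diseased @ np.array([0.0, 1.0]),
                           X_healthy @ np.array([0.0, 1.0]))
objective_first = penalized_auc([1.0, 0.0], X_diseased, X_healthy, lam=0.1)
```

In this toy example the first marker alone gives an empirical AUC of 1 and the second gives 0.5, so a sparse coefficient vector that drops the second marker loses nothing in discrimination, which is the situation a penalty is meant to exploit.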