04/30/2008: A Bayesian Decision Theoretic Framework for the Detection of Spatial Clustering of Non-Infectious Diseases

Albert Kim

Often media reports of a large number of cases of a disease within a small geographic area stoke public demand for investigation by health officials. The detection of such spatial clustering of diseases can be cast as a statistical problem for which numerous methodologies have been developed.

One popular method is that of Kulldorff, in which multiple circles (whose radii are defined by population size) are superimposed onto a map of the study region. For each circle, the observed number of cases is compared to the expected number, and circles with more cases than expected are labeled as potential clusters. Measures of the statistical significance of the excesses are obtained via Monte Carlo simulation under the null hypothesis. However, this method is frequentist in nature and hence suffers from drawbacks due to miscalibration of p-values and multiple testing.
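
The scan-and-simulate idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's or Kulldorff's implementation: the study region is reduced to a list of areas with case counts and populations, the candidate circles are supplied as hypothetical lists of area indices (`zones`), the test statistic is the usual Poisson log likelihood ratio maximized over zones, and the null hypothesis is taken to be that each case falls in an area with probability proportional to its population. All function and parameter names are invented for the example.

```python
import math
import random


def poisson_llr(c, e, total):
    """Poisson log likelihood ratio for a zone with c observed and e expected cases.

    Zones with no excess (c <= e) score 0, so only elevated zones are flagged.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def max_llr(cases, pops, zones):
    """Largest log likelihood ratio over all candidate zones."""
    total = sum(cases)
    pop_total = sum(pops)
    best = 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = total * sum(pops[i] for i in zone) / pop_total
        best = max(best, poisson_llr(c, e, total))
    return best


def monte_carlo_p_value(cases, pops, zones, n_sim=999, seed=0):
    """Monte Carlo p-value for the most likely cluster.

    Assumption: under the null, the observed total number of cases is
    redistributed over areas with probability proportional to population.
    """
    rng = random.Random(seed)
    observed = max_llr(cases, pops, zones)
    total = sum(cases)
    areas = list(range(len(pops)))
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(pops)
        for i in rng.choices(areas, weights=pops, k=total):
            sim[i] += 1
        if max_llr(sim, pops, zones) >= observed:
            exceed += 1
    # Rank of the observed statistic among the observed and simulated values.
    return observed, (exceed + 1) / (n_sim + 1)
```

Because the maximum is taken over all zones before the comparison with the simulated data, the resulting p-value refers to the most likely cluster only; this is the adjustment for multiple testing that the scan approach relies on.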

04/23/2008: A semi-parametric regression approach for time-dependent ROC curve using non-parametric transformation model

Nan Hu

Receiver operating characteristic (ROC) curves are commonly used for visualizing the sensitivity and specificity of a continuous biomarker (or diagnostic test result), Y, for a binary disease outcome, D. In practice, however, many disease outcomes depend on time, so it is appropriate to derive ROC curves that change as a function of time. Recently, ROC analysis has been extended to time-to-event outcome data, including the nonparametric approach proposed by Heagerty et al. (2000) and the semi-parametric approach of Heagerty and Zheng (2005). However, none of these approaches incorporates covariates, so they cannot be used to estimate covariate-adjusted time-dependent ROC curves. More recently, Song and Zhou (2008) proposed a semi-parametric regression approach for covariate-specific ROC curves for time-to-event outcomes, but their method relies on the strong assumption of proportional hazards. In this study, I propose a new semi-parametric method for estimating the time-dependent ROC curve based on a non-parametric transformation model of the event time. Since the transformation model neither assumes a distribution for the error term nor requires specification of the transformation function (other than the requirement of monotonicity), the proposed approach is more general and allows considerable flexibility in model specification.

04/16/2008: Estimating ROC curves in the presence of covariates

Bharat Rajan

Receiver operating characteristic (ROC) curves are commonly used statistical tools for studying the classification accuracy of diagnostic tests with ordinal-scale rating data. ROC curves can be estimated using either a discrete or a smooth ROC curve, with the latter being preferred. Several methods based on the binormal ROC form have been proposed in the literature to estimate a smooth ROC curve. However, the most commonly used methods do not work in the presence of covariates, fail with degenerate data, or make strong assumptions. In this paper, we propose a new semi-parametric direct regression method with a general link function to estimate a smooth ROC curve in the presence of covariates that works with degenerate data. The area under the curve (AUC) is one of the well-accepted summary measures for assessing the overall accuracy of the ROC curve. Methods currently in use estimate the AUC indirectly by first estimating the parameters of the ROC curve, which leads to several nuisance parameters and a loss in efficiency, with no clear interpretation in terms of the AUC. We also propose a new regression for the non-parametric AUC measure of ordinal-scale tests in the presence of discrete and continuous covariates, which works even with degenerate datasets. Simulation studies using different ROC models were used to compare the new methods with existing methods in terms of integrated bias and integrated mean square error over the entire ROC curve for the ROC regression, and percent bias and mean square error for the AUC regression. The results of the simulation studies suggest that the proposed methods have smaller bias and mean square error than other methods, and work well in the presence of covariates and with degenerate data. The proposed methods were applied to the carotid vessel study and the staging of prostate cancer study to investigate the effects of covariates on the AUC values and the ROC curve.
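
For readers unfamiliar with the non-parametric AUC measure mentioned above, the following Python sketch shows the standard unadjusted version for ordinal ratings: the proportion of (diseased, non-diseased) pairs in which the diseased subject has the higher rating, with tied pairs counted as one half. It is not the regression method proposed in the talk and does not handle covariates; the function name and the example ratings are made up for illustration.

```python
def nonparametric_auc(diseased, non_diseased):
    """Mann-Whitney estimate of the AUC for ordinal ratings.

    Each pair scores 1 if the diseased rating is higher, 0.5 if the two
    ratings are tied, and 0 otherwise; the AUC is the average pair score.
    """
    if not diseased or not non_diseased:
        raise ValueError("both groups must contain at least one rating")
    score = 0.0
    for y1 in diseased:
        for y0 in non_diseased:
            if y1 > y0:
                score += 1.0
            elif y1 == y0:
                score += 0.5
    return score / (len(diseased) * len(non_diseased))


# Hypothetical ratings on a 5-point ordinal scale.
auc = nonparametric_auc([3, 4, 5, 5], [1, 2, 3, 3])
```

Because this estimate uses only the ordering of the ratings, it is still defined for degenerate datasets in which some rating categories are empty, which is one reason a regression built directly on it is attractive for ordinal tests.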

04/09/2008: Panel Discussion – Finding a Thesis Adviser/Topic

This week will feature a panel discussion and question/answer session about the sometimes intimidating process of finding a thesis adviser/topic. The event will be students-only, and home-baked brownies will be provided; don’t miss out!

04/02/2008: HIV-1 genetic diversity in the female genital tract

David Lockhart

The HIV infecting two tissues in the body may represent a single randomly mixing population or may be compartmentalized into reproductively isolated groups. We examine local variation within 3 sites on the cervix relative to the blood of 8 women. Genetic diversity in the two compartments was compared using the pooled means test of Gilbert & Rossini. Compartmentalization was determined using the permutation method of Slatkin & Maddison. Homogeneity of viral sequences from the different cervical sites was examined using randomization tests. We hypothesize that apparent compartmentalization occurs as a result of bursts of monoclonal replication. We defined a burst as 5 or more sequences each within a distance of 0.01 of each other. Identification of such bursts requires the enumeration of the maximal cliques of a graph of their relationships. This enumeration is computationally intensive, but Du et al. give an algorithm that works well for sparse graphs.
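
The burst definition above maps naturally onto a graph problem, which the following Python sketch illustrates. It is a minimal example under stated assumptions: pairwise genetic distances are given as a symmetric matrix (`dist`, hypothetical), two sequences are joined by an edge when their distance is at most the threshold (whether the cutoff is strict is not stated in the abstract), and maximal cliques are enumerated with the classic Bron-Kerbosch algorithm with pivoting. This is a stand-in, not the algorithm of Du et al. referred to in the abstract.

```python
def build_graph(dist, threshold=0.01):
    """Adjacency sets: i and j are linked if dist[i][j] <= threshold (assumed cutoff)."""
    n = len(dist)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already-explored vertices.
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def find_bursts(dist, threshold=0.01, min_size=5):
    """Bursts: maximal sets of at least min_size mutually close sequences."""
    graph = build_graph(dist, threshold)
    return [c for c in maximal_cliques(graph) if len(c) >= min_size]
```

Requiring a clique, rather than a connected component, enforces that every pair of sequences in a burst is within the threshold, which matches the "each within a distance of 0.01 of each other" wording.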