| October 4, 2006 | |
| Title: | Estimating Lifetime Medical Costs Under a Gamma Copula Model |
| Speaker: | Kristin Berry |
| Abstract: | The analysis of lifetime medical costs with censored data presents several statistical challenges. The assumption of independent censoring may be valid on the time scale, but is not reasonable on the cost scale. The censoring pattern on the cost scale is thus typically induced to be dependent. Of more concern is the fact that the cost distribution is potentially nowhere identifiable in a parametric setting due to this censoring. Methods to date have avoided this problem by estimating nonparametric time-restricted costs or the joint distribution of costs and survival. To remedy this, I propose a semi-parametric gamma copula model which estimates the marginal lifetime medical cost distribution. This model assumes a Clayton’s copula structure using the Laplace transform of a gamma distribution. I will discuss the asymptotic local consistency of the estimator. I will also present the results of this method as applied to both simulated and real data. |
|
|
|
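As a rough illustration of the copula structure named in the abstract above: a Clayton copula can be generated from the Laplace transform of a gamma distribution through a shared gamma "frailty" variable. The sketch below is not the speaker's estimator. It only simulates dependent uniform pairs under that construction, and the function names, parameter values and seed are assumptions made for the example.

```python
import random


def sample_clayton(theta, n, seed=0):
    """Draw n pairs (u, v) from a Clayton copula with parameter theta > 0.

    Frailty construction: a shared gamma variable with shape 1/theta and
    scale 1 has Laplace transform psi(s) = (1 + s) ** (-1 / theta), and
    u = psi(E / frailty) with E ~ Exp(1) yields Clayton dependence.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        frailty = rng.gammavariate(1.0 / theta, 1.0)
        e1, e2 = rng.expovariate(1.0), rng.expovariate(1.0)
        u = (1.0 + e1 / frailty) ** (-1.0 / theta)
        v = (1.0 + e2 / frailty) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Plain O(n^2) Kendall's tau, adequate for a few hundred points."""
    concordant = discordant = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    theta = 2.0  # illustrative value, not taken from the talk
    sample = sample_clayton(theta, 800, seed=1)
    # For a Clayton copula, Kendall's tau equals theta / (theta + 2).
    print(kendall_tau(sample), theta / (theta + 2.0))
```

In a cost setting, the two coordinates would be transformed to the margins of interest; that step is omitted here.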
| October 11, 2006 | |
| Title: | Association of HLA allele frequency and HIV-1 Divergence |
| Speaker: | David Lockhart |
| Abstract: | Human leukocyte antigen genes (HLA) have been shown to influence the evolution of HIV-1 within populations, presumably by driving CTL escape mutations. Rare HLA alleles may have a selective advantage because most viruses will have evolved in relatively greater response to common HLA alleles and may require reversion mutations in addition to escape mutations to adapt to a rare allele. The goal of this study was to test the hypothesis that HIV-1 divergence was greater among those with rare HLA alleles versus those with common alleles. Maximum likelihood trees were generated for 9 HIV gene sequences available for 267 of 1119 HLA (Class I; A, B, C) genotyped individuals from Durban, SA. When only 2-digit genotypes were resolved (n=405), probabilities of the 4-digit genotypes were inferred using a linkage disequilibrium model based on the HLAs of the 714 fully resolved individuals. The population allele frequencies were calculated using these predicted probabilities as weights. We created 10 datasets with any subjects with missing HLA genotype information imputed according to the probabilities predicted by the LD model. Each of these datasets was doubled, exchanging the arbitrary labels on the two copies for each gene in the added copy, to produce a dataset where the allele rather than the individual is the unit of analysis and constrain parameter coefficients of, for example, “allele A1” and “allele A2” to be the same. The association between external branch length and allele frequency was estimated using GEE to account for the correlation between the branch lengths of the two alleles within an individual and adjusting for year, cohort, CD4 count and the allele frequencies of the other 5 alleles possessed by that individual. The parameter coefficients were averaged over the 10 datasets and tested using the t-test procedure developed by Rubin et al.
This analysis is fairly complex and entails a whirlwind tour through large swaths of modern statistics. My goal is to give quick but accessible descriptions of the basic ideas for each and discuss the issues that arise with each in this particular application. Also, my own role in this project has been the imputation and GEE modeling, so I will be able to say more on this aspect and less on the tree fitting and LD model which were handled by others. |
|
|
|
| October 18, 2006 | |
| Title: | A marginal structural model for the effect of breastfeeding on CD4 count in HIV+ African mothers |
| Speaker: | Rob Wellman |
| Abstract: | Breastfeeding infants in countries where HIV-1 prevalence is high is a contentious issue. It is known that breastfeeding has a positive impact on infant morbidity and mortality, but the risk of mother-to-child transmission of HIV is substantial. Additionally, a randomized clinical trial conducted in Nairobi, Kenya assessing the rate of mother-to-child transmission from breastfeeding suggested that breastfeeding may increase the risk of maternal mortality. Data from several observational studies have failed to confirm such an association. Utilizing observational data from the Breastfeeding and Maternal Health Study, we use a marginal structural model (MSM) with inverse probability of stopping breastfeeding (treatment) and censoring weights to estimate the unconfounded “causal” effect of breastfeeding on the health of HIV+ African mothers as measured by the rate of CD4 cell decline. An overview of the theory of MSMs will be presented, as well as a discussion of issues relevant to the practical application of MSMs, such as modeling and calculating weights, dealing with intermittent missing data and calculating appropriate standard errors for the model parameters. Advisor: Elizabeth Brown. |
|
|
|
| October 25, 2006 | |
| Title: | Capturing heterogeneity in gene expression studies by “Surrogate Variable Analysis” |
| Speaker: | Jeff Leek |
| Abstract: | It has unambiguously been shown through the application of DNA microarrays that genetic and environmental factors may have widespread effects on gene expression levels. In many studies, these factors will be unknown, unmeasured or too complicated to capture through simple models. While a large body of methods exists for characterizing trends in gene expression with respect to a measured variable of interest, none have attempted to directly identify, estimate and utilize the unmeasured factors that cause heterogeneity on a large scale. Here, we introduce “surrogate variable analysis” (SVA) to address this unmet need. SVA can be applied in conjunction with standard analysis techniques to clarify the relationship between expression and any measured variable of interest. We apply SVA to disease class, timecourse and genetics of gene expression studies. We show that SVA increases biological clarity and accuracy in genome-wide expression studies where heterogeneity is present. |
|
|
|
| November 1, 2006 | |
| Title: | Evaluating the estimators of the derivative of ROC curves |
| Speaker: | Bharat Rajan |
| Abstract: | ROC curves are the most widely used statistical method for evaluating continuous markers. However, the statistical techniques for marker studies are not very well developed. The derivative of the ROC curve plays an important role in making inference about the ROC curve. The derivative of the ROC curve is needed for computing the variance of ROC curves, sample size calculations in phase-2 marker studies, and optimal choice of case-control ratio in marker studies. The derivative of the ROC curve can be estimated using non-parametric and semi-parametric estimators. I will present several estimators of the derivative of the ROC curve, compare these estimators, and evaluate the performance of these estimators in terms of bias and efficiency in simulation studies. I will also apply these estimators to the pancreatic cancer study data. |
|
|
|
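To make the quantity in the abstract above concrete, here is a minimal sketch of the ROC derivative under a binormal model, ROC(t) = Φ(a + b Φ⁻¹(t)). The binormal form, the parameter names `a` and `b`, and the function names are assumptions chosen for illustration. The abstract does not say which estimators the talk covers, and this parametric example is not one of them.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def binormal_roc(t, a, b):
    """True positive rate at false positive rate t under a binormal model."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(t))


def binormal_roc_derivative(t, a, b):
    """Analytic slope of the binormal ROC curve at false positive rate t.

    By the chain rule, d/dt Phi(a + b * z(t)) = b * phi(a + b * z) / phi(z),
    where z = Phi^{-1}(t) and phi is the standard normal density.
    """
    z = STD_NORMAL.inv_cdf(t)
    return b * STD_NORMAL.pdf(a + b * z) / STD_NORMAL.pdf(z)


if __name__ == "__main__":
    a, b, t, h = 1.0, 0.8, 0.2, 1e-5  # illustrative values only
    numeric = (binormal_roc(t + h, a, b) - binormal_roc(t - h, a, b)) / (2 * h)
    print(binormal_roc_derivative(t, a, b), numeric)
```

The finite-difference check in the usage lines is only a sanity test that the analytic slope matches the curve.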
| November 8, 2006 | |
| Title: | Methods for comparing dynamic treatment regimes |
| Speaker: | Cecilia Cotton |
| Abstract: | In certain situations observational data may be all that is available to compare dynamic treatment regimes. However, it is unlikely that subjects will have been fully compliant with any regime. If subjects are artificially censored when they break their regime, then inverse probability weights (IPW) can be used to account for the induced selection bias. I will present simulation results for estimating Kaplan-Meier survival curves for a single treatment regime and discuss methods to be considered when comparing multiple treatment regimes. |
|
|
|
| November 15, 2006 | |
| Title: | A censored multinomial regression model with application to perinatal mother to child transmission of HIV |
| Speaker: | Charlotte Gard |
| Abstract: | We are often interested in estimating rates of perinatal mother to child transmission of HIV when data are collected at multiple points in time and infection status for some infants at some time points is unknown. Logistic regression and Cox proportional hazards regression are commonly used to estimate covariate-adjusted transmission rates, but their methods for handling missing data may be inadequate. Here, we propose using censored multinomial regression models to estimate cumulative and conditional rates of HIV transmission using both logit and complementary log-log links. Through simulation, we show that the proposed methods perform better than standard methods in terms of bias, mean square error, coverage probability, and power under a range of treatment effect and visit scenarios. |
|
|
|
| November 29, 2006 | |
| Title: | Bayesian adjustments for exposure misclassification |
| Speaker: | Betsy Teeple |
| Abstract: | To measure association between an outcome variable and an exposure variable, we can compute the odds ratio from a simple 2×2 table. However, when the exposure variable is subject to misclassification, the odds ratio estimate will be biased, and under mild conditions, biased toward a null result. There are frequentist and Bayesian approaches to adjust for misclassification, and thus, remove the bias. In this talk, I will demonstrate the effect of misclassification and discuss some frequentist and Bayesian methods to correct for it. I will focus mainly on Bayesian methods, which involve using the specificity and sensitivity of the exposure variable as prior information. In particular, incorporating this information into the model has two counter-intuitive effects on uncertainty around the estimated association – evidence weakening and uncertainty inversion. Through simulations, I will demonstrate how these interesting phenomena come about and why our intuition can lead us astray. This talk is based on a 2006 Statistics in Medicine paper by Paul Gustafson and Sander Greenland, titled “Curious phenomena in Bayesian adjustment for exposure misclassification.” |
|
|
|
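The bias toward the null described in the abstract above can be seen with simple arithmetic on expected counts. The sketch below is not taken from the talk or the cited paper. It assumes nondifferential misclassification (the same sensitivity and specificity for cases and controls), and the counts, error rates and function names are hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a 2x2 exposure-by-outcome table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected counts recorded as exposed / unexposed after misclassification.

    sensitivity: probability a truly exposed subject is recorded as exposed.
    specificity: probability a truly unexposed subject is recorded as unexposed.
    """
    seen_exposed = sensitivity * exposed + (1.0 - specificity) * unexposed
    seen_unexposed = (1.0 - sensitivity) * exposed + specificity * unexposed
    return seen_exposed, seen_unexposed


if __name__ == "__main__":
    # Hypothetical true table: cases (200 exposed, 300 not), controls (100, 400).
    true_or = odds_ratio(200, 300, 100, 400)
    # Apply the same error rates to cases and controls (nondifferential).
    case_e, case_u = misclassify(200, 300, sensitivity=0.8, specificity=0.9)
    ctrl_e, ctrl_u = misclassify(100, 400, sensitivity=0.8, specificity=0.9)
    observed_or = odds_ratio(case_e, case_u, ctrl_e, ctrl_u)
    print(true_or, observed_or)  # the observed value lies between 1 and the true value
```

The adjustment methods mentioned in the abstract work in the opposite direction, recovering the true table from the observed one given information about sensitivity and specificity.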
| December 6, 2006 | |
| Title: | Do Time-Series Studies Provide Meaningful Personal Health Effect Estimates for Ozone? |
| Speaker: | Rob Schmicker |
| Abstract: | Ozone is a highly reactive atmospheric pollutant that epidemiological and controlled exposure studies have shown to be associated with adverse health effects ranging from decreased lung capacity to mortality. Because ambient concentration is monitored routinely and is the basis for federal regulation, ambient concentration and not individual exposure is the target of most studies. Time-series studies are a common design that uses population-level concentration and outcome as a substitute for individual exposure and outcome. We are interested in understanding situations in which estimates from time-series studies provide meaningful and unbiased estimation of personal health effects. In this talk, I will discuss personal health estimation through the use of true models, design and parameterization. I will also discuss simulation studies using assumptions gained from the EPA Air Pollutants Exposure Model (APEX). |