| January 4, 2006 | |
| Title: | Edgeworth Expansion with Application to Confidence Interval Construction |
| Speaker: | Phil Dinh, Biostatistics graduate student |
| Abstract: | Confidence intervals based on normal theory suffer deficiencies in coverage when data come from a skewed family of distributions. In such cases, one can appeal to the theory of small-sample asymptotics to improve coverage accuracy. In this talk, Edgeworth expansions are used to construct confidence intervals for the one- and two-sample problems. The theory is also applied within the cost-effectiveness analysis framework to construct confidence intervals for the cost-effectiveness ratio and the net health benefit. We show via simulation studies that our new Edgeworth-based intervals provide better coverage accuracy and are shorter than those from many existing techniques. The methods are applied to existing medical cost data. |
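
As a rough illustration of how an Edgeworth expansion can correct a normal-theory interval, the Python sketch below applies the standard one-term (Cornish-Fisher) correction to the quantiles of the one-sample studentized mean T = sqrt(n)(xbar - mu)/s. It uses P(T <= x) ~ Phi(x) + n^(-1/2) gamma (2x^2 + 1) phi(x) / 6, where gamma is the skewness. This is a generic textbook construction and not necessarily the interval proposed in the talk. The function name `edgeworth_ci` and the plug-in skewness estimate are our own assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def edgeworth_ci(x, alpha=0.05):
    """Equal-tailed CI for a mean, using a one-term Edgeworth (Cornish-Fisher)
    correction to the quantiles of T = sqrt(n) * (xbar - mu) / s."""
    n = len(x)
    xbar = mean(x)
    s = stdev(x)  # sample SD with divisor n - 1
    # Plug-in skewness: third central moment divided by s^3 (an assumption).
    m3 = sum((xi - xbar) ** 3 for xi in x) / n
    gamma = m3 / s ** 3

    def t_quantile(p):
        # Cornish-Fisher: w_p ~ z_p - gamma * (2 z_p^2 + 1) / (6 sqrt(n))
        z = NormalDist().inv_cdf(p)
        return z - gamma * (2 * z * z + 1) / (6 * sqrt(n))

    # Invert P(w_{alpha/2} <= T <= w_{1-alpha/2}) = 1 - alpha for mu.
    lower = xbar - s * t_quantile(1 - alpha / 2) / sqrt(n)
    upper = xbar - s * t_quantile(alpha / 2) / sqrt(n)
    return lower, upper
```

For symmetric data, where the estimated skewness is zero, the sketch reduces to the usual xbar +/- z s/sqrt(n). For right-skewed data such as medical costs, both endpoints shift upward. The one-term correction can behave poorly for very small or extremely skewed samples.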
|
|
| January 11, 2006 | |
| Title: | Generalized estimating equations with large cluster sizes: Estimation of high-dimensional working correlation models |
| Speaker: | Hyoju Chung, Biostatistics graduate student |
| Abstract: | Although the analysis of correlated data is often a challenge in statistical modeling, the semi-parametric, moment-based generalized estimating equation (GEE) method has been widely used when the regression parameters characterizing the population average are of primary interest. This study was motivated by the question of how to choose the working correlation matrix in a GEE model with large cluster sizes, since the choice of working correlation matrix matters in practice only when the cluster size is not small. In this talk, I will start with a brief introduction to GEE models and point out some issues that arise with moderate to large cluster sizes. Then I will present empirical and theoretical results for GEE models with large cluster sizes, drawn from a simulation study and from large-cluster asymptotics. |
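
The Python sketch below makes the dimensionality issue concrete. It builds exchangeable and AR(1) working correlation matrices and counts the parameters of an unstructured one. It also computes a simple moment estimate of the exchangeable correlation from Pearson residuals. All helper names are hypothetical. The estimator is simplified: it assumes residuals already scaled to unit variance and omits any degrees-of-freedom correction. It is not the estimator studied in the talk.

```python
def exchangeable(m, alpha):
    """m x m exchangeable working correlation: 1 on the diagonal, alpha elsewhere."""
    return [[1.0 if i == j else alpha for j in range(m)] for i in range(m)]


def ar1(m, alpha):
    """m x m AR(1) working correlation: corr(j, k) = alpha ** |j - k|."""
    return [[alpha ** abs(i - j) for j in range(m)] for i in range(m)]


def n_unstructured(m):
    """Number of free parameters in an unstructured working correlation."""
    return m * (m - 1) // 2


def exchangeable_alpha(residuals):
    """Moment estimate of alpha: the average product of Pearson residuals over
    all within-cluster pairs (no degrees-of-freedom correction).

    `residuals` holds one list of Pearson residuals per cluster; at least one
    cluster must have two or more observations.
    """
    total, pairs = 0.0, 0
    for r in residuals:
        for j in range(len(r)):
            for k in range(j + 1, len(r)):
                total += r[j] * r[k]
                pairs += 1
    return total / pairs
```

With a cluster size of 50, for example, an unstructured working correlation already has 1,225 free parameters. The exchangeable and AR(1) structures each have only one.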
|
|
| January 18, 2006 | |
| Title: | Assessing Healthy Volunteer Bias As a Case of Missing Data |
| Speaker: | Monica Chaudhari, Biostatistics graduate student |
| Abstract: | This talk briefly discusses two main forms of the selection bias problem and a method that can, in a number of cases, be used to control for this kind of bias: Heckman’s two-step procedure. 1) In the standard case of selection bias, information on the dependent variable is missing for some of the respondents. 2) In the other version of the selection bias problem, information on the dependent variable is available for all respondents, but respondents have been allocated to the categories of the independent variable of interest in a selective way. The aim of this talk is to describe a limited dependent variable model that can be used in situations of type 1 to obtain estimates that are representative of the population from which the sample was originally drawn. The model is a linear regression model corrected for sample selection. This correction is possible when (some of) the characteristics that determine whether subjects volunteer are known for all subjects, including those who did not volunteer. A questionnaire study provides a concrete illustration of the method. Throughout, Rubin’s general theory of inference with missing data serves as an integrating framework. |
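
The standard form of Heckman’s two-step correction is written below in our own notation. It has two parts:
- a probit selection equation for volunteering, with covariates z_i observed for everyone;
- a linear outcome equation with covariates x_i, whose errors are bivariate normal with the selection errors.

The block is a generic textbook statement, not necessarily the exact model used in the talk.

```latex
% Selection: s_i = 1 if z_i^T gamma + u_i > 0 (subject volunteers), else 0
% Outcome (observed only when s_i = 1): y_i = x_i^T beta + epsilon_i
% (u_i, epsilon_i) bivariate normal, Var(u_i) = 1, Var(epsilon_i) = sigma^2, Corr = rho
\begin{aligned}
\Pr(s_i = 1 \mid z_i) &= \Phi(z_i^\top \gamma), \\
E[\,y_i \mid x_i, z_i, s_i = 1\,] &= x_i^\top \beta + \rho\,\sigma\,\lambda(z_i^\top \gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}.
\end{aligned}
```

The two steps are:
1. Fit the probit model to obtain gamma-hat.
2. Among volunteers only, regress y_i on x_i and the inverse Mills ratio lambda(z_i^T gamma-hat).

The coefficient on the inverse Mills ratio estimates rho sigma. A nonzero rho corresponds to data that are not missing at random in Rubin’s sense. Standard errors from the second-step regression need adjustment because the inverse Mills ratio is itself estimated.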
|
|
| January 25, 2006 | |
| Title: | Statistical Issues in Analysis of Correlated Dental Data |
| Speaker: | Brian LeRoux, Biostatistics faculty |
| Abstract: | In clinical dental research, outcomes are typically recorded at many sites within each patient’s mouth. These outcome data present a distinctive set of challenges for statistical analysis: 1) large cluster sizes, 2) multilevel data structures (teeth within patients; sites or surfaces within teeth), 3) complex correlation structures, 4) informative cluster sizes, 5) missing data at different levels, and 6) a small number of clusters. In this talk I will discuss how these features affect the bias and precision of statistical analyses, and present some recent developments in estimating equation methods that address some of these issues. |
|
|
| February 1, 2006 | |
| Title: | Evaluation of Mortality Ascertainment in the National Wilms Tumor Study using the National Death Index |
| Speaker: | Cecilia Cotton, Biostatistics student |
| Abstract: | The remarkable progress in the treatment of childhood cancer during the past few decades has focused attention on the health status of the growing population of survivors. Such research is hampered, however, by the difficulty of maintaining active follow-up of teenage and adult survivors. The National Death Index (NDI) is a centralized registry of all deaths that have occurred in the United States, Puerto Rico, and the Virgin Islands since 1979. Qualified investigators can use the NDI to search for deaths among their study subjects. In this talk I will discuss the methods used in an NDI search performed by the National Wilms Tumor Study (NWTS). I will also present the search results and discuss the usefulness of the NDI as a substitute for, or supplement to, active follow-up in studies, such as the NWTS, that focus on children and young adults. |
|
|
| February 8, 2006 | |
| Title: | Models for robustness to outliers |
| Speaker: | Ken Rice, Biostatistics faculty |
| Abstract: | In his seminal 1964 paper on robust estimation, Huber introduced a distribution that is Normal in the center, with Exponential tails beyond a pre-specified changepoint. When this simple model is fitted for a location parameter alone, its MLE was shown to be ‘most robust’ in a minimax sense. More generally, when interpreted through influence functions, Huber’s approach has provided a straightforward way to downweight extreme, outlying data points in linear regression analyses that aim to fit the bulk of the data well. Extending the influence function approach to non-linear or hierarchical models is, however, far less simple, which has left many areas without this form of robustness. We therefore propose a simple location-scale family based on the heavy-tailed ‘Huber distribution’, which provides a model-based analogue of Huber’s estimation methods. We go on to show that, for simultaneously robust inference on both location and scale, standard likelihood methods applied to this family give results very closely related to Huber’s well-known but more ad hoc ‘Proposal 2’.
Further justification for our empirical approach is provided by examining this fully specified model in terms of its constituent ‘signal’ and ‘contaminant’ parts. These parts have several attractive operating characteristics that are simple to understand and of broad practical appeal. Because the model specifies a full likelihood for the data, it extends simply to robust inference in many complex models; a selection of examples will be given. |
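
The Huber distribution described above has density proportional to exp(-rho_k(x)). Here rho_k(x) = x^2/2 for |x| <= k and k|x| - k^2/2 otherwise. Its location MLE therefore solves sum psi_k(x_i - mu) = 0, with psi_k(u) = max(-k, min(k, u)). The Python sketch below illustrates this with the scale held fixed at 1. The default k = 1.345 is a conventional tuning constant assumed here. The joint location-scale fitting discussed in the talk (and in Proposal 2) is not attempted, and all function names are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def huber_rho(u, k=1.345):
    """Huber's rho: quadratic in the center, linear beyond the changepoint k."""
    return 0.5 * u * u if abs(u) <= k else k * abs(u) - 0.5 * k * k


def huber_psi(u, k=1.345):
    """Derivative of rho: the residual, clipped to [-k, k]."""
    return max(-k, min(k, u))


def huber_norm_const(k=1.345):
    """Integral of exp(-rho(x)): a Normal center plus two Exponential tails."""
    center = sqrt(2 * pi) * (2 * NormalDist().cdf(k) - 1)
    tails = 2 * exp(-0.5 * k * k) / k
    return center + tails


def huber_density(x, k=1.345):
    """Density of the Huber distribution with location 0 and scale 1."""
    return exp(-huber_rho(x, k)) / huber_norm_const(k)


def huber_location(xs, k=1.345, scale=1.0, tol=1e-10):
    """Location MLE with the scale held fixed: solve sum psi((x - mu)/scale) = 0.

    The left-hand side is non-increasing in mu, so bisection between the
    sample minimum and maximum finds a root.
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(huber_psi((x - mid) / scale, k) for x in xs) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Consider five points with one gross outlier. The estimate stays near the bulk of the data, because the outlier’s contribution to the estimating equation is capped at k. The sample mean, by contrast, is pulled far toward the outlier.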
|
|
| February 15, 2006 | |
| Title: | Evaluating the predictiveness of a continuous marker in a case-control design |
| Speaker: | Ying Huang, Biostatistics student |
| Abstract: | To describe the predictive capacity of a continuous biomarker, Pepe et al. (2006) proposed the “predictiveness curve,” defined as a plot of risk versus the population percentile of the marker, and proposed estimating a monotone increasing predictiveness curve with a flexible parametric model in a cohort study. However, case-control studies are most often performed in the early phases of biomarker development. We show that the predictiveness curve can be represented as a function of the ROC curve and disease prevalence. Therefore, from a case-control study with known prevalence, we can estimate the monotone increasing predictiveness curve by first estimating a concave ROC curve. Semi-parametric approaches are used to estimate the ROC curve. We will also study a generalization of the predictiveness curve that accommodates non-monotone risk functions. |
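
The representation in terms of the ROC curve can be sketched as follows, under our own assumptions:
- Y is a continuous marker, with higher values indicating higher risk.
- rho is the disease prevalence.
- S_D and S_Dbar are the survivor functions of Y in cases and controls, so that ROC(t) = S_D(S_Dbar^(-1)(t)).

Let y_v be the v-th population percentile of Y and t = S_Dbar(y_v) the corresponding false-positive rate. Bayes’ rule, together with ROC'(t) = f_D(y_v)/f_Dbar(y_v), then gives the relations below. The notation is ours and may differ from the speaker’s.

```latex
% rho = disease prevalence; t = false-positive rate at the v-th population percentile
\begin{aligned}
1 - v &= \rho\,\mathrm{ROC}(t) + (1 - \rho)\,t, \\
R(v) &= \Pr(D = 1 \mid Y = y_v)
      = \frac{\rho\,\mathrm{ROC}'(t)}{\rho\,\mathrm{ROC}'(t) + 1 - \rho}.
\end{aligned}
```

Because t decreases as v increases, a concave ROC curve gives a monotone increasing predictiveness curve R(v). Concavity means ROC'(t) is non-increasing.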
|
|
| February 22, 2006 | |
| Title: | Constructing complete networks from samples of local subnetwork data |
| Speaker: | David Lockhart, Biostatistics student |
| Abstract: | The risk of infection with many diseases, especially sexually transmitted diseases, is related to an individual’s position in a social network. Collecting data on the structure of complete networks is expensive, so most network research simply measures a sample of individuals (egos) and their relationships to others (alters). Such data carry little or no information on whom else those alters have relationships with, or on how this local network is situated in the larger network context. I will lay out some of the basic concepts involved in the analysis of network data. I will then describe how an exponential random graph model can be used to construct complete networks from a sample of subnetworks, in order to assess the variability in the structure of complete networks that are consistent with the observed data. I will present an example using sexual relationship data from rural Uganda and discuss how to handle complications that arise in such real-world data from inconsistencies between men’s and women’s reports of sexual behavior. |
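
For reference, an exponential random graph model assigns a probability to a whole network y on a fixed set of nodes. It does so through a vector of network statistics g(y) and parameters theta; examples of such statistics are counts of edges or of nodes with a given degree. The block shows only the generic form; the specific statistics used in the talk are not given here.

```latex
\Pr_\theta(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^\top g(y')\}
```

The normalizing constant kappa(theta) sums over every possible network on the node set. Such models are therefore typically fitted and simulated by Markov chain Monte Carlo rather than by direct evaluation.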
|
|
| March 1, 2006 | |
| Title: | Proportional Hazards Model for Current Status Data when the Outcome Is Measured with Uncertainty |
| Speaker: | Giancarlo Sal y Rosas, Biostatistics student |
| Abstract: | In epidemiologic research, the outcome of interest is frequently measured with uncertainty because of imperfect sensitivity or specificity. I will study the Cox regression model for current status data introduced by Huang (1996). Current status data are a special case of interval-censored data in which it is known only whether the failure time occurs before or after an observation time. I will use the EM algorithm to correct the bias in the regression coefficients introduced by uncertainty in the outcome, and propose an estimator of their variance. We will present an example using data from a randomized study in Seattle. Its primary outcome was testing positive at follow-up for a disease for which the patient had tested positive at baseline. We will also present the challenges involved in this work, such as convergence problems and long computation times. Suggestions for attacking those problems will be more than welcome. |
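
The observed-data likelihood for error-prone current status data under a Cox model can be sketched as below, under assumptions of our own:
- sensitivity Se and specificity Sp are known;
- misclassification does not depend on the monitoring time or covariates.

This is an illustration, not necessarily the formulation used in the talk.

```latex
% Observed: monitoring time C_i, covariates Z_i, error-prone indicator \Delta_i^*
% True status \Delta_i = 1\{T_i \le C_i\}; Cox model S(t \mid z) = S_0(t)^{\exp(\beta^\top z)}
\begin{aligned}
F_i &= 1 - S_0(C_i)^{\exp(\beta^\top Z_i)}, \\
p_i &= \Pr(\Delta_i^* = 1 \mid C_i, Z_i) = \mathrm{Se}\,F_i + (1 - \mathrm{Sp})\,(1 - F_i), \\
\ell(\beta, S_0) &= \sum_i \bigl[\Delta_i^* \log p_i + (1 - \Delta_i^*) \log(1 - p_i)\bigr].
\end{aligned}
```

An EM algorithm can treat the true status Delta_i as missing. The E-step uses Pr(Delta_i = 1 | Delta_i* = 1) = Se F_i / p_i and Pr(Delta_i = 1 | Delta_i* = 0) = (1 - Se) F_i / (1 - p_i). The M-step then maximizes a weighted current-status likelihood.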
|
|
| March 8, 2006 | |
| Title: | Exploring the genetics of time-varying quantitative traits from longitudinal data |
| Speaker: | Grace Ge, Biostatistics student |
| Abstract: | Complex traits are influenced by multiple genes as well as other factors. The complexity of the genetic mechanisms behind these traits makes it hard to map the quantitative trait loci (QTL), the genes underlying those traits. One class of complex traits is time-varying traits, such as blood pressure (BP), glucose level, and cholesterol. Data on these traits typically combine family relationships among individuals with longitudinal data, that is, repeated measures on the same individual over a follow-up period. However, older methods for mapping QTL from such data compress the repeated measures into a single summary statistic. Using a single summary statistic in linkage analysis usually reduces the power to detect QTL because information is lost. Conversely, longitudinal models that handle the repeated measures but discard the family information cannot map QTL for time-varying traits. Our proposed model combines a traditional linkage analysis method, interval mapping (IM), with a longitudinal model for the repeated measures (for example, an AR(1) covariance structure). In this way we can map QTL by modeling the multiple observations on each individual jointly. This talk will include results from a simple simulation study showing how the model performs, comparing it with the older methods and with currently popular software. |
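
A generic way to combine interval mapping with an AR(1) model for repeated measures is the mixture likelihood sketched below. In it, each individual’s unobserved QTL genotype g has probability p_ig computed from flanking markers, and mu_g(t) is a genotype-specific mean trajectory. This sketch assumes independent individuals, as in an experimental cross. The family data described in the talk would add covariance between relatives, which is omitted here, and the notation is ours.

```latex
% y_i = (y_{i1}, ..., y_{i m_i})^T measured at times t_{i1}, ..., t_{i m_i}
% p_{ig}: probability that individual i carries QTL genotype g, given flanking markers
\begin{aligned}
L(\theta) &= \prod_i \sum_{g} p_{ig}\,
   \mathcal{N}\!\bigl(y_i;\ \mu_g(t_i),\ \Sigma_i\bigr), \\
(\Sigma_i)_{jk} &= \sigma^2 \rho^{\lvert t_{ij} - t_{ik} \rvert}
   \quad \text{(AR(1) covariance)}.
\end{aligned}
```

A scan along the genome then compares this model with one in which all genotype-specific trajectories coincide, at each candidate QTL position.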
|