| October 5, 2005 | |
| Title: | The Case-Cohort Design |
| Speaker: | Joanna Scott, Biostatistics graduate student |
| Abstract: | In 1986, Ross Prentice published the case-cohort design, an innovative study design that uses a sub-sampling technique for survival studies. The design allows efficient analysis of studies in which the cohort is too large to collect data on all the participants. Data are collected on a subcohort of participants who are randomly sampled from the original cohort. This subcohort is then augmented with all participants who experienced an event, and together they form the sample used for analysis. This talk will discuss the development of this study design, the parameter and variance estimates, and the usefulness of this design with respect to HIV vaccine trials. |
|
|
|
| October 12, 2005 | |
| Title: | Disease mapping and spatial regression for count data |
| Speaker: | Jon Wakefield, Ph.D., Department of Biostatistics, UW |
| Abstract: | Disease mapping and spatial regression are common endeavors in the general area of spatial epidemiology. In this talk I will discuss the problems with such pursuits when the data are available as counts aggregated across spatial regions. Particular attention will be devoted to the ecological fallacy. Bayesian hierarchical models are convenient for modeling spatial dependence, though prior choice requires care. |
|
|
|
| October 19, 2005 | |
| Title: | Slanted with zeroes implanted: methods for analyzing skewed data with zero values |
| Speaker: | Yea-Hung Chen, Biostatistics graduate student |
| Abstract: | Public health research often gives rise to data that are positively skewed. In many naturally occurring situations, the values can be said to follow a lognormal distribution. Common examples include medical costs and air pollution measurements. For this talk, I will address a slight variation of this model in which the data also include zero values (Zhou and Tu, 2000). Methods for evaluating such data are not well-studied, particularly for the two-sample situation in which the scientist is interested in comparing the population parameters. In this talk, I will review two methods for estimating the ratio of the two population means. I will also propose two new approaches: an application of the ‘generalized pivotal’ method (Weerahandi, 1993) and an application of a signed log-likelihood ratio method (Barndorff-Nielsen, 1986). |
|
|
|
| October 26, 2005 | |
| Title: | A frailty analysis of infant mortality in siblings |
| Speaker: | Tron Moger, Biostatistics postdoctoral fellow |
| Abstract: | Frailty models are frequently used when analysing survival data on families, where the survival times within families are dependent. The frailty variable models unobserved heterogeneity in risk, and by including some covariates in the model, one should (hopefully) be able to explain some of the dependence within families. Distributions included in the power variance function (PVF) family, such as the gamma, inverse Gaussian and stable distributions, are commonly used in frailty models. I develop models where a scale parameter in the PVF family is random. This yields a two-level model with heterogeneity on both the individual and family levels. The new frailty model is applied to data from the Medical Birth Registry of Norway, which includes information on survival within the first year of life for all infants born in Norway from 1967 to 2001 (2.2 million births), and covariates such as birth weight, age of the mother at birth, and birth year of the child. An important question is where to put the covariates in the model, as there are several options. Some covariates are shared by all infants in a sibship, some are individual, and most are somewhere in between. |
|
|
|
| November 2, 2005 | |
| Title: | Simple estimates of haplotype relative risks in case-control data |
| Speaker: | Ben French, Biostatistics Graduate Student |
| Abstract: | Haplotypes imputed from measured genotypes are a popular way of coding genetic information. Methods of varying complexity have been proposed to estimate haplotype relative risks while incorporating phase uncertainty. Our goal is to estimate effects of common haplotypes in large case-control studies such that haplotype imputation is done once as a simple data-processing step. We performed a simulation study based on haplotype frequency data from five genes in the renin-angiotensin system: renin, angiotensinogen, angiotensin converting enzyme, type 1 and 2 angiotensin receptors. All of the methods we compared involve fitting a weighted logistic regression model, but differ in how the weights are specified. Non-iterative methods included using the most likely haplotype pair and including all possible pairs with probability weights. Iterative methods included using the misspecified cohort likelihood and fitting an iteratively reweighted logistic regression. We simulated several genetic models: no effects, weak effects, strong effects tagged and not tagged by single SNPs, and interaction with a binary environmental covariate. We also quantified the amount of phase ambiguity in the simulated genes. We expected that the methods would perform equally well when the genetic effects are small and when SNPs represent the genetic model. Type 2 angiotensin receptor was the only gene for which there was uncertainty in the estimated haplotypes. For this gene, all methods performed well under no effects, weak effects, and strong effects tagged by a single SNP. The non-iterative methods produced biased estimates under strong effects not tagged by a single SNP. Results were similar under interaction with a binary covariate. Non-iterative weighted logistic regression gives valid tests for genetic associations and reliable estimates of modest genetic effects of common haplotypes. The potential for phase ambiguity does not necessarily imply uncertainty in estimated haplotypes, especially in large studies of common haplotypes. |
|
|
|
| November 9, 2005 | |
| Title: | Relaxed Significance Criteria for Linkage Analysis |
| Speaker: | Lin Chen, Biostatistics Graduate Student |
| Abstract: | Linkage analysis involves performing significance tests at many loci located throughout the genome. Traditional criteria for declaring a linkage statistically significant have been formulated with the goal of controlling the rate at which any single false positive occurs, called the genome-wide error rate (GWER). As complex traits have become the focus of linkage analysis, it is increasingly common to expect that a number of loci are truly linked to the trait. This is especially true in mapping quantitative trait loci (QTL), where sometimes dozens of QTLs exist. Therefore, alternatives to the strict goal of preventing any single false positive have recently been explored, such as the false discovery rate (FDR) criterion. Here, we characterize some of the challenges that arise when defining relaxed significance criteria that allow for at least one false positive linkage to occur. In particular, we show that the FDR suffers from several problems when applied to linkage analysis. We therefore conclude that the applicability of FDR for declaring significant linkages is dubious. Instead, we propose a significance criterion that is more relaxed than the traditional GWER, but does not appear to suffer from the problems of the FDR. A generalized version of the GWER is proposed, called GWERk, that allows one to provide a more liberal balance between true positives and false positives at no additional cost in computation or assumptions. |
|
|
|
| November 16, 2005 | |
| Title: | An Investigation of the Variance Components Bivariate Linkage Method in a Simulation Study |
| Speaker: | Angel Wan, Biostatistics Graduate Student |
| Abstract: | Traditionally, researchers implement a variance components linkage analysis for each single quantitative trait of interest. However, a generalization to a bivariate trait analysis may prove to be more powerful under certain conditions. A particular motivation for multivariate variance components analyses is to account for pleiotropy, in which a single quantitative trait locus (QTL) influences two or more traits. Here, we examine the power, type I error, and parameter estimation under (1) complete pleiotropy, (2) incomplete pleiotropy confounded by another closely linked QTL, and (3) incomplete pleiotropy confounded by an unlinked QTL. In addition, the power and type I error are examined for different family structures (three-generational; nuclear; mixture of sibling pairs, nuclear, and three-generational families) and sample sizes (1000, 700, 400 people) under a particularly powerful case of (2) and (3). The basic questions addressed in this talk are the following: 1. Under what conditions and parameter configurations are the univariate analyses equally or more powerful than the bivariate analyses, and vice versa? 2. In the bivariate linkage analyses, what are the distributions of the parameter estimates, and how do these estimates compare to the truth? 3. What kinds of family structures and sample sizes will provide adequate power? |
|
|
|
| November 30, 2005 | |
| Title: | Separating Contrasts and Borrowing of Information |
| Speaker: | Kyle Rudser, Biostatistics Graduate Student |
| Abstract: | Two aspects of statistical inference are “borrowing information” across sampling units and forming a contrast to quantify the effect of a given covariate of interest. While these two concepts are essentially the same in the classical regression setting, there is no particular reason why they cannot be separated. In particular, for survival analyses, borrowing information is complicated by censoring of the time to event. One option for separating a sample into groups across which to borrow information is to partition the sample recursively, as is done in regression trees. Contrasts for different functionals of survival are examined using groups defined by partitioning on the basis of similar survival distributions, evaluated using a weighted logrank statistic. |
|
|
|
| December 7, 2005 | |
| Title: | Evaluation of Existing Software for Imputation of Missing Data |
| Speaker: | Eric Johnson, Biostatistics Graduate Student |
| Abstract: | Several commonly used statistical software packages allow multiple imputation (MI) of missing data using a variety of differing methods. Simulation studies in the literature have found that imputations based on predictive mean matching or multivariate normality are somewhat robust to departures from their underlying assumptions. However, Stata implements MI via a series of chained equations, and it has not been reviewed in the aforementioned simulation studies. Our work focuses on comparing the different methods on panel data from the National Alzheimer’s Coordinating Center, with simulation studies to verify differences in the approaches. |