This week will feature a panel discussion and question/answer session about the sometimes intimidating process of finding a thesis adviser/topic. The event will be students-only, and snacks will be provided; don’t miss out!
Matthew Bryan
The study of genetic association is a rapidly growing area of statistical research. Because these studies generate massive amounts of data, their analysis raises many problems, including multiple comparisons, computational issues, and missing data. This presentation will discuss portions of an ongoing research project on missing data in genetic association studies. The goal of the project is to study genetic association methods, understand how these methods account for missing data, and assess whether they can be improved by further adjustment for missing data. The presentation will focus on a method suggested by Timothy Thornton and Mary McPeek that uses a quasi-likelihood score test approach. The method incorporates information from subjects who are missing phenotype and genotype information. Further discussion will look into extending these methods to better account for missingness through multiple imputation.
Veronika Skrivankova
Quantile regression is an important tool for estimating the conditional quantiles of a response Y given a vector of covariates X. Extremal quantile regression addresses the sparsity of data in the tails of the response distribution and estimates extreme conditional quantiles using extreme value theory.
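As a minimal illustration of the idea behind quantile regression, the sketch below shows that a sample quantile is a minimizer of the total check (pinball) loss; regression quantiles replace the constant with a linear function of the covariates. The function names and the data are hypothetical and chosen only for illustration.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_quantile_by_loss(values, tau):
    """Return the data point that minimizes the total pinball loss."""
    return min(
        values,
        key=lambda q: sum(pinball_loss(y - q, tau) for y in values),
    )


# Hypothetical data: the integers 0, 1, ..., 100.
sample = list(range(101))
q90 = empirical_quantile_by_loss(sample, 0.9)
print(q90)
```

The search over data points is quadratic in the sample size, which is acceptable for a demonstration; practical quantile regression solves the same minimization as a linear program.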
Under the assumption that a distribution function F is in the maximum domain of attraction (MDA) of some extreme value distribution (EVD), the extremal domain condition can be derived. It identifies which of the three types of EVD (Fréchet, Gumbel, or Weibull) contains F in its MDA. The “Peak-Over-Threshold” method employs the generalized Pareto distribution to approximate the tail of the distribution function F. All of the presented methods for determining tail heaviness and estimating the extreme value index are illustrated on real data.
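The Peak-Over-Threshold idea can be sketched as follows, under several assumptions that are not from the talk: the threshold is taken as the empirical 90% quantile, the generalized Pareto parameters are estimated by the method of moments (which requires a finite variance, that is, an extreme value index below 1/2), and the data are simulated from a unit exponential distribution, whose extreme value index is 0. The function names are hypothetical.

```python
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (xi, sigma) of a generalized Pareto
    distribution, using mean = sigma / (1 - xi) and
    variance = sigma^2 / ((1 - xi)^2 * (1 - 2 * xi))."""
    m = statistics.fmean(excesses)
    s2 = statistics.variance(excesses)
    ratio = m * m / s2          # equals 1 - 2 * xi in the population
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def peaks_over_threshold(data, level=0.9):
    """Use the empirical `level` quantile as threshold and fit the excesses."""
    ordered = sorted(data)
    threshold = ordered[int(level * len(ordered))]
    excesses = [x - threshold for x in ordered if x > threshold]
    return threshold, fit_gpd_moments(excesses)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = [rng.expovariate(1.0) for _ in range(50_000)]  # simulated, not real data
threshold, (xi_hat, sigma_hat) = peaks_over_threshold(data)
print(f"threshold={threshold:.3f}, xi={xi_hat:.3f}, sigma={sigma_hat:.3f}")
```

For exponential data the estimated index should be close to 0 (the Gumbel domain) and the scale close to 1; heavier tails would call for a different estimator, such as maximum likelihood or the Hill estimator.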
Empirical conditional quantiles are represented by regression quantile lines. Since each conditional quantile function is nondecreasing in the quantile level, the regression quantile lines are not supposed to cross. Under the assumption that our data follow a heteroscedastic model, we apply a procedure that not only yields non-crossing regression quantile lines but also enables us to estimate the extreme regression quantiles.
Fred Boehm
Genome-wide association studies hold great promise in the identification of genetic variants that underlie individuals’ susceptibilities to many complex diseases, including diabetes, coronary heart disease, and lung cancer. However, population structure – systematic differences in allele frequencies among people with different ancestries – has the potential to confound findings in case-control genome-wide association studies. I will discuss current approaches to identification of population structure using genome-wide SNP data and strategies to account for population structure when performing association tests. I will illustrate these ideas with examples from the International HapMap Project.
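A small arithmetic sketch of how population structure can confound a case-control comparison is given here. All numbers are hypothetical: two equally sized subpopulations differ in both allele frequency and disease prevalence, and the allele is assumed to be independent of disease status within each subpopulation.

```python
# Hypothetical subpopulations: (size, allele frequency, disease prevalence).
populations = {
    "A": (10_000, 0.8, 0.10),
    "B": (10_000, 0.2, 0.02),
}


def pooled_allele_frequency(groups):
    """Allele frequency of a pooled group given (count, frequency) pairs."""
    total = sum(count for count, _ in groups)
    return sum(count * freq for count, freq in groups) / total


# Within each subpopulation, cases and controls share the same allele
# frequency, so there is no true association with the disease.
cases = [(size * prev, freq) for size, freq, prev in populations.values()]
controls = [(size * (1 - prev), freq) for size, freq, prev in populations.values()]

case_freq = pooled_allele_frequency(cases)
control_freq = pooled_allele_frequency(controls)
print(f"cases: {case_freq:.3f}, controls: {control_freq:.3f}")
```

The pooled case and control allele frequencies differ only because subpopulation A contributes a larger share of the cases, which is the kind of spurious signal that adjustment for population structure is meant to remove.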