Arttu Arjas, Andreas Hauptmann, Mikko J Sillanpää, Estimation of dynamic SNP-heritability with Bayesian Gaussian process models, Bioinformatics, btaa199, https://doi.org/10.1093/bioinformatics/btaa199
Estimation of dynamic SNP-heritability with Bayesian Gaussian process models
|Author:||Arjas, Arttu1; Hauptmann, Andreas1,2; Sillanpää, Mikko J.1,3|
1Research Unit of Mathematical Sciences, University of Oulu, Oulu, FI-90014, Finland
2Department of Computer Science, University College London, London, UK
3Infotech Oulu, University of Oulu, Oulu, FI-90014, Finland
|Online Access:||PDF Full Text (PDF, 0.4 MB)|
|Persistent link:|| http://urn.fi/urn:nbn:fi-fe2020042019365
|Publisher:|| Oxford University Press
|Publish Date:|| 2020-04-20
Motivation: Improved DNA technology has made it practical to estimate single nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth- and development-related traits, it is meaningful to base SNP-heritability estimation on longitudinal data because the underlying process is time-dependent. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty.
Results: We introduce a completely tuning-free Bayesian Gaussian process (GP)-based approach for estimating dynamic variance components and heritability as a function of them. For parameter estimation, we use a modern Markov chain Monte Carlo (MCMC) method, which allows full uncertainty quantification. Several data sets are analysed, and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which "borrows strength" from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model using the MTG2 and BLUPF90 software packages, and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate scalability of the proposed method for simulated data with tens of thousands of individuals.
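Illustration: the abstract describes heritability as a function of the time-varying variance components, with uncertainty summarised by 95% credible intervals. The sketch below shows one way such a summary could be computed from posterior draws. It assumes the usual definition h2(t) = sigma_g2(t) / (sigma_g2(t) + sigma_e2(t)), which the abstract does not spell out, and it uses synthetic draws as a stand-in for MCMC output. It is not the dynBGP implementation, and the names `heritability` and `credible_band` are hypothetical.

```python
import math
import random


def heritability(sigma_g2, sigma_e2):
    """Share of phenotypic variance attributed to the genetic component.

    Assumes the standard two-component definition; not taken from dynBGP.
    """
    return sigma_g2 / (sigma_g2 + sigma_e2)


def credible_band(draws, level=0.95):
    """Pointwise equal-tailed credible interval from posterior draws.

    draws: list of curves, each a list with one value per time point.
    Returns (lower, median, upper), each a list over time points.
    """
    n = len(draws)
    lo_idx = int(math.floor((1 - level) / 2 * (n - 1)))
    hi_idx = int(math.ceil((1 + level) / 2 * (n - 1)))
    lower, median, upper = [], [], []
    for t in range(len(draws[0])):
        column = sorted(d[t] for d in draws)
        lower.append(column[lo_idx])
        median.append(column[n // 2])
        upper.append(column[hi_idx])
    return lower, median, upper


# Hypothetical stand-in for MCMC output: 2000 draws of two variance curves
# over 50 time points. Real draws would come from a sampler, not from here.
rng = random.Random(0)
times = [i / 49 for i in range(50)]
h2_draws = []
for _ in range(2000):
    scale_g = math.exp(rng.gauss(0.0, 0.1))
    scale_e = math.exp(rng.gauss(0.0, 0.1))
    sigma_g2 = [scale_g * (0.5 + t) for t in times]  # genetic variance rises
    sigma_e2 = [scale_e for _ in times]              # residual variance flat
    h2_draws.append([heritability(g, e) for g, e in zip(sigma_g2, sigma_e2)])

lower, median, upper = credible_band(h2_draws)
print(f"t=0: h2 = {median[0]:.2f} [{lower[0]:.2f}, {upper[0]:.2f}]")
print(f"t=1: h2 = {median[-1]:.2f} [{lower[-1]:.2f}, {upper[-1]:.2f}]")
```

Computing the ratio draw by draw, rather than dividing summary curves, carries the joint uncertainty of both variance components through to the heritability band.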
Availability: The C++ implementation dynBGP and simulated data are available on GitHub (https://github.com/aarjas/dynBGP). The programs can be run in R. Real datasets are available in the QTL Archive (https://phenome.jax.org/centers/QTLA).
|Type of Publication:||
A1 Journal article – refereed
|Field of Science:||
112 Statistics and probability
1184 Genetics, developmental biology, physiology
This work was partially supported by the Academy of Finland Profi 5 funding for mathematics and AI: data insight for high dimensional dynamics and the Academy of Finland (Project 312123, Finnish Centre of Excellence in Inverse Modelling and Imaging, 2018–2025). AH acknowledges support from EPSRC grants EP/N032055/1 and EP/M020533/1.
|Academy of Finland Grant Number:||
312123 (Academy of Finland Funding decision)
Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.