Estimation of dynamic SNP-heritability with Bayesian Gaussian process models |
|
Author: | Arjas, Arttu (1); Hauptmann, Andreas (1,2); Sillanpää, Mikko J. (1,3) |
Organizations: | (1) Research Unit of Mathematical Sciences, University of Oulu, Oulu, FI-90014, Finland; (2) Department of Computer Science, University College London, London, UK; (3) Infotech Oulu, University of Oulu, Oulu, FI-90014, Finland |
Format: | article |
Version: | published version |
Access: | open |
Online Access: | PDF Full Text (PDF, 0.5 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe2020042019365 |
Language: | English |
Published: | Oxford University Press, 2020 |
Publish Date: | 2020-04-20 |
Description: |
Abstract. Motivation: Improved DNA technology has made it practical to estimate single nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For traits related to growth and development, it is meaningful to base SNP-heritability estimation on longitudinal data because the underlying process is time-dependent. However, only a few statistical methods have so far been developed for estimating dynamic SNP-heritability and quantifying its full uncertainty. Results: We introduce a completely tuning-free Bayesian Gaussian process (GP) based approach for estimating dynamic variance components and heritability as a function of them. For parameter estimation, we use a modern Markov chain Monte Carlo (MCMC) method, which allows full uncertainty quantification. Several data sets are analysed, and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which "borrows strength" from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model using the MTG2 and BLUPF90 software, and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate the scalability of the proposed method on simulated data with tens of thousands of individuals. Availability: The C++ implementation dynBGP and simulated data are available on GitHub (https://github.com/aarjas/dynBGP). The programs can be run in R. Real datasets are available in the QTL Archive (https://phenome.jax.org/centers/QTLA).
|
Series: | Bioinformatics |
ISSN: | 1367-4803 |
ISSN-E: | 1460-2059 |
ISSN-L: | 1367-4803 |
Volume: | 36 |
Issue: | 12 |
Pages: | 3795 - 3802 |
DOI: | 10.1093/bioinformatics/btaa199 |
OADOI: | https://oadoi.org/10.1093/bioinformatics/btaa199 |
Type of Publication: | A1 Journal article – refereed |
Field of Science: | 111 Mathematics; 112 Statistics and probability; 1184 Genetics, developmental biology, physiology |
Funding: | This work was partially supported by the Academy of Finland Profi 5 funding for mathematics and AI: data insight for high dimensional dynamics and the Academy of Finland (Project 312123, Finnish Centre of Excellence in Inverse Modelling and Imaging, 2018–2025). AH acknowledges support from EPSRC grants EP/N032055/1 and EP/M020533/1. |
Academy of Finland Grant Number: | 312123 |
Detailed Information: | 312123 (Academy of Finland Funding decision) |
Dataset Reference: | Supplementary data are available at Bioinformatics online. |
Copyright information: | © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
https://creativecommons.org/licenses/by/4.0/ |