University of Oulu

Arttu Arjas, Andreas Hauptmann, Mikko J Sillanpää, Estimation of dynamic SNP-heritability with Bayesian Gaussian process models, Bioinformatics, btaa199, https://doi.org/10.1093/bioinformatics/btaa199

Estimation of dynamic SNP-heritability with Bayesian Gaussian process models

Author: Arjas, Arttu (1); Hauptmann, Andreas (1, 2); Sillanpää, Mikko J. (1, 3)
Organizations: (1) Research Unit of Mathematical Sciences, University of Oulu, Oulu, FI-90014, Finland
(2) Department of Computer Science, University College London, London, UK
(3) Infotech Oulu, University of Oulu, Oulu, FI-90014, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 0.4 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2020042019365
Language: English
Published: Oxford University Press, 2020
Publish Date: 2020-04-20
Description:

Abstract

Motivation: Improved DNA technology has made it practical to estimate single nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth- and development-related traits, it is meaningful to base SNP-heritability estimation on longitudinal data because the underlying process is time-dependent. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty.

Results: We introduce a completely tuning-free Bayesian Gaussian process (GP)-based approach for estimating dynamic variance components and heritability as a function of them. For parameter estimation, we use a modern Markov chain Monte Carlo (MCMC) method that allows full uncertainty quantification. Several data sets are analysed, and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which "borrows strength" from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model fitted using the MTG2 and BLUPF90 software packages, and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate the scalability of the proposed method on simulated data with tens of thousands of individuals.
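
To make the phrase "heritability as a function of the variance components" concrete, the following is a minimal sketch, not the dynBGP implementation. It assumes the usual ratio definition h2(t) = var_g(t) / (var_g(t) + var_e(t)), which the abstract itself does not spell out, and it assumes that posterior draws of the genetic and residual variance at each time point are already available (for example from an MCMC run). All function names are hypothetical.

```python
def heritability(var_g, var_e):
    """Pointwise heritability curve, assuming h2(t) = var_g(t) / (var_g(t) + var_e(t))."""
    return [g / (g + e) for g, e in zip(var_g, var_e)]


def credible_interval(samples, level=0.95):
    """Equal-tailed interval from posterior draws, using simple floor-rank quantiles."""
    s = sorted(samples)
    n = len(s)
    lower = s[int((1 - level) / 2 * (n - 1))]
    upper = s[int((1 + level) / 2 * (n - 1))]
    return lower, upper


def heritability_band(draws_g, draws_e, level=0.95):
    """Pointwise credible band for h2(t).

    draws_g[k][t] and draws_e[k][t] hold draw k of the genetic and residual
    variance at time index t (an assumed layout, chosen for illustration).
    """
    # Transform each joint draw into a heritability curve first, then
    # summarise per time point, so uncertainty in both components carries over.
    h2_draws = [heritability(g, e) for g, e in zip(draws_g, draws_e)]
    n_times = len(h2_draws[0])
    return [credible_interval([d[t] for d in h2_draws], level) for t in range(n_times)]
```

Computing the ratio draw by draw and only then taking quantiles is what lets the interval for the heritability reflect the uncertainty of both variance components together, rather than combining two separately summarised estimates.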

Availability: The C++ implementation dynBGP and the simulated data are available on GitHub (https://github.com/aarjas/dynBGP). The programs can be run in R. The real datasets are available in the QTL Archive (https://phenome.jax.org/centers/QTLA).

Series: Bioinformatics
ISSN: 1367-4803
ISSN-E: 1460-2059
ISSN-L: 1367-4803
Volume: Accepted
DOI: 10.1093/bioinformatics/btaa199
OADOI: https://oadoi.org/10.1093/bioinformatics/btaa199
Type of Publication: A1 Journal article – refereed
Field of Science: 111 Mathematics
112 Statistics and probability
1184 Genetics, developmental biology, physiology
Funding: This work was partially supported by the Academy of Finland Profi 5 funding for mathematics and AI: data insight for high dimensional dynamics and the Academy of Finland (Project 312123, Finnish Centre of Excellence in Inverse Modelling and Imaging, 2018–2025). AH acknowledges support from EPSRC grants EP/N032055/1 and EP/M020533/1.
Academy of Finland Grant Number: 312123
Detailed Information: 312123 (Academy of Finland Funding decision)
Dataset Reference: Supplementary data are available at Bioinformatics online.
Copyright information: © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.