University of Oulu

Kuismin, M., Ahlinder, J., Sillanpää, M. (2017) CONE: Community Oriented Network Estimation Is a Versatile Framework for Inferring Population Structure in Large Scale Sequencing Data. G3, 7 (10), 3359-3377. doi:10.1534/g3.117.300131

CONE : community oriented network estimation is a versatile framework for inferring population structure in large-scale sequencing data

Author: Kuismin, Markku O.¹; Ahlinder, Jon²; Sillanpää, Mikko J.¹
Organizations: ¹Department of Mathematical Sciences, University of Oulu
²Swedish Defense Research Agency
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 3.4 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2017120455245
Language: English
Published: Genetics Society of America, 2017
Publish Date: 2017-12-04
Description:

Abstract

Estimation of genetic population structure based on molecular markers is a common task in population genetics and ecology. We apply a generalized linear model with LASSO regularization to infer relationships between individuals and populations from molecular marker data. Specifically, we apply a neighborhood selection algorithm to infer population genetic structure and gene flow between populations. The resulting relationships are used to construct an individual-level population graph. Different network substructures, known as communities, are then separated from each other using a community detection algorithm. Inference of population structure using networks combines the strengths of: (i) network theory (a broad collection of tools, including aesthetically pleasing visualization), (ii) principal component analysis (dimension reduction together with simple visual inspection), and (iii) model-based methods (e.g., ancestry coefficient estimates). We have named our process CONE (for community oriented network estimation). CONE has fewer restrictions than conventional assignment methods: properties such as the number of subpopulations need not be fixed before the analysis, and the sample may include close relatives or involve uneven sampling. Applying CONE to simulated data sets resulted in more accurate estimates of the true number of subpopulations than model-based methods, and provided comparable ancestry coefficient estimates. Analyses of empirical data sets (teosinte single nucleotide polymorphisms, a bacterial disease outbreak, and the Human Genome Diversity Panel) illustrate that population structures estimated with CONE are consistent with earlier findings.
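The abstract describes a three-step pipeline: LASSO-based neighborhood selection, construction of an individual-level graph, and community detection on that graph. The following is a minimal illustrative sketch of such a pipeline on toy data, not the authors' implementation. Assumptions (none of these come from the record above): individuals are treated as graph nodes and markers as observations; each LASSO regression is solved by a plain coordinate-descent loop with one fixed penalty `lam` (no penalty selection); edges are combined with the "OR" rule of Meinshausen and Bühlmann neighborhood selection; and simple label propagation stands in for whatever community detection algorithm CONE actually uses. The functions `lasso_cd`, `neighborhood_graph` and `label_propagation`, and the simulated allele frequencies, are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/(2m)) * ||y - X b||^2 + lam * ||b||_1."""
    m, k = X.shape
    b = np.zeros(k)
    r = y.copy()                          # residual y - X b (b starts at zero)
    col_sq = (X ** 2).sum(axis=0) / m
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(k):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (j's own term added back)
            rho = X[:, j] @ r / m + col_sq[j] * b[j]
            # soft-thresholding update
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta      # keep the residual in sync with b
                b[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def neighborhood_graph(G, lam):
    """G: individuals x markers genotype matrix (0/1/2). Returns a boolean adjacency matrix."""
    X = G.T.astype(float)                 # assumption: markers are observations, individuals are variables
    X -= X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0.0] = 1.0
    X /= sd
    n = X.shape[1]
    A = np.zeros((n, n), dtype=bool)
    for j in range(n):
        others = np.array([k for k in range(n) if k != j])
        b = lasso_cd(X[:, others], X[:, j], lam)   # regress individual j on all others
        A[j, others[b != 0.0]] = True
    return A | A.T                        # "OR" rule: keep an edge if either regression selects it


def label_propagation(A, max_iter=100):
    """Asynchronous label propagation; ties are broken by the smallest label."""
    n = A.shape[0]
    labels = list(range(n))
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            nbrs = np.flatnonzero(A[i])
            if nbrs.size == 0:
                continue                  # isolated individual keeps its own label
            counts = {}
            for k in nbrs:
                counts[labels[k]] = counts.get(labels[k], 0) + 1
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[i]:
                labels[i] = new
                changed = True
        if not changed:
            break
    return labels


# Toy data: two hypothetical populations with independent allele frequencies.
rng = np.random.default_rng(1)
n_markers = 1000
f1 = rng.uniform(0.05, 0.95, size=n_markers)
f2 = rng.uniform(0.05, 0.95, size=n_markers)
G = np.vstack([rng.binomial(2, f1, size=(5, n_markers)),
               rng.binomial(2, f2, size=(5, n_markers))])
A = neighborhood_graph(G, lam=0.2)
labels = label_propagation(A)
print(labels)
```

With strongly differentiated toy populations like these, individuals 0 to 4 and 5 to 9 should typically fall into two separate communities without the number of subpopulations being specified in advance, which mirrors the property claimed in the abstract. On real data the penalty `lam` strongly controls graph density and would need a principled selection rule.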


Series: G3. Genes, genomes, genetics
ISSN: 2160-1836
ISSN-E: 2160-1836
ISSN-L: 2160-1836
Volume: 7
Issue: 10
Pages: 3359 - 3377
DOI: 10.1534/g3.117.300131
OADOI: https://oadoi.org/10.1534/g3.117.300131
Type of Publication: A1 Journal article – refereed
Field of Science: 1184 Genetics, developmental biology, physiology
Funding: This work was funded by the University of Oulu’s Technology and Natural Sciences Doctoral Programme (TNS-DP) and by the Swedish Ministry of Foreign Affairs, project A4952.
Dataset Reference: Supplemental material is available online at:
  www.g3journal.org/lookup/suppl/doi:10.1534/g3.117.300131/-/DC1
Copyright information: Copyright © 2017 Kuismin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.