Kuismin, M., Ahlinder, J., Sillanpää, M. (2017) CONE: Community Oriented Network Estimation Is a Versatile Framework for Inferring Population Structure in Large Scale Sequencing Data. G3, 7 (10), 3359-3377. doi:10.1534/g3.117.300131
CONE : community oriented network estimation is a versatile framework for inferring population structure in large-scale sequencing data
|Author:||Kuismin, Markku O.1; Ahlinder, Jon2; Sillanpää, Mikko J.1|
1Department of Mathematical Sciences, University of Oulu
2Swedish Defense Research Agency
|Persistent link:|| http://urn.fi/urn:nbn:fi-fe2017120455245
Genetics Society of America
|Publish Date:|| 2017-12-04
Estimation of genetic population structure based on molecular markers is a common task in population genetics and ecology. We apply a generalized linear model with LASSO regularization to infer relationships between individuals and populations from molecular marker data. Specifically, we apply a neighborhood selection algorithm to infer population genetic structure and gene flow between populations. The resulting relationships are used to construct an individual-level population graph. Different network substructures known as communities are then dissociated from each other using a community detection algorithm. Inference of population structure using networks combines the good properties of: (i) network theory (broad collection of tools, including aesthetically pleasing visualization), (ii) principal component analysis (dimension reduction together with simple visual inspection), and (iii) model-based methods (e.g., ancestry coefficient estimates). We have named our process CONE (for community oriented network estimation). CONE has fewer restrictions than conventional assignment methods in that properties such as the number of subpopulations need not be fixed before the analysis and the sample may include close relatives or involve uneven sampling. Applying CONE to simulated data sets resulted in more accurate estimates of the true number of subpopulations than model-based methods, and provided comparable ancestry coefficient estimates. Inference on empirical data sets of teosinte single nucleotide polymorphism, bacterial disease outbreak, and the human genome diversity panel illustrates that population structures estimated with CONE are consistent with earlier findings.
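The pipeline described in the abstract (LASSO neighborhood selection, an individual-level graph, then community detection) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumptions made here: a plain linear LASSO fitted by coordinate descent stands in for the generalized linear model, connected components stand in for a real community detection algorithm, individuals are treated as variables with markers as observations, and the function names (`lasso_cd`, `neighborhood_graph`, `connected_components`), the penalty `alpha`, and the simulated two-population data are all hypothetical.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent LASSO: minimize (1/2n)||y - Xb||^2 + alpha*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * beta[j]      # partial residual without predictor j
            rho = X[:, j] @ resid / n
            # soft-thresholding update for coefficient j
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def neighborhood_graph(G, alpha):
    """G is (markers x individuals). Regress each individual on all others;
    a nonzero coefficient in either direction gives an edge (OR rule)."""
    Z = G - G.mean(axis=0)
    sd = Z.std(axis=0)
    sd[sd == 0] = 1.0
    Z = Z / sd
    m = Z.shape[1]
    A = np.zeros((m, m), dtype=bool)
    for i in range(m):
        others = [k for k in range(m) if k != i]
        beta = lasso_cd(Z[:, others], Z[:, i], alpha)
        A[i, others] = beta != 0
    return A | A.T

def connected_components(A):
    """Simple stand-in for community detection: label connected components."""
    m = A.shape[0]
    labels = -np.ones(m, dtype=int)
    current = 0
    for s in range(m):
        if labels[s] >= 0:
            continue
        labels[s] = current
        stack = [s]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(A[u]):
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

# Toy data (assumption): two populations with independent allele frequencies,
# six diploid individuals each, genotypes coded 0/1/2 at 2000 markers.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=(2000, 2))
pop = np.repeat([0, 1], 6)
G = rng.binomial(2, freqs[:, pop]).astype(float)

A = neighborhood_graph(G, alpha=0.15)
labels = connected_components(A)
print(labels)
```

In this sketch the number of groups is not supplied in advance; it falls out of the graph, which mirrors the abstract's point that the number of subpopulations need not be fixed before the analysis. On real data with gene flow between populations, connected components would merge linked groups, so a modularity-based or similar community detection method would be the more realistic choice.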
G3 : genes, genomes, genetics
|Pages:||3359 - 3377|
|Type of Publication:||
A1 Journal article – refereed
|Field of Science:||
1184 Genetics, developmental biology, physiology
This work was funded by the University of Oulu’s Technology and Natural Sciences Doctoral Programme (TNS-DP) and by the Swedish Ministry of Foreign Affairs, project A4952.
Supplemental material is available online at:
Copyright © 2017 Kuismin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.