Kuismin, M., Dodangeh, F., & Sillanpää, M. J. (2022). Gap-com: General model selection criterion for sparse undirected gene networks with nontrivial community structure. G3 Genes|Genomes|Genetics, 12(2), jkab437. https://doi.org/10.1093/g3journal/jkab437
Gap-com: general model selection criterion for sparse undirected gene networks with nontrivial community structure
|Author:||Kuismin, Markku1,2,3; Dodangeh, Fatemeh1; Sillanpää, Mikko J.1,2,4|
1Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
2Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
3School of Computing, University of Eastern Finland, Joensuu FI-80101, Finland
4Infotech Oulu, University of Oulu, Oulu FI-90014, Finland
|Persistent link:|| http://urn.fi/urn:nbn:fi-fe2022051635727
Genetics Society of America
|Publish Date:|| 2022-08-22
We introduce a new model selection criterion for sparse complex gene network modeling, in which gene co-expression relationships are estimated from data. The criterion is a novel formulation of the gap statistic and can be used to choose the regularization parameter in graphical models. It favors gene network structures that differ from the trivial gene interaction structure obtained totally at random. We call the criterion the gap-com statistic (gap community statistic). The idea of the gap-com statistic is to examine the difference between the observed and the expected counts of communities (clusters), where the expected counts are evaluated using either data permutations or resampling of a reference graph (the Erdős-Rényi graph). The reference graph represents a trivial gene network structure determined by chance. We put emphasis on complex network inference because the structure of gene networks is usually nontrivial: for example, some genes can be clustered together and some can be hub genes. We evaluate the performance of the gap-com statistic in graphical model selection and compare it with some existing methods using simulated and real biological data examples.
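The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their R code is in the repository cited in this record) and it rests on several assumptions: communities are counted as connected components, a crude stand-in for a proper community detection algorithm; the reference graph is an Erdős-Rényi G(n, m) graph with the same numbers of nodes and edges as the observed graph; and the sign convention (expected minus observed) is borrowed from the classical gap statistic rather than taken from the paper. The names `count_components`, `er_reference_edges`, and `gap_com_sketch` are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def count_components(n, edges):
    """Count connected components of an undirected graph on nodes 0..n-1.

    Assumption: connected components stand in for communities here.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    count = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            count -= 1  # each successful merge removes one component
    return count


def er_reference_edges(n, m, rng):
    """Draw m distinct edges uniformly at random: an Erdős-Rényi G(n, m) graph."""
    return rng.sample(list(combinations(range(n), 2)), m)


def gap_com_sketch(n, edges, n_ref=50, seed=0):
    """Mean community count over reference graphs minus the observed count.

    Assumptions: `edges` is a simple graph (no duplicate edges, no self-loops),
    and the sign convention follows the classical gap statistic.
    """
    rng = random.Random(seed)
    observed = count_components(n, edges)
    m = len(edges)
    reference = [
        count_components(n, er_reference_edges(n, m, rng)) for _ in range(n_ref)
    ]
    return mean(reference) - observed
```

Under these assumptions, model selection would amount to evaluating `gap_com_sketch` on the graph estimated at each candidate regularization parameter value and preferring the value whose graph departs most from the reference. The data-permutation variant mentioned in the abstract is not covered by this sketch.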
G3. Genes, genomes, genetics
|Type of Publication:||
A1 Journal article – refereed
|Field of Science:||
1184 Genetics, developmental biology, physiology
This work was supported by Biocenter Oulu funding, the Jane ja Aatos Erkon Säätiö, and the Technology Industries of Finland Centennial Foundation, as well as by Academy of Finland R’Life program funding (grant 329439) and Academy of Finland Profi5/HiDyn funding for mathematics and AI: data insight for high-dimensional dynamics (grant 326291).
|Academy of Finland Grant Number:||
329439 (Academy of Finland Funding decision)
A demo script and R code for reproducing all the analyses and figures presented in this study are available on GitHub under the GPL license at https://github.com/markkukuismin/gap-com. The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables. Supplemental material is available at G3 online.
© The Author(s) 2021. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.