University of Oulu

Kuismin, M., Dodangeh, F., & Sillanpää, M. J. (2022). Gap-com: General model selection criterion for sparse undirected gene networks with nontrivial community structure. G3 Genes|Genomes|Genetics, 12(2), jkab437.

Gap-com: general model selection criterion for sparse undirected gene networks with nontrivial community structure

Author: Kuismin, Markku (1, 2, 3); Dodangeh, Fatemeh (1); Sillanpää, Mikko J. (1, 2, 4)
Organizations: (1) Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
(2) Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
(3) School of Computing, University of Eastern Finland, Joensuu FI-80101, Finland
(4) Infotech Oulu, University of Oulu, Oulu FI-90014, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 0.6 MB)
Persistent link:
Language: English
Published: Genetics Society of America, 2021
Publish Date: 2022-08-22


We introduce a new model selection criterion for sparse, complex gene network modeling in which gene co-expression relationships are estimated from data. The criterion is a novel formulation of the gap statistic and can be used to choose the regularization parameter in graphical models. It favors gene network structures that differ from a trivial gene interaction structure arising purely at random. We call the criterion the gap-com statistic (gap community statistic). The idea is to examine the difference between the observed and the expected counts of communities (clusters), where the expected counts are evaluated using either data permutations or resampling of a reference graph (the Erdős-Rényi graph). The reference graph represents a trivial gene network structure determined by chance. We emphasize complex network inference because the structure of gene networks is usually nontrivial: for example, some genes may cluster together, and some may be hub genes. We evaluate the performance of the gap-com statistic in graphical model selection and compare it with some existing methods using simulated and real biological data examples.
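
The observed-versus-expected comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and not the exact definition from the article. It assumes, for simplicity, that a "community" is a connected component with at least two nodes (a crude stand-in for a proper community detection algorithm) and that the reference graphs are Erdős-Rényi graphs with the same number of nodes and edges as the observed network. All function names (`count_communities`, `erdos_renyi_edges`, `community_gap`) are hypothetical.

```python
import random
from itertools import combinations


def count_communities(n_nodes, edges, min_size=2):
    """Count connected components with at least `min_size` nodes.

    Assumption: components stand in for communities; a real analysis
    would use a community detection algorithm instead.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sum(1 for s in sizes.values() if s >= min_size)


def erdos_renyi_edges(n_nodes, n_edges, rng):
    """Sample a reference graph: n_edges distinct node pairs, uniformly at random."""
    all_pairs = list(combinations(range(n_nodes), 2))
    return rng.sample(all_pairs, n_edges)


def community_gap(n_nodes, edges, n_reference=200, seed=0):
    """Return (gap, observed, expected) community counts.

    gap = observed count minus the mean count over random reference graphs.
    """
    rng = random.Random(seed)
    observed = count_communities(n_nodes, edges)
    reference_counts = [
        count_communities(n_nodes, erdos_renyi_edges(n_nodes, len(edges), rng))
        for _ in range(n_reference)
    ]
    expected = sum(reference_counts) / n_reference
    return observed - expected, observed, expected


# Toy network: three disjoint triangles on nine nodes.
triangles = [(0, 1), (1, 2), (0, 2),
             (3, 4), (4, 5), (3, 5),
             (6, 7), (7, 8), (6, 8)]
gap, observed, expected = community_gap(9, triangles)
print(f"observed={observed}, expected={expected:.2f}, gap={gap:.2f}")
```

In this toy case the observed network has three clearly separated groups, while random graphs with the same number of edges tend to merge into fewer components, so the difference is positive. Under the same assumptions, one could compute this difference for the network estimated at each candidate regularization parameter and compare the values across candidates.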


Series: G3. Genes, genomes, genetics
ISSN: 2160-1836
ISSN-E: 2160-1836
ISSN-L: 2160-1836
Volume: 12
Issue: 2
Article number: jkab437
DOI: 10.1093/g3journal/jkab437
Type of Publication: A1 Journal article – refereed
Field of Science: 1184 Genetics, developmental biology, physiology
Funding: This work was supported by the Biocenter Oulu funding, Jane ja Aatos Erkon Säätiö and the Technology Industries of Finland Centennial Foundation as well as the Academy of Finland R’Life program funding (grant 329439) and the Academy of Finland Profi5/HiDyn funding for mathematics and AI: data insight for high-dimensional dynamics (grant 326291).
Academy of Finland Grant Number: 329439
Detailed Information: 329439 (Academy of Finland Funding decision)
Dataset Reference: A demo script and R code for reproducing all the analyses and figures presented in this study are available on GitHub under the GPL license. The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables. Supplemental material is available at G3 online.
Copyright information: © The Author(s) 2021. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.