University of Oulu

Behravan, H., Hartikainen, J., Tengström, M., Pylkäs, K., Winqvist, R., Kosma, V., Mannermaa, A. (2018) Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls. Scientific Reports, 8, 13149. doi:10.1038/s41598-018-31573-5

Machine learning identifies interacting genetic variants contributing to breast cancer risk: a case study in Finnish cases and controls

Author: Behravan, Hamid1; Hartikainen, Jaana M.1; Tengström, Maria2,3;
Organizations: 1Institute of Clinical Medicine, Pathology and Forensic Medicine, and Translational Cancer Research Area, University of Eastern Finland
2Institute of Clinical Medicine, Oncology, University of Eastern Finland
3Cancer Center, Kuopio University Hospital
4Laboratory of Cancer Genetics and Tumor Biology, Cancer and Translational Medicine Research Unit and Biocenter Oulu, Northern Finland Laboratory Centre Nordlab Oulu, University and University Hospital of Oulu
5Biobank of Eastern Finland and Central Administration, Kuopio University Hospital
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 2.7 MB)
Language: English
Published: Springer Nature, 2018
Publish Date: 2019-02-13


We propose an effective machine learning approach to identify a group of interacting single nucleotide polymorphisms (SNPs) that contribute most to breast cancer (BC) risk by assuming dependencies among BCAC iCOGS SNPs. We adopt a gradient tree boosting method followed by an adaptive iterative SNP search to capture complex non-linear SNP-SNP interactions and, consequently, obtain a group of interacting SNPs with high BC risk-predictive potential. We also propose a support vector machine formed by the identified SNPs to classify BC cases and controls. Our approach achieves a mean average precision (mAP) of 72.66, 67.24 and 69.25 in discriminating BC cases and controls in the KBCP, OBCS and merged KBCP-OBCS sample sets, respectively. These results are better than the mAP of 70.08, 63.61 and 66.41 obtained with a polygenic risk score model derived from 51 known BC-associated SNPs in the same three sample sets, respectively. BC subtype analysis further reveals that the 200 identified KBCP SNPs from the proposed method perform favorably in classifying estrogen receptor positive (ER+) and negative (ER−) BC cases in both KBCP and OBCS data. Further, a biological analysis of the identified SNPs reveals genes related to important BC-related mechanisms, estrogen metabolism and apoptosis.
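To make the baseline comparison above more concrete, the following is a minimal sketch of how a polygenic risk score can rank samples and how average precision can be computed from that ranking. It is not the authors' implementation: the function names, the per-SNP weights, the genotypes and the labels are all hypothetical toy values, and the average precision shown is one common definition (mean of precision at each rank where a case appears). This record does not describe how the reported mAP was aggregated.

```python
def polygenic_risk_score(genotype, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP)."""
    return sum(w * g for w, g in zip(weights, genotype))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a case (label 1) appears."""
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical toy data: 3 SNPs, 4 samples (assumed values, not from the study).
weights = [0.2, 0.1, 0.3]        # assumed per-SNP effect sizes
genotypes = [
    [2, 1, 2],                   # sample 0
    [0, 0, 1],                   # sample 1
    [1, 2, 0],                   # sample 2
    [0, 1, 0],                   # sample 3
]
labels = [1, 1, 0, 0]            # 1 = case, 0 = control

scores = [polygenic_risk_score(g, weights) for g in genotypes]
ap = average_precision(labels, scores)
print(f"average precision: {ap:.4f}")
```

In this toy ranking the two cases land at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2. Any classifier that outputs a score per sample, such as the support vector machine mentioned in the abstract, could be evaluated with the same function.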


Series: Scientific reports
ISSN: 2045-2322
ISSN-E: 2045-2322
ISSN-L: 2045-2322
Volume: 8
Article number: 13149
DOI: 10.1038/s41598-018-31573-5
Type of Publication: A1 Journal article – refereed
Field of Science: 3111 Biomedicine
3121 General medicine, internal medicine and other clinical medicine
Funding: This study was financially supported by the special Government Funding (EVO) of Kuopio University Hospital grants, Cancer Fund of North Savo, the Finnish Cancer Organizations, and by the strategic funding of the University of Eastern Finland.
Copyright information: © The Author(s) 2018. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit