University of Oulu

Davoud Davoudi Moghaddam, Omid Rahmati, Mahdi Panahi, John Tiefenbacher, Hamid Darabi, Ali Haghizadeh, Ali Torabi Haghighi, Omid Asadi Nalivan, Dieu Tien Bui, The effect of sample size on different machine learning models for groundwater potential mapping in mountain bedrock aquifers, CATENA, Volume 187, 2020, 104421, ISSN 0341-8162.

The effect of sample size on different machine learning models for groundwater potential mapping in mountain bedrock aquifers

Author: Moghaddam, Davoud Davoudi¹; Rahmati, Omid²; Panahi, Mahdi³,⁴;
Organizations: ¹Department of Watershed Management, Agriculture and Natural Resources Faculty, Lorestan University, Iran
²Soil Conservation and Watershed Management Research Department, Kurdistan Agricultural and Natural Resources Research and Education Center, AREEO, Sanandaj, Iran
³Division of Science Education, Kangwon National University, Chuncheon-si, Gangwon-do 24341, Republic of Korea
⁴Geoscience Platform Research Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), 124, Gwahak-ro, Yuseong-gu, Daejeon 34132, Republic of Korea
⁵Department of Geography, Texas State University, San Marcos, TX 78666, USA
⁶Water, Energy and Environmental Engineering Research Unit, University of Oulu, Oulu, Finland
⁷Department of Watershed Management, Natural Resources Faculty, Gorgan University of Agricultural Sciences and Natural Resources, Gorgan, Iran
⁸Institute of Research and Development, Duy Tan University, Da Nang 550000, Viet Nam
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 4.6 MB)
Persistent link:
Language: English
Published: Elsevier, 2020
Publish Date: 2021-01-23


Machine learning models have attracted much research attention for groundwater potential mapping. However, the accuracy of these models is significantly influenced by sample size, which remains a challenge. This study evaluates the influence of sample size on the accuracy of individual and hybrid models for groundwater potential mapping: adaptive neuro-fuzzy inference system (ANFIS), ANFIS-imperialist competitive algorithm (ANFIS-ICA), alternating decision tree (ADT), and random forest (RF), with the number of springs ranging from 177 to 714. A well-documented inventory of springs, as a natural indicator of groundwater potential, was used to designate four sample data sets: 100% (D₁), 75% (D₂), 50% (D₃), and 25% (D₄) of the entire spring inventory. Each data set was randomly split into two groups of 70% (for training) and 30% (for validation). Fifteen diverse geo-environmental factors were employed as independent variables. The area under the receiver operating characteristic curve (AUROC) and the true skill statistic (TSS), a cutoff-independent and a cutoff-dependent performance metric respectively, were used to assess model performance. Results showed that sample size influenced the performance of all four machine learning algorithms, but RF was less sensitive to the reduction in sample size. In addition, validation results revealed that RF (AUROC = 90.74–96.32%, TSS = 0.79–0.85) had the best performance on all four sample data sets, followed by ANFIS-ICA (AUROC = 81.23–91.55%, TSS = 0.74–0.81), ADT (AUROC = 79.29–88.46%, TSS = 0.59–0.74), and ANFIS (AUROC = 73.11–88.43%, TSS = 0.59–0.74). Further, relative slope position, lithology, and distance from faults were the main spring-affecting factors contributing to groundwater potential modelling. This study can provide useful guidelines and a valuable reference for selecting machine learning models when a complete spring inventory for a watershed is unavailable.
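
The sampling and evaluation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the integer spring IDs, the `TRAIN_FRACTION` value, the 0.5 cutoff for TSS, and the toy labels and scores are all assumptions made for illustration. The sketch draws the four subsets (D₁ to D₄) from a placeholder inventory of 714 springs, splits each into training and validation groups, and computes a cutoff-independent metric (AUROC) and a cutoff-dependent one (TSS = sensitivity + specificity − 1).

```python
import random

TRAIN_FRACTION = 0.7  # assumption: share of each subset used for training


def subsample(inventory, fraction, rng):
    """Randomly keep `fraction` of the inventory (size rounded down)."""
    return rng.sample(inventory, int(len(inventory) * fraction))


def split(records, train_fraction, rng):
    """Shuffle a copy of the records and cut it into (training, validation)."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tss(labels, scores, cutoff=0.5):
    """True skill statistic at one cutoff: sensitivity + specificity - 1."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


rng = random.Random(0)          # fixed seed so the sketch is repeatable
inventory = list(range(714))    # placeholder spring IDs, not real data
fractions = {"D1": 1.0, "D2": 0.75, "D3": 0.5, "D4": 0.25}

for name, fraction in fractions.items():
    subset = subsample(inventory, fraction, rng)
    train, valid = split(subset, TRAIN_FRACTION, rng)
    print(name, len(subset), len(train), len(valid))

# Toy validation labels (1 = spring, 0 = non-spring) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(round(auroc(labels, scores), 3), round(tss(labels, scores), 3))
```

Changing `cutoff` alters the TSS value but leaves the AUROC unchanged, which is the sense in which the two metrics are cutoff-dependent and cutoff-independent.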


Series: Catena
ISSN: 0341-8162
ISSN-E: 1872-6887
ISSN-L: 0341-8162
Volume: 187
Article number: 104421
DOI: 10.1016/j.catena.2019.104421
Type of Publication: A1 Journal article – refereed
Field of Science: 1171 Geosciences
Copyright information: © 2019 The Authors. Published by Elsevier Inc. This manuscript version is made available under the CC-BY-NC-ND 4.0 license