University of Oulu

Lomio, F., Moreschini, S. & Lenarduzzi, V. A machine and deep learning analysis among SonarQube rules, product, and process metrics for fault prediction. Empir Software Eng 27, 189 (2022).

A machine and deep learning analysis among SonarQube rules, product, and process metrics for fault prediction

Author: Lomio, Francesco (1); Moreschini, Sergio (1); Lenarduzzi, Valentina (2)
Organizations: (1) Tampere University, Tampere, Finland
(2) University of Oulu, Oulu, Finland
Format: article
Version: published version
Access: open
Language: English
Published: Springer Nature, 2022
Publish Date: 2023-06-06


Background: Developers spend more time fixing bugs and refactoring code to increase maintainability than developing new features. Researchers have investigated the impact of code quality on fault-proneness, focusing on code smells and code metrics.

Objective: We aim to advance fault-inducing commit prediction by using different variables (SonarQube rules, product metrics, and process metrics) and by adopting different techniques.

Methods: We designed and conducted an empirical study of 29 Java projects, which we analyzed with SonarQube and with the SZZ algorithm to identify fault-inducing and fault-fixing commits, and for which we computed different product and process metrics. Moreover, we investigated fault-proneness using different Machine Learning and Deep Learning models.
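The SZZ algorithm mentioned in the Methods links fault-fixing commits to the earlier commits that introduced the fault. As an illustration only, and not the implementation used in the study, the sketch below shows its core idea in heavily simplified form: a commit counts as fault-fixing if its message cites a known bug key, and every commit that last touched a line the fix changed becomes a fault-inducing candidate. The `Commit` class, the in-memory blame table (a stand-in for `git blame` on the fix's parent revision), and the keyword-based issue linking are all hypothetical; full SZZ variants also filter candidates by bug-report date and ignore cosmetic changes, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    message: str
    # Lines this commit deleted or modified, given as (path, line_number)
    # in the parent revision. Hypothetical stand-in for a parsed git diff.
    changed_lines: list = field(default_factory=list)


def is_fault_fixing(commit: Commit, bug_keys: set) -> bool:
    """Simplified linking step: treat a commit as fault-fixing if its
    message mentions one of the known bug-issue keys (e.g. 'PROJ-7')."""
    return any(key in commit.message for key in bug_keys)


def szz(commits, blame_by_fix, bug_keys):
    """Map each fault-fixing commit sha to a set of candidate fault-inducing shas.

    blame_by_fix[fix_sha] is a hypothetical blame table for the fix's parent
    revision: (path, line_number) -> sha of the commit that last touched it.
    """
    result = {}
    for commit in commits:
        if not is_fault_fixing(commit, bug_keys):
            continue
        blame = blame_by_fix.get(commit.sha, {})
        # Every commit that last touched a line changed by the fix is a candidate.
        result[commit.sha] = {blame[loc] for loc in commit.changed_lines if loc in blame}
    return result


if __name__ == "__main__":
    history = [
        Commit("a1", "Add parser"),
        Commit("b2", "Refactor lexer"),
        Commit("c3", "Fix PROJ-7: null check in parser",
               [("Parser.java", 10), ("Parser.java", 11)]),
    ]
    blame = {"c3": {("Parser.java", 10): "a1", ("Parser.java", 11): "b2"}}
    print(szz(history, blame, {"PROJ-7"}))
```

In the usage example, commit `c3` is the only fix, and both `a1` and `b2` are flagged as candidate fault-inducing commits because each last touched one of the lines `c3` changed.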

Results: We analyzed 58,125 commits containing 33,865 faults and affected by more than 174 SonarQube rules, which were violated 1.8M times in total; on these commits we computed 48 software product and process metrics. The results identified a set of features that provided highly accurate fault prediction (AUC above 95%). Regarding classifier performance, Deep Learning models achieved higher accuracy than Machine Learning models.
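The AUC reported in the Results is the area under the ROC curve. To illustrate what that number measures (this is not the study's evaluation code), AUC equals the probability that a randomly chosen fault-inducing commit receives a higher predicted score than a randomly chosen clean commit, with ties counted as one half. The minimal sketch below computes it by direct pairwise comparison; the labels and scores in the usage example are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic
    divided by n_pos * n_neg). labels: 1 = fault-inducing, 0 = clean."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(labels, scores))  # 0.75
```

Here three of the four positive/negative pairs are ranked correctly (only 0.4 vs 0.6 is not), giving 0.75. The pairwise loop is quadratic and is meant only for clarity; a sort-based rank computation is the usual choice for data sets of the size analyzed in the study.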

Conclusions: Future work might investigate whether other static analysis tools, such as FindBugs or Checkstyle, yield similar or different results. Researchers might also consider adopting time series analysis and anomaly detection techniques.


Series: Empirical software engineering
ISSN: 1382-3256
ISSN-E: 1573-7616
ISSN-L: 1382-3256
Volume: 27
Issue: 7
Article number: 189
DOI: 10.1007/s10664-022-10164-z
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Copyright information: © The Author(s) 2022. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit