University of Oulu

S. Hosseini, B. Turhan and D. Gunarathna, "A Systematic Literature Review and Meta-Analysis on Cross Project Defect Prediction," in IEEE Transactions on Software Engineering, vol. 45, no. 2, pp. 111-147, 1 Feb. 2019. doi: 10.1109/TSE.2017.2770124

A systematic literature review and meta-analysis on cross project defect prediction

Author: Hosseini, Seyedrebvar¹; Turhan, Burak²; Gunarathna, Dimuthu³
Organizations: ¹University of Oulu, Oulu, Finland
²Department of Computer Science, Brunel University London, London, United Kingdom
³Vaimo Finland (Oy), Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 1.9 MB)
Language: English
Published: Institute of Electrical and Electronics Engineers, 2019
Publish Date: 2019-09-23


Background: Cross project defect prediction (CPDP) has recently gained considerable attention, yet there have been no systematic efforts to analyse the existing empirical evidence.

Objective: To synthesise the literature to understand the state of the art in CPDP with respect to metrics, models, data approaches, datasets and associated performances. Further, we aim to assess the performance of CPDP versus within project defect prediction (WPDP) models.

Method: We conducted a systematic literature review. Results from primary studies are synthesised (thematic, meta-analysis) to answer research questions.

Results: We identified 30 primary studies passing quality assessment. Performance measures, except precision, vary with the choice of metrics. Recall, precision, f-measure, and AUC are the most common measures. Models based on Nearest-Neighbour and Decision Tree tend to perform well in CPDP, whereas the popular naïve Bayes yields average performance. Performance of ensembles varies greatly across f-measure and AUC. Data approaches address CPDP challenges using row/column processing, which improves CPDP in terms of recall at the cost of precision. This is observed on multiple occasions, including the meta-analysis of CPDP versus WPDP. NASA and Jureczko datasets seem to favour CPDP over WPDP more frequently.

Conclusion: CPDP is still a challenge and requires more research before trustworthy applications can take place. We provide guidelines for further research.


Series: IEEE transactions on software engineering
ISSN: 0098-5589
ISSN-E: 1939-3520
ISSN-L: 0098-5589
Volume: 45
Issue: 2
Pages: 111 - 147
DOI: 10.1109/TSE.2017.2770124
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Copyright information: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.