Volume 9 Number 3 (Mar. 2014)
JSW 2014 Vol.9(3): 697-704 ISSN: 1796-217X
doi: 10.4304/jsw.9.3.697-704

An Empirical Study for Software Fault-Proneness Prediction with Ensemble Learning Models on Imbalanced Data Sets

Renqing Li, Shihai Wang
School of Reliability and System Engineering, Beihang University, Beijing, China; Science and Technology on Reliability and Environmental Engineering Laboratory

Abstract—Software faults can cause serious system errors and failures, leading to huge economic losses, yet no current inspection or verification technique is able to find and eliminate all software faults. Software testing is an important way to detect these faults and raise software reliability, but it is expensive. Estimating a module’s fault-proneness helps minimize the testing resources required by directing resource allocation toward high-risk modules, which improves both the efficiency of software testing and the reliability of the software. Software fault data sets, however, are inherently imbalanced: a small number of software modules hold most of the faults, while most modules are fault-free. This imbalanced distribution is a real challenge for researchers in software fault-proneness prediction. In this paper, we investigate software fault-proneness prediction models built on software metrics using C4.5, SVM, KNN, Logistic, NaiveBayes, AdaBoost and SMOTEBoost. We perform an empirical study of the effectiveness of these models on imbalanced software fault data sets obtained from NASA’s MDP. In a comprehensive comparison of the experimental results, SMOTEBoost outperforms the other models at predicting high-risk software modules, with higher recall and AUC values, which demonstrates that the SMOTEBoost-based model is better able to estimate a module’s fault-proneness and thereby improve the efficiency of software testing.

Index Terms—SMOTEBoost, software fault-prone, prediction model, imbalanced data sets
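
The abstract compares models by recall and AUC rather than by plain accuracy. The following minimal sketch, which is not taken from the paper, illustrates why that choice matters when most modules are fault-free. The data are hypothetical (100 modules, 10 of them faulty), and the two "predictors" are hand-written label lists, not any of the models studied.

```python
def accuracy(y_true, y_pred):
    """Fraction of modules whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of truly faulty modules (label 1) that were flagged as faulty."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random faulty module receives a higher
    score than a random fault-free one; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one module of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data set: 10 faulty modules, 90 fault-free ones.
y_true = [1] * 10 + [0] * 90

# Trivial predictor: declares every module fault-free.
always_clean = [0] * 100

# Hypothetical predictor: flags 8 of the 10 faulty modules,
# at the cost of 18 false alarms among the 90 fault-free ones.
flagging = [1] * 8 + [0] * 2 + [1] * 18 + [0] * 72

for name, pred in [("always_clean", always_clean), ("flagging", flagging)]:
    print(name, accuracy(y_true, pred), recall(y_true, pred), auc(y_true, pred))
```

On this toy data the trivial predictor has the higher accuracy yet finds none of the faulty modules, while the second predictor trades some accuracy for much higher recall and AUC. Here the hard labels double as scores for the AUC calculation; a real classifier would normally supply continuous scores.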


Cite: Renqing Li, Shihai Wang, "An Empirical Study for Software Fault-Proneness Prediction with Ensemble Learning Models on Imbalanced Data Sets," Journal of Software, vol. 9, no. 3, pp. 697-704, 2014.
