Volume 15 Number 6 (Nov. 2020)
Home > Archive > 2020 > Volume 15 Number 6 (Nov. 2020) >
JSW 2020 Vol.15(6): 147-162 ISSN: 1796-217X
doi: 10.17706/jsw.15.6.147-162

A Novel Approach for Converting N-Dimensional Dataset into Two Dimensions to Improve Accuracy in Software Defect Prediction

Rayhanul Islam1*, Abdus Satter2, Atish Kumar Dipongkor3, Md. Saeed Siddik4, Kazi Sakib5

1Institute of Leather Engineering and Technology, University of Dhaka, Dhaka, Bangladesh.
2Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh.
3Jashore University of Science and Technology, Jashore, Bangladesh.

Abstract—Software defect prediction model is trained using code metrics and historical defect information to identify probable software defects. The accuracy and performance of a prediction model largely depend on the training dataset. In order to provide proper training dataset, it is required to make the dataset clustered with less variabilities using clustering algorithms. However, clustering process is hampered due to multiple attributes of dataset such as Coupling between Objects, Response for Class, Lines of Code, etc. This research will aim to predict software defects through reducing code metrics dimensions to two latent variables. It will finally help the clustering algorithms to group data properly for the defect prediction model. In this paper, the dataset similarities are analyzed by reducing code metrics’ attributes into two latent variables based on their impacts to defects. Their impacts to defects can be analyzed using regression analysis because it identifies the relationship among a set of dependent and independent variables. Then, the code metrics are merged into two variables - PosImpactValue and NegImpactValue based on their positive or negative impact, respectively. As a result, multi-dimensional dataset is mapped into two-dimensional dataset. Plotting those dimensions reduced datasets enable distance-based clustering algorithms to group those datasets based on their similarities. Experiments have been performed on 18 releases of 6 open source software datasets such as jEdit, Ant, Xalan, Synapse, Tomcat and Camel. For comparative analysis, one of the most commonly used dimension reduction techniques named Principle Component Analysis (PCA) and two popular clustering techniques in defect prediction – DBSCAN and WHERE have been used in the experiment. First, the dimensions of the experimental datasets have been reduced using the proposed technique and PCA separately. Then, the reduced datasets have been clustered using DBSCAN and WHERE independently for identifying number of defects accurately. The comparative result analysis shows that the defect prediction models based on the clustering algorithms are more accurate for the dataset reduced by the proposed technique than PCA.

Index Terms—Software defect prediction, principal component analysis, DBSCAN, WHERE clustering, code metrics’ dimension reduction technique, dataset pre-processing.

[PDF]

Cite: Rayhanul Islam, Abdus Satter, Atish Kumar Dipongkor, Md. Saeed Siddik, Kazi Sakib, "A Novel Approach for Converting N-Dimensional Dataset into Two Dimensions to Improve Accuracy in Software Defect Prediction," Journal of Software vol. 15, no. 6, pp. 147-162, 2020.

Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

General Information

ISSN: 1796-217X (Online)
Frequency:  Quarterly
Editor-in-Chief: Prof. Antanas Verikas
Executive Editor: Ms. Yoyo Y. Zhou
Abstracting/ Indexing: DBLP, EBSCO, CNKIGoogle Scholar, ProQuest, INSPEC(IET), ULRICH's Periodicals Directory, WorldCat, etc
E-mail: jsweditorialoffice@gmail.com
  • Mar 01, 2024 News!

    Vol 19, No 1 has been published with online version    [Click]

  • Apr 26, 2021 News!

    Vol 14, No 4- Vol 14, No 12 has been indexed by IET-(Inspec)     [Click]

  • Nov 18, 2021 News!

    Papers published in JSW Vol 16, No 1- Vol 16, No 6 have been indexed by DBLP   [Click]

  • Jan 04, 2024 News!

    JSW will adopt Article-by-Article Work Flow

  • Nov 02, 2023 News!

    Vol 18, No 4 has been published with online version   [Click]