Volume 14 Number 4 (Apr. 2019)
JSW 2019 Vol.14(4): 168-181 ISSN: 1796-217X
doi: 10.17706/jsw.14.4.168-181

How Does the Data Set and the Number of Categories Affect CNN-based Image Classification Performance?

Chao Luo1, Xiaojie Li1*, Jing Yin2, Jia He1, Deng Gao1, Jiliu Zhou1
1 Chengdu University of Information Technology, Chengdu, China.
2 Chongqing University of Technology, Chongqing, China.

Abstract— Convolutional neural networks (CNNs) have been widely applied in many fields and have achieved excellent results, especially in image classification tasks. Many factors affect image classification performance; the size of the training data set and the number of categories are two of the most important. In practice, however, a large training data set is often difficult to obtain, or the task requires classification over a large number of categories. We therefore consider two questions: How does the size of the data set affect performance? How does the number of categories affect performance? To answer them, we constructed two types of experiment. Experiment 1 varies the number of categories and explores how it affects performance in the image classification task: 7 groups of experiments were performed with an increasing number of categories, each group repeated 5 times (35 runs in total), and the change in accuracy was observed to analyze the impact of the number of categories on performance. Experiment 2 varies the data set size and explores how it affects performance: for each k-class classification experiment, we ran 5 groups with an increasing training set size, giving 35 groups, each repeated 5 times (175 runs in total), and again observed the change in accuracy to analyze the effect of data set size on performance. For the CNN-based network, the results of Experiment 1 show that the more categories there are, the worse the performance, and the fewer the categories, the better the performance. In addition, when the number of categories to be classified is large, better accuracy can sometimes be obtained. The results of Experiment 2 show that the larger the training set, the higher the test accuracy. When the training data are insufficient, better results can sometimes be obtained. Therefore, in classification experiments, when the data set is small or the number of categories is large, we can run more experiments and retain the best results. These results not only can guide image classification experiments but also have important guiding significance for other experiments based on deep learning.
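The run counts quoted in the abstract can be checked with a short sketch. The counts (7 category settings, 5 training-set sizes, 5 repeats) come from the abstract; the function names `experiment_grid` and `best_per_group` are hypothetical, introduced only for illustration, and the "retain the best result" step is one possible reading of the abstract's recommendation, not the authors' code.

```python
from itertools import product

# Counts stated in the abstract.
NUM_CATEGORY_SETTINGS = 7  # groups with an increasing number of categories
NUM_SIZE_SETTINGS = 5      # training-set sizes per k-class experiment
REPEATS = 5                # repeated runs per group


def experiment_grid(category_settings, size_settings, repeats):
    """Return every (category_setting, size_setting, repeat) index triple."""
    return list(product(range(category_settings),
                        range(size_settings),
                        range(repeats)))


def best_per_group(results):
    """Keep the highest accuracy per (category_setting, size_setting) group.

    `results` maps (category_setting, size_setting, repeat) -> accuracy.
    (Hypothetical helper: the abstract only says to retain the best results.)
    """
    best = {}
    for (cat, size, _repeat), acc in results.items():
        key = (cat, size)
        if key not in best or acc > best[key]:
            best[key] = acc
    return best


# Experiment 1: 7 groups, each repeated 5 times.
exp1_runs = NUM_CATEGORY_SETTINGS * REPEATS
# Experiment 2: 7 x 5 groups, each repeated 5 times.
exp2_groups = NUM_CATEGORY_SETTINGS * NUM_SIZE_SETTINGS
exp2_runs = len(experiment_grid(NUM_CATEGORY_SETTINGS, NUM_SIZE_SETTINGS, REPEATS))
print(exp1_runs, exp2_groups, exp2_runs)  # 35 35 175
```

In use, `results` would be filled with the test accuracy of each trained model, and `best_per_group` would then return one retained accuracy per group.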

Index Terms— Multi-classification; CNN; ResNet


Cite: Chao Luo, Xiaojie Li, Jing Yin, Jia He, Deng Gao, Jiliu Zhou, "How Does the Data Set and the Number of Categories Affect CNN-based Image Classification Performance?," Journal of Software, vol. 14, no. 4, pp. 168-181, 2019.
