Volume 14 Number 4 (Apr. 2019)
JSW 2019 Vol.14(4): 168-181 ISSN: 1796-217X
doi: 10.17706/jsw.14.4.168-181

How Does the Data Set and the Number of Categories Affect CNN-based Image Classification Performance?

Chao Luo1, Xiaojie Li1*, Jing Yin2, Jia He1, Deng Gao1, Jiliu Zhou1
1 Chengdu University of Information Technology, Chengdu, China.
2 Chongqing University of Technology, Chongqing, China.


Abstract— Convolutional neural networks (CNNs) have been widely applied in many fields and have achieved excellent results, especially in image classification tasks. Many factors affect the performance of image classification. In particular, the size of the training data set and the number of categories are important factors. In practice, however, a large training data set is often difficult to obtain, or the task requires classification over a large number of categories. We therefore consider two questions: How does the size of the data set affect performance? How does the number of categories affect performance? To answer them, we constructed two types of experiment. Experiment 1 varies the number of categories and explores how it affects performance in the image classification task: 7 groups of experiments were performed with an increasing number of categories, with 5 runs in each group (35 runs in total), and the change in accuracy was observed to analyze the impact of the number of categories. Experiment 2 varies the data set size and explores how it affects performance: for each k-class classification experiment, 5 groups were formed by increasing the size of the training set, giving 35 groups with 5 runs each (175 runs in total), and the change in accuracy was observed to analyze the effect of data set size. For the CNN-based network, the results of Experiment 1 show that the more categories there are, the worse the performance, and the fewer categories, the better the performance. In addition, when the number of categories is large, better accuracy can sometimes be obtained. The results of Experiment 2 show that the larger the training set, the higher the test accuracy; when the training data are insufficient, better results can also be obtained in some cases. Therefore, in classification experiments where the data set is small or the number of categories is large, we can run more experiments and retain the best results. The results of this paper not only can guide experiments on image classification but also have important guiding significance for other experiments based on deep learning.
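
The run counts stated in the abstract can be checked with a short sketch. It assumes the 35 groups of Experiment 2 come from crossing the 7 category settings with the 5 training-set sizes, which the abstract implies but does not state. The abstract gives neither the values of k nor the training-set sizes, so the settings are represented only by index. The function names are hypothetical, and `summarize` is just one way to "retain the best results" across repeated runs.

```python
from itertools import product

# Counts taken from the abstract; the actual values of k and the
# training-set sizes are not given there, so settings are indexed only.
NUM_CATEGORY_SETTINGS = 7  # groups in Experiment 1
NUM_SIZE_SETTINGS = 5      # training-set sizes per k in Experiment 2
REPEATS = 5                # repeated runs per group


def experiment_1_runs():
    """One (category_setting, repeat) pair per training run."""
    return list(product(range(NUM_CATEGORY_SETTINGS), range(REPEATS)))


def experiment_2_runs():
    """One (category_setting, size_setting, repeat) triple per training run.

    Assumption: the 35 groups are the 7 x 5 combinations of settings.
    """
    return list(product(range(NUM_CATEGORY_SETTINGS),
                        range(NUM_SIZE_SETTINGS),
                        range(REPEATS)))


def summarize(accuracies):
    """Return (mean, best) accuracy over the repeated runs of one group."""
    if not accuracies:
        raise ValueError("at least one run is required")
    return sum(accuracies) / len(accuracies), max(accuracies)
```

Calling `summarize` on the accuracies of one group reports the mean alongside the retained best run, which shows how far the best result sits from typical behavior.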

Index Terms— Multi-classification; CNN; ResNet


Cite: Chao Luo, Xiaojie Li, Jing Yin, Jia He, Deng Gao, Jiliu Zhou, "How Does the Data Set and the Number of Categories Affect CNN-based Image Classification Performance?," Journal of Software, vol. 14, no. 4, pp. 168-181, 2019.
