JSW 2019 Vol.14(4): 168-181 ISSN: 1796-217X
doi: 10.17706/jsw.14.4.168-181
How Does the Data Set and the Number of Categories Affect CNN-based Image Classification Performance?
Chao Luo1, Xiaojie Li1*, Jing Yin2, Jia He1, Deng Gao1, Jiliu Zhou1
1 Chengdu University of Information Technology, Chengdu, China.
2 Chongqing University of Technology, Chongqing, China.
Abstract— Convolutional neural networks (CNNs) have been widely applied in many fields and have achieved excellent results, especially in image classification tasks. Many factors affect image classification performance; in particular, the size of the training data set and the number of categories are important. For most practitioners, however, large training data sets are difficult to obtain, or the classification task involves a large number of categories. We therefore consider two questions: How does the size of the data set affect performance? How does the number of categories affect performance? To answer these two questions, we constructed two types of experiment. In Experiment 1, we change the number of categories and explore how it affects performance in an image classification task. We perform 7 groups of experiments with an increasing number of categories and run each group 5 times (35 runs in total), observing the change in accuracy to analyze the impact of the number of categories on performance. In Experiment 2, we change the data set size and explore how it affects performance. For each k-class classification experiment, we form 5 groups by increasing the size of the training set, which gives 35 groups, each run 5 times (175 runs in total), and we observe the change in accuracy to analyze the effect of data set size on performance. For the CNN-based network, the results of Experiment 1 show that the more categories there are, the worse the performance, and the fewer categories there are, the better the performance. In addition, when the number of categories to be classified is large, better accuracy can sometimes be obtained. The results of Experiment 2 show that the larger the training set, the higher the test accuracy. When the training data set is insufficient, better results can still sometimes be obtained.
Therefore, in classification experiments where the data set is small or the number of categories is large, we can run more experiments and retain the best results. The results of this paper can guide experiments on image classification and also have guiding significance for other experiments based on deep learning.
Index Terms— Multi-classification; CNN; ResNet
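The run counts in the abstract follow from a simple grid: 7 category settings, 5 training-set sizes per setting, and 5 repeats per group. The sketch below is only an illustration of that bookkeeping and of the "retain the best result" advice; the function names and the use of integer indices in place of the actual category counts and training-set sizes (which the abstract does not give) are assumptions, not the authors' code.

```python
from itertools import product

# Counts taken from the abstract; the concrete category numbers and
# training-set sizes are not stated there, so plain indices stand in for them.
NUM_CATEGORY_SETTINGS = 7
NUM_SIZE_SETTINGS = 5
REPEATS = 5


def experiment1_runs():
    """Experiment 1: (category_setting, repeat) pairs."""
    return list(product(range(NUM_CATEGORY_SETTINGS), range(REPEATS)))


def experiment2_runs():
    """Experiment 2: (category_setting, size_setting, repeat) triples."""
    return list(product(range(NUM_CATEGORY_SETTINGS),
                        range(NUM_SIZE_SETTINGS),
                        range(REPEATS)))


def best_per_group(accuracies):
    """Keep the highest accuracy per group.

    `accuracies` maps a run tuple to its test accuracy; a group is the
    run tuple without its final repeat index (hypothetical convention).
    """
    best = {}
    for run, acc in accuracies.items():
        group = run[:-1]
        if group not in best or acc > best[group]:
            best[group] = acc
    return best


if __name__ == "__main__":
    print(len(experiment1_runs()))  # 7 groups x 5 repeats
    print(len(experiment2_runs()))  # 35 groups x 5 repeats
```

Taking the maximum over repeats reflects the abstract's suggestion to run more experiments and keep the best result when data is scarce or the category count is high; reporting the mean and spread alongside it would be a reasonable complement.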
Cite: Chao Luo, Xiaojie Li, Jing Yin, Jia He, Deng Gao, Jiliu Zhou, "How Does the Data Set and the Number of Categories Affect CNN-based Image Classification Performance?," Journal of Software, vol. 14, no. 4, pp. 168-181, 2019.