How Does the Data Set and the Number of Categories Affect CNN-based Image Classification Performance?

JSW 2019 Vol.14(4): 168-181 ISSN: 1796-217X
doi: 10.17706/jsw.14.4.168-181

Chao Luo¹, Xiaojie Li¹*, Jing Yin², Jia He¹, Deng Gao¹, Jiliu Zhou¹

¹ Chengdu University of Information Technology, Chengdu, China.
² Chongqing University of Technology, Chongqing, China.

Abstract— Convolutional neural networks (CNNs) have been widely applied in many fields and have achieved excellent results, especially in image classification. Many factors affect image classification performance; in particular, the size of the training set and the number of categories are important. However, large training sets are difficult for most practitioners to obtain, and many tasks require classifying a large number of categories. We therefore consider two questions: How does the size of the data set affect performance? How does the number of categories affect performance? To answer them, we designed two types of experiments. Experiment 1 varies the number of categories to explore how it affects performance in an image classification task: 7 groups of experiments are run with an increasing number of categories, and each group is run 5 times (35 runs in total). The change in accuracy is used to analyze the impact of the number of categories on performance. Experiment 2 varies the data set size to explore how it affects performance: for each k-class classification setting, 5 groups are run with an increasing training-set size, giving 35 groups of 5 runs each (175 runs in total). The change in accuracy is used to analyze the effect of data set size on performance. For the CNN-based network, the results of Experiment 1 show that the more categories, the worse the performance, and the fewer categories, the better the performance. In addition, when the number of categories to be classified is large, better accuracy can sometimes be obtained in individual runs. The results of Experiment 2 show that the larger the training set, the higher the test accuracy; when the training set is insufficient, better results can likewise be obtained in some runs. Therefore, in classification experiments where the data set is small or the number of categories is large, we can run more experiments and retain the best result. These results can not only guide image classification experiments but also inform other experiments based on deep learning.
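The experimental protocol described in the abstract (Experiment 1: 7 groups × 5 runs = 35 runs; Experiment 2: 7 category settings × 5 training-set sizes = 35 groups, × 5 runs = 175 runs) and its advice to repeat runs and keep the best result can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the specific values in `CATEGORY_COUNTS` and `TRAIN_FRACTIONS` are hypothetical placeholders, since the abstract gives only how many settings were used, and `train_and_eval` is a hypothetical stand-in for whatever CNN (e.g. ResNet) training and testing routine is plugged in.

```python
import itertools
import statistics
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL settings: the abstract states how many values were used, not which ones.
CATEGORY_COUNTS = [2, 4, 6, 8, 10, 12, 14]   # 7 values of k (assumed)
TRAIN_FRACTIONS = [0.2, 0.4, 0.6, 0.8, 1.0]  # 5 training-set sizes (assumed)
REPEATS = 5                                  # runs per group (stated in the abstract)


def experiment1_plan() -> List[Tuple[int, int]]:
    """Experiment 1: vary only k; each (k, run_index) pair is one training run."""
    return [(k, r) for k in CATEGORY_COUNTS for r in range(REPEATS)]


def experiment2_plan() -> List[Tuple[int, float, int]]:
    """Experiment 2: for every k, vary the training-set size and repeat each group."""
    return list(itertools.product(CATEGORY_COUNTS, TRAIN_FRACTIONS, range(REPEATS)))


def run_grid(plan, train_and_eval: Callable[..., float]) -> Dict[tuple, List[float]]:
    """Call the (hypothetical) training function once per planned run.

    The last element of each plan entry is the run index; everything before it
    identifies the group, so accuracies are collected per group.
    """
    results: Dict[tuple, List[float]] = {}
    for *group, _run in plan:
        results.setdefault(tuple(group), []).append(train_and_eval(*group))
    return results


def summarize(runs: Dict[tuple, List[float]]) -> Dict[tuple, Dict[str, float]]:
    """Per group: mean and spread of test accuracy, plus the best run retained."""
    return {
        group: {
            "mean": statistics.mean(accs),
            "stdev": statistics.stdev(accs) if len(accs) > 1 else 0.0,
            "best": max(accs),
        }
        for group, accs in runs.items()
    }


if __name__ == "__main__":
    print(len(experiment1_plan()))  # 35 runs
    print(len(experiment2_plan()))  # 175 runs
```

Reporting the mean and standard deviation next to the best run is a design choice of this sketch: it shows how much of the retained "best" accuracy comes from run-to-run variance, which is the effect the abstract relies on when it recommends repeating experiments for small data sets or many categories.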

Index Terms— Multi-classification; CNN; ResNet


Cite: Chao Luo, Xiaojie Li, Jing Yin, Jia He, Deng Gao, Jiliu Zhou, "How Does the Data Set and the Number of Categories Affect CNN-based Image Classification Performance?," Journal of Software, vol. 14, no. 4, pp. 168-181, 2019.

Volume 14 Number 4 (Apr. 2019)
