Volume 16 Number 5 (Sep. 2021)
JSW 2021 Vol.16(5): 219-234 ISSN: 1796-217X
doi: 10.17706/jsw.16.5.219-234

Applying Statistical Machine Learning Methods to Analysis Differences in the Severity Level of COVID-19 among Countries

Wen Yin1*, Chenchen Pan2*, Nanyi Deng3, Dong Ji4
1Department of Computer Science, Columbia University, NYC, NY, USA.
2Department of Management Science and Engineering, Stanford University, Palo Alto, CA, USA.
3Department of Applied Analytics, Columbia University, NYC, NY, USA.
4SuZhou Trust Co., SuZhou, Jiang Su, China.

Abstract—The COVID-19 pandemic has had a significant negative impact on countries around the world, and its severity appears to differ observably among nations. This study aims to provide insight into the roles that social and economic factors played in this variation. We investigate potential patterns through exploratory data analysis, construct models using several popular machine learning techniques, examine the validity of the underlying assumptions, and identify potential limitations. Total deaths per million population, log-transformed to reduce the influence of outliers, is used as the dependent variable. The dataset provides a set of factors such as life expectancy, unemployment rate, and population. After outliers are removed or transformed, various machine learning methods are implemented with cross-validation, and the best model is selected by predefined metrics: root-mean-squared error (RMSE) and mean absolute error (MAE). The results show that the Gradient Boosting Machine (GBM) technique achieves the lowest RMSE and MAE. The RMSE and MAE values indicate no overfitting, and the GBM algorithm identifies the most influential factors, such as life expectancy, healthcare expenditure relative to Gross Domestic Product (GDP), and GDP per capita, as critical explanatory variables for predicting total deaths per million population.
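
The evaluation setup named in the abstract (a log-transformed target scored by RMSE and MAE under cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the use of log(1 + x) as the transform, the function names, the interleaved fold split, the made-up data, and the mean-only baseline that stands in for the GBM. It is not the authors' implementation.

```python
import math

def log_transform(values):
    """Apply log(1 + x); log1p is an assumed choice that keeps zeros finite."""
    return [math.log1p(v) for v in values]

def rmse(actual, predicted):
    """Root-mean-squared error: penalizes large residuals more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def cross_validate_mean_baseline(targets, k):
    """k-fold cross-validation of a mean-only baseline (a stand-in, not a GBM).

    Fold i holds out every k-th item starting at index i and predicts the
    mean of the remaining items for each held-out item.
    """
    scores = []
    for fold in range(k):
        test = targets[fold::k]
        train = [t for i, t in enumerate(targets) if i % k != fold]
        prediction = sum(train) / len(train)
        predicted = [prediction] * len(test)
        scores.append((rmse(test, predicted), mae(test, predicted)))
    return scores

# Made-up deaths-per-million figures, for illustration only.
deaths_per_million = [0.0, 12.5, 85.0, 430.0, 1500.0, 2200.0]
y = log_transform(deaths_per_million)
fold_scores = cross_validate_mean_baseline(y, k=3)
```

Each entry of `fold_scores` is an `(RMSE, MAE)` pair for one held-out fold. Comparing such held-out scores against scores on the training data is one common way to check for overfitting, and a real model would replace the mean baseline inside the loop.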

Index Terms—COVID-19, machine learning, social and economic factors.


Cite: Wen Yin, Chenchen Pan, Nanyi Deng, Dong Ji, "Applying Statistical Machine Learning Methods to Analysis Differences in the Severity Level of COVID-19 among Countries," Journal of Software vol. 16, no. 5, pp. 219-234, 2021.

Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0)
