Volume 16 Number 5 (Sep. 2021)
JSW 2021 Vol.16(5): 219-234 ISSN: 1796-217X
doi: 10.17706/jsw.16.5.219-234

Applying Statistical Machine Learning Methods to Analysis Differences in the Severity Level of COVID-19 among Countries

Wen Yin1*, Chenchen Pan2*, Nanyi Deng3, Dong Ji4

1Department of Computer Science, Columbia University, NYC, NY, USA.
2Department of Management Science and Engineering, Stanford University, Palo Alto, CA, USA.
3Department of Applied Analytics, Columbia University, NYC, NY, USA.
4SuZhou Trust Co., SuZhou, Jiang Su, China.

Abstract—The COVID-19 pandemic has had a significant negative impact on countries around the world, and its severity appears to differ observably among nations. This study aims to provide insight into the roles that social and economic factors played in this variation. We investigate potential patterns through exploratory data analysis, construct models using several popular machine learning techniques, examine the validity of the underlying assumptions, and identify potential limitations. Total deaths per million population is used as the dependent variable, with a log transformation applied to reduce the effect of outliers. The dataset includes factors such as life expectancy, unemployment rate, and population. After removing and transforming outliers, various machine learning methods are implemented with cross-validation, and the best model is selected by predefined metrics: root-mean-squared error (RMSE) and mean absolute error (MAE). The results show that the Gradient Boosting Machine (GBM) achieves the best results, with the lowest RMSE and MAE. The RMSE and MAE values indicate no overfitting, and the GBM algorithm identifies the most influential factors, such as life expectancy, healthcare expense per Gross Domestic Product (GDP), and GDP per capita, which are critical explanatory variables for predicting total deaths per million population.

Index Terms—COVID-19, machine learning, social and economic factors.
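
The evaluation protocol summarized in the abstract (log-transformed target, cross-validation, RMSE and MAE) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the toy data, the use of `log1p` rather than a plain logarithm, the fold count, and the helper names `k_fold_indices`, `cross_validate`, `fit_mean`, and `predict_mean` are all assumptions. The mean predictor is only a placeholder for a real regressor such as a gradient boosting model.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, predict, X, y, k=5):
    """Return the mean RMSE and mean MAE over k held-out folds."""
    scores = []
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in fold]
        truth = [y[i] for i in fold]
        scores.append((rmse(truth, preds), mae(truth, preds)))
    mean_rmse = sum(s[0] for s in scores) / k
    mean_mae = sum(s[1] for s in scores) / k
    return mean_rmse, mean_mae


# Hypothetical toy data (NOT from the paper): one feature per country
# (life expectancy) and total deaths per million population.
X = [[70.0], [72.5], [75.0], [78.0], [80.0], [81.5], [82.0], [83.0], [76.0], [79.0]]
deaths_per_million = [12.0, 30.0, 95.0, 400.0, 850.0, 1200.0, 640.0, 1500.0, 150.0, 520.0]

# Log-transform the skewed target. log1p is an assumption made here so that
# a country with zero deaths would still map to a finite value.
y = [math.log1p(d) for d in deaths_per_million]


# Placeholder model: always predicts the training-set mean of the target.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


cv_rmse, cv_mae = cross_validate(fit_mean, predict_mean, X, y, k=5)
print(f"CV RMSE (log scale): {cv_rmse:.3f}, CV MAE (log scale): {cv_mae:.3f}")
```

Both metrics are reported on the log scale of the target. Comparing them between training folds and held-out folds is one common way to check for overfitting, and swapping `fit_mean` and `predict_mean` for a stronger learner leaves the rest of the protocol unchanged.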


Cite: Wen Yin, Chenchen Pan, Nanyi Deng, Dong Ji, "Applying Statistical Machine Learning Methods to Analysis Differences in the Severity Level of COVID-19 among Countries," Journal of Software, vol. 16, no. 5, pp. 219-234, 2021.

Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
