Volume 16 Number 5 (Sep. 2021)
JSW 2021 Vol.16(5): 219-234 ISSN: 1796-217X
doi: 10.17706/jsw.16.5.219-234

Applying Statistical Machine Learning Methods to Analysis Differences in the Severity Level of COVID-19 among Countries

Wen Yin1*, Chenchen Pan2*, Nanyi Deng3, Dong Ji4

1Department of Computer Science, Columbia University, NYC, NY, USA.
2Department of Management Science and Engineering, Stanford University, Palo Alto, CA, USA.
3Department of Applied Analytics, Columbia University, NYC, NY, USA.
4SuZhou Trust Co., SuZhou, Jiang Su, China.

Abstract—The COVID-19 pandemic has had a significant negative impact on countries around the world, and its severity appears to differ observably among nations. This study aims to provide insight into the roles that social and economic factors played in this variation. We investigate potential patterns through exploratory data analysis, then construct models using several popular machine learning techniques, examining the validity of the underlying assumptions and identifying potential limitations. Total deaths per million population, log-transformed to reduce the influence of outliers, is used as the dependent variable. The dataset provides a set of factors such as life expectancy, unemployment rate, and population. After removing and transforming outliers, various machine learning methods are implemented with cross-validation, and the optimal model is determined by predefined metrics such as root-mean-squared error (RMSE) and mean absolute error (MAE). The results show that the Gradient Boosting Machine (GBM) technique achieves the best results, with the lowest RMSE and MAE. The RMSE and MAE values indicate no overfitting issues, and the GBM algorithm identifies the most influential factors, such as life expectancy, healthcare expense per Gross Domestic Product (GDP), and GDP per capita, as critical explanatory variables for predicting total deaths per million population.

Index Terms—COVID-19, machine learning, social and economic factors.
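
The evaluation setup described in the abstract (a log-transformed target, cross-validation, and RMSE/MAE as selection metrics) can be sketched as follows. This is a minimal illustration, not the authors' code: the death counts are made-up numbers, `log1p` is an assumed variant of the log transformation (chosen so that zero counts stay finite), and a mean-predicting baseline stands in for the GBM and the other models, which are not reproduced here. All function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validate_mean_baseline(y, k=3):
    """Average RMSE and MAE over k folds for a model that predicts the training mean."""
    rmse_scores, mae_scores = [], []
    for train, test in k_fold_indices(len(y), k):
        train_mean = sum(y[i] for i in train) / len(train)
        y_test = [y[i] for i in test]
        y_pred = [train_mean] * len(test)
        rmse_scores.append(rmse(y_test, y_pred))
        mae_scores.append(mae(y_test, y_pred))
    return sum(rmse_scores) / k, sum(mae_scores) / k


# Illustrative (made-up) total deaths per million for six hypothetical countries.
deaths_per_million = [3.0, 45.0, 120.0, 800.0, 1500.0, 12.0]

# Assumed transformation: log(1 + x), which compresses the long right tail.
log_deaths = [math.log1p(d) for d in deaths_per_million]

cv_rmse, cv_mae = cross_validate_mean_baseline(log_deaths, k=3)
print(f"baseline CV RMSE={cv_rmse:.3f}, CV MAE={cv_mae:.3f}")
```

Comparing candidate models then amounts to replacing the mean baseline with each model's predictions and choosing the one with the lowest cross-validated RMSE and MAE. Because RMSE is never smaller than MAE on the same errors, a large gap between the two suggests that a few countries are predicted much worse than the rest.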


Cite: Wen Yin, Chenchen Pan, Nanyi Deng, Dong Ji, "Applying Statistical Machine Learning Methods to Analysis Differences in the Severity Level of COVID-19 among Countries," Journal of Software, vol. 16, no. 5, pp. 219-234, 2021.

Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

General Information

ISSN: 1796-217X (Online)
Frequency: Quarterly
Editor-in-Chief: Prof. Antanas Verikas
Executive Editor: Ms. Yoyo Y. Zhou
Abstracting/Indexing: DBLP, EBSCO, CNKI, Google Scholar, ProQuest, INSPEC (IET), ULRICH's Periodicals Directory, WorldCat, etc.
E-mail: jsweditorialoffice@gmail.com