JSW 2017 Vol.12(1): 62-80 ISSN: 1796-217X
doi: 10.17706/jsw.12.1.62-80
An Improved K-means Algorithm Based on Structure Features
Qiang Zhan 1,2
1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
2 College of Engineering, Forestry, and Natural Sciences, Northern Arizona University, Arizona, USA.
Abstract—In K-means clustering, we are given a set of n data points in multidimensional space, and the problem is to determine the number k of clusters. In this paper, we present three methods for determining the true number of spherical Gaussian clusters in the presence of additional noise features. Our algorithms take into account the structure of Gaussian data sets and the choice of initial centroids, and each has its own emphasis and characteristics. The first method uses the Minkowski distance as the similarity measure, which is suitable for discovering clusters of non-convex spherical shape or clusters with large differences in size. The second method uses a feature-weighted Minkowski distance, which reflects the differing importance of individual features for the clustering result. The third method combines the Minkowski distance with the best feature factors. We evaluate the algorithms with a variety of general cluster validity indices on Gaussian data sets with and without noise features. The results show that the proposed algorithms achieve higher precision than the traditional K-means algorithm.
Index Terms—K-means, feature weighting, clustering, cluster validity index.
Cite: Qiang Zhan, "An Improved K-means Algorithm Based on Structure Features," Journal of Software vol. 12, no. 1, pp. 62-80, 2017.
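The abstract outlines K-means variants that replace the usual Euclidean distance with a (feature-weighted) Minkowski distance. As a rough illustration of that idea only, the following Python sketch shows one plausible form of such a clustering loop; the function name, parameters, and the use of the component-wise mean as the cluster centre are illustrative assumptions and are not taken from the paper (for p not equal to 2, the mean only approximates the Minkowski-optimal centre).

# Minimal sketch (not the authors' implementation): K-means with a Minkowski
# distance of order p and optional per-feature weights. All names and defaults
# here are illustrative assumptions.
import numpy as np

def minkowski_kmeans(X, k, p=2.0, weights=None, n_iter=100, seed=0):
    """Cluster X (n_samples x n_features) into k clusters.

    p       : Minkowski order (p=2 recovers standard Euclidean K-means).
    weights : optional per-feature weights; a larger weight makes that
              feature count more in the distance.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.ones(d) if weights is None else np.asarray(weights, dtype=float)

    # Initialise centroids by picking k distinct data points at random.
    centroids = X[rng.choice(n, size=k, replace=False)].copy()

    for _ in range(n_iter):
        # Weighted Minkowski distance from every point to every centroid.
        diff = np.abs(X[:, None, :] - centroids[None, :, :])      # (n, k, d)
        dist = np.sum(w * diff ** p, axis=2) ** (1.0 / p)         # (n, k)
        labels = np.argmin(dist, axis=1)

        # Recompute each centroid as the mean of its assigned points
        # (a simplification; the exact Minkowski centre differs for p != 2).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

Under these assumptions, a call such as minkowski_kmeans(X, k=3, p=1.5, weights=[1.0, 1.0, 0.1]) would down-weight a third, noisy feature, in the same spirit as the feature-weighted variant described in the abstract.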