Citation: LIU Yi, ZENG Hao. Improved the bi-directional sampling unbalanced data classification method of K-means[J]. Microelectronics & Computer, 2020, 37(3): 60-65.

Improved the bi-directional sampling unbalanced data classification method of K-means

  • Abstract: To address the poor classification accuracy that classifiers achieve on minority classes in imbalanced data sets, this paper proposes KMBS (k-means bi-directional sampling), a bidirectional sampling algorithm built on an improved K-means, and applies ensemble learning to the classification step. First, an improved K-means clustering algorithm partitions the original data set into clusters. Second, within each cluster, an improved SMOTE algorithm oversamples the minority-class samples while the majority-class samples are undersampled, so that the data set becomes balanced. Running the algorithm several times yields multiple data sets that differ substantially from one another, so the classifiers trained on them are also diverse, which improves the effect of ensemble learning. The experimental results show that, compared with several existing algorithms, the proposed algorithm improves overall classification performance and also effectively improves classification performance on minority-class samples.

     
