Journal of Lanzhou University of Technology ›› 2023, Vol. 49 ›› Issue (5): 93-101.

• Automation Technology and Computer Technology •

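Feature selection based on an improved Harris hawk optimization algorithm

ZHAO Xiao-qiang*1,2,3, QIANG Rui-ru1

  1. School of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, Gansu, China;
  2. Key Laboratory of Advanced Control of Industrial Processes of Gansu Province, Lanzhou University of Technology, Lanzhou 730050, Gansu, China;
  3. National Electrical and Control Engineering Experimental Teaching Center, Lanzhou University of Technology, Lanzhou 730050, Gansu, China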
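  • Received: 2021-11-23 Online: 2023-10-28 Published: 2023-11-07
  • Corresponding author: ZHAO Xiao-qiang (born 1969), male, from Baoji, Shaanxi; Ph.D., professor, doctoral supervisor. Email: xqzhao@lut.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62263021); Industrial Support Plan Project of the Gansu Provincial Department of Education (2023CYZC-24)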

Abstract: Feature selection is a machine learning task that reduces the number of features by removing irrelevant and redundant data while maintaining high classification accuracy. The Harris hawk optimization (HHO) algorithm cannot perform feature selection directly in a discrete feature space, and in its later iterations the population loses diversity and the search tends to become trapped in local optima. To address these problems, a feature selection algorithm based on an improved HHO is proposed. First, a chaotic map is used to diversify the initial population so that it is evenly distributed over the search space while retaining good population quality. Second, a Gaussian mutation operator is introduced to update the rabbit's position again, which helps the algorithm avoid local optima. Finally, a binary version of the twice-improved algorithm is designed and applied to wrapper feature selection with a KNN classifier. In feature selection simulation experiments on 18 classic UCI data sets, the proposed algorithm obtains better fitness values, average classification accuracy, and average number of selected features than the other mainstream algorithms compared. These results indicate that the proposed algorithm extracts feature subsets effectively, yields more accurate classification, and achieves higher optimization accuracy.

Key words: wrapper feature selection, Harris hawk optimization algorithm, chaotic mapping, Gaussian mutation
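
The ingredients named in the abstract (chaotic initialization, Gaussian mutation of the rabbit, binarization, and a KNN-based wrapper fitness) can be illustrated with a minimal Python sketch. The abstract does not specify the chaotic map, the mutation scale, the transfer function, or the fitness weights, so everything below is an assumption chosen for illustration: the logistic map, sigma = 0.1, a 0.5 threshold, alpha = 0.99, and a leave-one-out 1-NN written in plain Python standing in for the KNN classifier. It is not the authors' implementation.

```python
import random


def chaotic_population(n_agents, n_dims, x0=0.7, r=4.0):
    """Build an n_agents x n_dims population from one logistic-map orbit (assumed map)."""
    x = x0
    population = []
    for _ in range(n_agents):
        row = []
        for _ in range(n_dims):
            x = r * x * (1.0 - x)  # logistic map, chaotic for r = 4
            row.append(x)
        population.append(row)
    return population


def gaussian_mutation(position, sigma=0.1, rng=random):
    """Add N(0, sigma^2) noise to each coordinate and clip the result to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in position]


def binarize(position, threshold=0.5):
    """Map a continuous position to a 0/1 feature mask (assumed threshold rule)."""
    return [1 if p > threshold else 0 for p in position]


def knn_error(X, y, mask, k=1):
    """Leave-one-out error of k-NN using only the features selected by mask."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 1.0  # no feature selected: treat as the worst case
    errors = 0
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)
        errors += predicted != y[i]
    return errors / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Wrapper fitness (assumed form): weighted error plus fraction of features kept."""
    return alpha * knn_error(X, y, mask) + (1.0 - alpha) * sum(mask) / len(mask)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    X = [[0.0, 0.9], [0.1, 0.1], [0.2, 0.5], [1.0, 0.2], [0.9, 0.8], [1.1, 0.4]]
    y = [0, 0, 0, 1, 1, 1]
    hawks = chaotic_population(n_agents=4, n_dims=2)
    rabbit = min(hawks, key=lambda h: fitness(X, y, binarize(h)))
    mutant = gaussian_mutation(rabbit, rng=random.Random(1))
    # Greedy acceptance (an assumption): keep the mutant only if it is not worse.
    if fitness(X, y, binarize(mutant)) <= fitness(X, y, binarize(rabbit)):
        rabbit = mutant
    print(binarize(rabbit), fitness(X, y, binarize(rabbit)))
```

Lower fitness is better in this sketch: the error term dominates, and the small second term breaks ties in favor of masks that keep fewer features. The HHO position-update phases themselves are omitted because the abstract does not describe them.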
