Journal of Lanzhou University of Technology ›› 2021, Vol. 47 ›› Issue (5): 76-84.

• Automation Technology and Computer Technology •

Gaze prediction algorithm fusing convolutional networks based on hypercomplex wavelets and the image spatial domain

李策*, 朱子重, 许大有, 高伟哲, 靳山岗   

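  1. College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, Gansu, China
  • Received: 2020-01-10 Online: 2021-10-28 Published: 2021-11-18
  • About the author: LI Ce (born 1974), male, from Yingkou, Liaoning; Ph.D., professor, doctoral supervisor. Email: xjtulice@qmail.com
  • Supported by:
    National Natural Science Foundation of China (61866022), Gansu Province Basic Research Innovation Group Project (1506RJIA031), National Defense Basic Scientific Research Program (JCKY2018427C002)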

Gaze prediction algorithm based on hypercomplex wavelet convolutional network

LI Ce, ZHU Zi-zhong, XU Da-you, GAO Wei-zhe, JIN Shan-gang   

  1. College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 2020-01-10 Online: 2021-10-28 Published: 2021-11-18

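Abstract (Chinese version): Existing gaze prediction models suffer from low prediction accuracy caused by missing feature detail, single-scale features, and strong interference from background information. To address these problems, this paper proposes a gaze prediction algorithm that fuses convolutional networks operating on hypercomplex wavelets and on the image spatial domain. First, to counter the loss of detail features, the hypercomplex wavelet transform extracts detail features of the image in the frequency domain, and these are fused with the spatial-domain features extracted by a convolutional network. Next, an atrous spatial pyramid pooling module fuses the feature maps obtained from different receptive fields, which addresses the single-scale problem. Finally, a residual convolutional attention module that combines spatial and channel attention is introduced to suppress background interference and improve prediction accuracy. On the SALICON dataset, the algorithm achieves 0.8847, 0.7693, and 0.7780 on the CC, sAUC, and SIM metrics; on the CAT2000 dataset, the corresponding values are 0.7355, 0.8701, and 0.6645. Subjective and objective comparison experiments show that the algorithm predicts gaze points well.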

Keywords (Chinese version): gaze prediction, hypercomplex wavelet transform, spatial-domain features, convolutional network

Abstract: Gaze prediction algorithms have a wide range of applications in object recognition, video compression, object tracking, and related tasks. Existing gaze prediction models achieve low accuracy because of missing feature detail, single-scale features, and severe background interference. This paper proposes a gaze prediction algorithm based on a hypercomplex wavelet convolutional network. First, to address the loss of detail features, the hypercomplex wavelet transform is used to extract detail features of the image in the frequency domain, and these are fused with the spatial features extracted by the convolutional network. Then, an atrous spatial pyramid pooling module fuses the feature maps obtained from different receptive fields, addressing the single-scale problem. Finally, a residual convolutional attention module, which combines spatial and channel attention mechanisms, is introduced to suppress background interference and improve prediction accuracy. On the SALICON dataset, the proposed algorithm reaches 0.8847, 0.7693, and 0.7780 on the CC, sAUC, and SIM metrics, respectively; on the CAT2000 dataset, it reaches 0.7355, 0.8701, and 0.6645 on the same metrics. The experimental results show that the proposed algorithm predicts fixation points well.
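
The idea behind atrous spatial pyramid pooling can be illustrated with a minimal one-dimensional sketch: the same kernel is applied at several dilation rates, so each branch sees a different receptive field, and the branch outputs are then fused. This is not the paper's network. The function names `dilated_conv1d` and `aspp_1d`, the rates `(1, 2, 4)`, and fusion by simple averaging are all assumptions made for illustration; practical modules usually work on 2-D feature maps and often fuse by concatenation followed by a 1×1 convolution.

```python
def dilated_conv1d(signal, kernel, rate):
    """Same-size 1-D cross-correlation with zero padding and dilation `rate`."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # samples outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run one branch per dilation rate and fuse the branches by averaging.

    Averaging is an illustrative assumption, not the fusion used in the paper.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(vals) for vals in zip(*branches)]


signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]  # a single impulse in the middle
kernel = [1.0, 1.0, 1.0]
fused = aspp_1d(signal, kernel)
print(fused)
```

With the impulse input, the rate-1 branch responds only next to the impulse, while the rate-4 branch responds four samples away, which is the sense in which the fused output mixes several receptive-field sizes.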

Key words: gaze prediction, hypercomplex wavelet transform, spatial features, convolutional neural network
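
For readers unfamiliar with the reported metrics, the following sketch shows how two of them, CC and SIM, are commonly defined for saliency maps: CC is the Pearson correlation between the predicted and ground-truth maps, and SIM is the histogram intersection of the two maps after each is normalised to sum to 1. The helper names `pearson_cc` and `similarity` and the toy 2×2 maps are assumptions for illustration, not the authors' evaluation code. sAUC is omitted because it additionally needs fixation locations sampled from other images.

```python
import math


def _flatten(saliency_map):
    """Turn a 2-D list of rows into a flat list of values."""
    return [v for row in saliency_map for v in row]


def pearson_cc(pred, gt):
    """CC: Pearson linear correlation between two equally sized maps.

    Raises ZeroDivisionError if either map is constant.
    """
    p, g = _flatten(pred), _flatten(gt)
    mean_p, mean_g = sum(p) / len(p), sum(g) / len(g)
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    std_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    std_g = math.sqrt(sum((b - mean_g) ** 2 for b in g))
    return cov / (std_p * std_g)


def similarity(pred, gt):
    """SIM: sum of element-wise minima after each map is scaled to sum to 1."""
    p, g = _flatten(pred), _flatten(gt)
    sum_p, sum_g = sum(p), sum(g)
    return sum(min(a / sum_p, b / sum_g) for a, b in zip(p, g))


pred = [[0.1, 0.2], [0.3, 0.4]]  # toy predicted saliency map
gt = [[0.0, 0.2], [0.2, 0.6]]    # toy ground-truth fixation density
print(pearson_cc(pred, gt), similarity(pred, gt))
```

Both scores are higher when the prediction matches the ground truth more closely; an identical pair of maps gives CC = 1 and SIM = 1.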

CLC number: