Journal of Lanzhou University of Technology ›› 2025, Vol. 51 ›› Issue (3): 99-106.

• Automation Technology and Computer Technology •

Chinese authorship identification based on fine-grained word feature

ZHAO Hong*, ZHANG Chen-peng, WANG Ao-long, ZHANG Yang

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 2022-07-09 Online: 2025-06-28 Published: 2025-06-30
  • Corresponding author: ZHAO Hong (b. 1971), male, from Longnan, Gansu; Ph.D., professor, doctoral supervisor. Email: zhaoh@lut.edu.cn
  • Funding:
    National Natural Science Foundation of China (62166025); Key Research and Development Program of Gansu Province (21YF5GA073)

Abstract: Most existing authorship identification models are designed for English text. Because Chinese and English differ in grammar and in their basic linguistic units, English authorship identification models show large deviations when applied to Chinese text. To address Chinese authorship identification, a model adapted to the characteristics of Chinese is proposed, termed the Chinese authorship identification model based on fine-grained character and word features. The model uses parallel convolutions to extract fine-grained features from words of 1 to 4 characters, applies an attention mechanism to assign weights to these features, and finally identifies the author with a classifier. Experimental results show that the model's accuracy on three Chinese authorship identification datasets is higher on average by 2.09%, 7.2%, and 6.71% than that of the baseline models BERT, the text convolutional network (TextCNN), and the recurrent neural network (RNN), respectively, which indicates high practical value.

Key words: Chinese authorship identification, BERT, attention mechanism, parallel convolutional layers, fine-grained feature

CLC Number:
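
The pipeline summarized in the abstract (parallel convolutions over 1 to 4 characters, attention-based weighting, then a classifier) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. The class name `FineGrainedSketch`, the random character embeddings (a stand-in for a pretrained encoder such as BERT), the max-over-time pooling, the choice to apply attention across the four n-gram branches, and all dimensions are assumptions made purely for illustration. The weights are untrained, so the output probabilities carry no meaning.

```python
import math
import random


def conv_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter of width k over the sequence, apply ReLU, max-pool over time."""
    k = len(kernel)
    best = 0.0  # ReLU output is never below 0; also the result when the text is shorter than k
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        act = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, act)
    return best


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FineGrainedSketch:
    """Hypothetical toy model: parallel n-gram convolutions + attention + classifier."""

    def __init__(self, vocab, num_authors, dim=8, filters=4, widths=(1, 2, 3, 4), seed=0):
        rng = random.Random(seed)

        def vec(n):
            return [rng.uniform(-0.5, 0.5) for _ in range(n)]

        self.widths = widths
        # Assumption: random embeddings replace a pretrained character encoder.
        self.embed = {ch: vec(dim) for ch in vocab}
        self.unknown = [0.0] * dim
        # One bank of filters per n-gram width; each filter has shape (width, dim).
        self.kernels = {
            k: [[vec(dim) for _ in range(k)] for _ in range(filters)] for k in widths
        }
        self.attn_query = vec(filters)
        self.out = [vec(filters) for _ in range(num_authors)]

    def forward(self, text):
        emb = [self.embed.get(ch, self.unknown) for ch in text]
        # Parallel branches: one pooled feature vector per n-gram width (1 to 4 characters).
        branches = [
            [conv_max_pool(emb, f) for f in self.kernels[k]] for k in self.widths
        ]
        # Attention: score each branch against a query vector, normalize to weights.
        scores = [sum(q * v for q, v in zip(self.attn_query, b)) for b in branches]
        weights = softmax(scores)
        # Weighted sum of the branch features.
        fused = [
            sum(w * b[i] for w, b in zip(weights, branches))
            for i in range(len(self.attn_query))
        ]
        # Linear classifier followed by softmax over candidate authors.
        logits = [sum(w * x for w, x in zip(row, fused)) for row in self.out]
        return weights, softmax(logits)


model = FineGrainedSketch(vocab="天地人和春风秋月", num_authors=3)
weights, probs = model.forward("春风秋月人和")
```

Here `weights` holds one attention weight per n-gram width and `probs` holds one probability per candidate author. In a trained model the attention weights would indicate which character span lengths contribute most to distinguishing authors.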