Journal of Lanzhou University of Technology, 2023, Vol. 49, Issue (1): 103-109.

• Automation Technique and Computer Technology •

Multi-label text classification based on word-label probability

ZHAO Hong, ZHENG Hou-ze, GUO Lan   

  1. School of Computer and Communication, Lanzhou Univ. of Tech., Lanzhou 730050, China
  • Received: 2021-09-10  Online: 2023-02-28  Published: 2023-03-21

Abstract: Multi-label text classification is an important task in natural language processing; its goal is to find, from a given label set, the subset of labels associated with a text. To address the problems of extracting text features effectively and capturing the latent correlations between labels, a model that combines convolutional neural networks (CNN) with bi-directional long short-term memory (Bi-LSTM) is proposed. First, text features are extracted by a CNN with max pooling. Then, a trained Labeled Latent Dirichlet Allocation (labeled LDA) model is used to obtain word-label probability information for all words and labels. Next, a Bi-LSTM network and a CNN are used to extract word-label information features for each word in the text being predicted. Finally, these features are combined with the extracted text features to predict the label set associated with the text. Experimental results show that using word-label probabilities to capture the correlation between the words in a text and the labels effectively improves the model's F1 value.
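
The word-label probability step can be illustrated with a minimal sketch. It assumes a word-label probability table is already available (in the paper this comes from a trained labeled LDA model); the labels, words, and probability values below are hypothetical. Column-wise max pooling is used here only as a simple stand-in for the Bi-LSTM and CNN feature extractors described in the abstract, so this is not the authors' model.

```python
from typing import Dict, List

# Hypothetical label set and word-label probability table. In the paper this
# information comes from a trained labeled LDA model; these values are made up.
LABELS: List[str] = ["sports", "finance", "health"]
WORD_LABEL_PROB: Dict[str, List[float]] = {
    "match":  [0.80, 0.15, 0.05],
    "stock":  [0.05, 0.90, 0.05],
    "injury": [0.45, 0.05, 0.50],
}


def word_label_matrix(tokens: List[str],
                      table: Dict[str, List[float]],
                      num_labels: int) -> List[List[float]]:
    """Return one row of label probabilities per token.

    Words missing from the table get a uniform row (an assumption made
    here for illustration, not something stated in the abstract).
    """
    uniform = [1.0 / num_labels] * num_labels
    return [table.get(tok, uniform) for tok in tokens]


def max_pool_over_words(matrix: List[List[float]]) -> List[float]:
    """Column-wise max: the strongest evidence for each label across words.

    Stand-in for the Bi-LSTM + CNN extractors named in the abstract.
    """
    return [max(column) for column in zip(*matrix)]


tokens = ["injury", "match", "today"]
matrix = word_label_matrix(tokens, WORD_LABEL_PROB, len(LABELS))
pooled = max_pool_over_words(matrix)
```

In a full model, `matrix` (one row per word, one column per label) would be fed to the sequence and convolutional layers, and the resulting features would be concatenated with the CNN text features before the final multi-label prediction.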

Key words: multi-label text classification, convolutional neural networks, bi-directional long short-term memory, labeled latent Dirichlet allocation
