Journal of Lanzhou University of Technology ›› 2026, Vol. 52 ›› Issue (2): 99-106.

• Automation Technique and Computer Technology •

Research on image semantic description method based on RVC network

LIU Zhong-min1,2, CHEN Heng1,2, HU Wen-jin3   

    1. School of Automation and Electrical Engineering, Lanzhou University of Technology, Lanzhou 730050, China;
    2. Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou University of Technology, Lanzhou 730050, China;
    3. College of Mathematic and Computer Science, Northwest Minzu University, Lanzhou 730000, China
  • Received: 2023-09-17 Online: 2026-04-28 Published: 2026-04-28

Abstract: To address inaccurate descriptive sentences and excessive irrelevant information in image semantic description, an image semantic description method based on the RVC network is proposed. First, in the image feature extraction stage, visual region features are extracted using the ResNeXt-101 network and the Vision Transformer network. Second, a channel attention mechanism assigns larger weights to salient regions of the extracted visual features and smaller weights to non-salient regions, and unclear regions of the image are optimized. Finally, the image decoding module combines the visual features with semantic features to generate descriptive sentences for the image. To verify the effectiveness of the RVC network for image semantic description, experiments were conducted on the MS COCO dataset and the results were compared with existing methods. The results show that the RVC network extracts image features more effectively and produces more accurate and richer descriptive sentences.
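
The channel re-weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the channel attention used in the RVC network, so the code below assumes a squeeze-and-excitation style gate (global average pooling, a two-layer bottleneck, and a sigmoid). The function name `channel_attention`, the weight matrices `w1` and `w2`, the reduction ratio `r`, and the tensor sizes are all hypothetical choices for illustration, and the weights are random rather than learned.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Re-weight a (C, H, W) feature map channel by channel.

    Assumed squeeze-and-excitation style gate, not the paper's exact module.
    w1 has shape (C // r, C) and w2 has shape (C, C // r).
    """
    # Squeeze: global average pooling reduces each channel to one descriptor.
    squeezed = features.mean(axis=(1, 2))              # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(w1 @ squeezed, 0.0)            # shape (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # shape (C,)
    # Re-weight: channels with larger gates contribute more downstream.
    return features * weights[:, None, None], weights

# Hypothetical sizes: 8 channels on a 4x4 grid, reduction ratio 2.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
reweighted, weights = channel_attention(features, w1, w2)
```

In a trained model `w1` and `w2` would be learned jointly with the rest of the network, so that channels carrying salient content receive gates close to 1 and less informative channels are suppressed.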

Key words: semantic description, feature extraction, Vision Transformer, channel attention mechanism

CLC Number: