Journal of Lanzhou University of Technology ›› 2026, Vol. 52 ›› Issue (2): 91-98.

• Automation Technique and Computer Technology •

Intestinal polyp image segmentation method based on improved CSWin Transformer

ZHAO Hong1, MI Shan1, AN Ding2   

  1. School of Computer Science and Artificial Intelligence, Lanzhou University of Technology, Lanzhou 730050, China;
    2. CCCC (Zhongwei) Big Data Technology Co., Ltd., Zhongwei 755000, China
  • Received: 2023-12-22 Online: 2026-04-28 Published: 2026-04-28

Abstract: To address the limited segmentation accuracy of traditional UNet-type intestinal polyp segmentation models, an intestinal polyp segmentation model based on an improved CSWin Transformer is proposed. The model consists of an encoder and a decoder. First, in the encoding stage, a CSWin Transformer with cross-shaped window self-attention serves as the encoder to extract global context information from intestinal polyp images, and a convolutional block attention module (CBAM) is introduced into each layer of the encoder's CSWin Transformer blocks to strengthen the model's ability to capture polyp region and edge information. Second, in the decoding stage, the CSWin Transformer is also used as the decoder, and the encoder and decoder are linked by skip connections. Finally, in the middle layers of the encoder and decoder, a self-aware attention (SAA) module is applied to establish non-local information interaction between features. Experimental results on the open-source Kvasir-SEG, CVC-ClinicDB, EndoTect, and CVC-ColonDB datasets show that the proposed method achieves Dice coefficients of 0.888, 0.927, 0.904, and 0.911, respectively, and MIoU values of 0.876, 0.902, 0.831, and 0.860, respectively. Compared with the traditional U-shaped intestinal polyp segmentation model, the Dice coefficient and MIoU increase by 2.1% and 2.5%, respectively.
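For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute the Dice coefficient and the intersection over union (IoU) for a single pair of binary masks. It is an illustration only, not the authors' evaluation code: the function name `dice_and_iou`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions. The abstract does not state how MIoU is averaged (over images or over classes), so no averaging step is shown.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized 2-D binary masks.

    pred, target: nested lists of 0/1 values (hypothetical input format).
    Dice = 2*TP / (2*TP + FP + FN), IoU = TP / (TP + FP + FN).
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # polyp pixel predicted correctly
            elif p and not t:
                fp += 1          # background predicted as polyp
            elif t and not p:
                fn += 1          # polyp pixel missed
    dice_den = 2 * tp + fp + fn
    iou_den = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * tp / dice_den if dice_den else 1.0
    iou = tp / iou_den if iou_den else 1.0
    return dice, iou


# Toy example: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
print(round(dice, 3), iou)  # prints "0.667 0.5"
```

Because Dice counts the overlap twice in both numerator and denominator, it is never lower than IoU on the same pair of masks, which is consistent with the Dice values in the abstract exceeding the corresponding MIoU values.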

Key words: intestinal polyp segmentation, CSWin Transformer, convolution block attention, non-local information interaction, deep learning
