Journal of Lanzhou University of Technology, 2022, Vol. 48, Issue (5): 99-106.

Automation Technique and Computer Technology

Uyghur-Chinese neural machine translation method based on back translation and ensemble learning

FENG Xiao1,2,3, YANG Ya-ting1,2,3, DONG Rui1,2,3, AZMAT Anwar1,2,3, MA Bo1,2,3   

1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China

Received: 2021-04-16; Online: 2022-10-28; Published: 2022-11-21

Abstract: Uyghur-Chinese neural machine translation performs poorly because parallel corpora are scarce. To address this problem while making efficient use of existing resources, a method based on back translation and ensemble learning is proposed. First, a Uyghur-Chinese pseudo-parallel corpus is constructed from large-scale Chinese monolingual corpora by back translation, and an intermediate model is trained on this pseudo-parallel corpus. Second, the original parallel corpus is resampled N times with the bootstrap method, yielding N sub-datasets that have similar distributions but different characteristics. The intermediate model is then fine-tuned on each of the N sub-datasets, producing N distinct sub-models. Finally, these sub-models are combined into an ensemble. Experiments on the CWMT2015 and CWMT2017 test sets show that the method improves the BLEU (Bilingual Evaluation Understudy) score over the baseline system by 2.37 and 1.63 points, respectively.
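The back-translation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_pseudo_parallel` is a hypothetical helper, and the `zh_to_ug` callable is a hypothetical stand-in for a trained Chinese-to-Uyghur (reverse-direction) model. What it shows is the pairing direction: the Chinese target side is authentic monolingual text, and only the Uyghur source side is machine-generated.

```python
from typing import Callable, List, Tuple

# (synthetic Uyghur source, authentic Chinese target)
Pair = Tuple[str, str]


def build_pseudo_parallel(
    zh_monolingual: List[str], zh_to_ug: Callable[[str], str]
) -> List[Pair]:
    """Back-translate Chinese monolingual sentences into synthetic Uyghur.

    zh_to_ug is a hypothetical stand-in for a trained Chinese->Uyghur
    model. The Chinese side is kept verbatim, so a Uyghur->Chinese model
    trained on these pairs always learns to produce real Chinese.
    """
    return [(zh_to_ug(zh), zh) for zh in zh_monolingual]


if __name__ == "__main__":
    # Toy dictionary lookup standing in for the reverse model (not a real model).
    toy_reverse_model = {"你好": "yaxshimusiz", "谢谢": "rehmet"}.get
    print(build_pseudo_parallel(["你好", "谢谢"], toy_reverse_model))
```

In the described method, the intermediate model is trained on such pseudo-parallel pairs and only afterwards fine-tuned on the original parallel data; the abstract does not state how much synthetic data is used or whether it is filtered.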
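The bootstrap resampling step can be sketched as below, assuming each sub-dataset has the same size as the original parallel corpus and is drawn uniformly with replacement (the abstract does not state the sample size). `bootstrap_subsets` is a hypothetical helper name. Because sampling is with replacement, each subset contains on average only about 63.2% (1 − 1/e) of the distinct original sentence pairs, with some pairs repeated; this is one way the N subsets keep a similar distribution while differing in content.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_subsets(data: Sequence[T], n_subsets: int, seed: int = 0) -> List[List[T]]:
    """Draw n_subsets bootstrap samples from data.

    Each sample has len(data) items drawn uniformly *with replacement*,
    so some sentence pairs repeat and others are left out.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    return [rng.choices(data, k=len(data)) for _ in range(n_subsets)]


if __name__ == "__main__":
    corpus = [(f"ug_{i}", f"zh_{i}") for i in range(10_000)]  # placeholder pairs
    subsets = bootstrap_subsets(corpus, n_subsets=5, seed=42)
    for k, subset in enumerate(subsets):
        distinct_share = len(set(subset)) / len(corpus)
        print(f"subset {k}: {len(subset)} pairs, {distinct_share:.1%} distinct")
```

Each subset would then be used to fine-tune its own copy of the intermediate model.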
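The abstract does not specify how the sub-models are combined. A common choice in neural machine translation, assumed here, is to average the sub-models' next-token probability distributions at every decoding step. The sketch below uses greedy search and a hypothetical callable interface (`NextTokenModel`) in which each sub-model maps the target prefix to a distribution over the vocabulary, with the source sentence assumed to be bound inside each callable; `ensemble_distribution` and `greedy_ensemble_decode` are illustrative names, not the paper's code.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a sub-model maps the target prefix (token ids so far)
# to a next-token probability distribution over the vocabulary. The source
# sentence is assumed to be already bound inside each callable.
NextTokenModel = Callable[[Sequence[int]], List[float]]


def ensemble_distribution(models: Sequence[NextTokenModel], prefix: Sequence[int]) -> List[float]:
    """Uniformly average the next-token distributions of all sub-models."""
    dists = [model(prefix) for model in models]
    vocab_size = len(dists[0])
    return [sum(d[i] for d in dists) / len(dists) for i in range(vocab_size)]


def greedy_ensemble_decode(
    models: Sequence[NextTokenModel], bos: int, eos: int, max_len: int = 50
) -> List[int]:
    """Greedy decoding on the averaged distribution; returns ids after BOS."""
    prefix = [bos]
    for _ in range(max_len):
        probs = ensemble_distribution(models, prefix)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        prefix.append(next_id)
        if next_id == eos:
            break
    return prefix[1:]


if __name__ == "__main__":
    # Toy vocabulary: 0 = BOS, 1 = some word, 2 = EOS.
    sub_model_a = lambda prefix: [0.0, 0.7, 0.3]
    sub_model_b = lambda prefix: [0.0, 0.2, 0.8]
    print(greedy_ensemble_decode([sub_model_a], bos=0, eos=2, max_len=3))  # [1, 1, 1]
    print(greedy_ensemble_decode([sub_model_a, sub_model_b], bos=0, eos=2))  # [2]
```

The toy run shows the ensemble overriding a single sub-model's choice. Averaging log-probabilities instead of probabilities is another common variant, and beam search can use the same averaged distribution.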

Key words: neural machine translation, back translation, ensemble learning, intermediate model, fine-tuning, catastrophic forgetting
