Journal of Lanzhou University of Technology ›› 2024, Vol. 50 ›› Issue (5): 77-85.

• Automation Technique and Computer Technology •

Trajectory planning of radar observer based on Monte Carlo policy gradient

CHEN Hui1, WANG Jing-yu1, ZHANG Wen-xu1, ZHAO Yong-hong2, XI Lei3   

  1. College of Electrical and Information Engineering, Lanzhou Univ. of Tech., Lanzhou 730050, China;
    2. Gansu Province Changfeng Electronic Technology Co. LTD., Lanzhou 730070, China;
    3. Institute of Automation, Gansu Academy of Sciences, Lanzhou 730000, China
  • Received: 2022-06-12  Online: 2024-10-28  Published: 2024-10-31

Abstract: To address the intelligent decision-making problem of Markov step-wise planning in radar observer trajectory planning (OTP) for target tracking, a radar trajectory planning method based on the Monte Carlo policy gradient (MCPG) algorithm over a discrete action space is proposed. First, the OTP process is modeled as a continuous Markov decision process (MDP) that combines the target tracking state, the reward mechanism, the action scheme, and the radar observer position, and a global intelligent planning method based on MCPG is proposed. Next, by treating each time step within the tracking episode as a separate episode for policy updates, a step-wise intelligent planning method for the observer trajectory in MCPG-based target tracking is proposed. Then, the tracking estimation characteristics of the target are studied in depth, and a reward function aimed at optimizing tracking performance is constructed. Finally, simulation experiments on reinforcement-learning-based intelligent OTP decision-making for optimal nonlinear target tracking demonstrate the effectiveness of the proposed method.
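To illustrate the global planning idea, the following is a minimal sketch of a Monte Carlo policy gradient (REINFORCE) loop over a discrete action space. The 1-D "observer chases a fixed target" environment, the distance-based reward, the tabular softmax policy, and all parameter values here are illustrative assumptions for exposition only, not the paper's tracking model or reward function.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Assumed toy stand-in for the OTP environment (not the paper's model) ---
TARGET = 5.0                       # fixed target position on a line
ACTIONS = np.array([-1.0, 0.0, 1.0])  # discrete action space: move left/stay/right

def softmax(x):
    """Numerically stable softmax over action preferences."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def discounted_returns(rewards, gamma):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    G = 0.0
    out = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        out[t] = G
    return out

def run_episode(theta, T=20):
    """Roll out one full episode; reward is negative distance to the target."""
    pos = 0.0
    states, actions, rewards = [], [], []
    for _ in range(T):
        s = int(np.clip(round(pos), 0, 9))   # coarse discretized observer state
        a = rng.choice(3, p=softmax(theta[s]))
        pos += ACTIONS[a]
        states.append(s)
        actions.append(a)
        rewards.append(-abs(TARGET - pos))
    return states, actions, rewards

def reinforce(episodes=300, alpha=0.05, gamma=0.95):
    """Global MCPG: update the policy once per complete episode."""
    theta = np.zeros((10, 3))                # tabular softmax policy parameters
    for _ in range(episodes):
        S, A, R = run_episode(theta)
        G = discounted_returns(R, gamma)
        G = (G - G.mean()) / (G.std() + 1e-8)  # baseline/normalization for variance
        for t, (s, a) in enumerate(zip(S, A)):
            probs = softmax(theta[s])
            grad = -probs
            grad[a] += 1.0                   # grad of log pi(a|s) for softmax
            theta[s] += alpha * G[t] * grad  # REINFORCE ascent step
    return theta

theta = reinforce()
```

The step-wise variant described in the abstract would instead treat each time step of the tracking episode as its own episode and update `theta` after every step rather than once per rollout.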

Key words: target tracking, radar observer trajectory planning, policy gradient, reward function
