[1] CHEN H, DENG D M, HAN C Z. Sensor control strategy based on the interval box-particle multi-Bernoulli filter[J]. Acta Automatica Sinica, 2021, 47(6): 1428-1443.
[2] LEONG A S, RAMASWAMY A, QUEVEDO D E, et al. Deep reinforcement learning for wireless sensor scheduling in cyber-physical systems[J]. Automatica, 2020, 113: 108759.
[3] CHEN H, LIU Y T, WANG L. Sensor control strategy for the Gaussian mixture labeled multi-Bernoulli filter[J]. Journal of Lanzhou University of Technology, 2022, 48(2): 73-80.
[4] FENG H, GUO B Z, WU X H. Trajectory planning approach to output tracking for a 1-D wave equation[J]. IEEE Transactions on Automatic Control, 2020, 65(5): 1841-1854.
[5] CHAI R, TSOURDOS A, SAVVARIS A, et al. Real-time reentry trajectory planning of hypersonic vehicles: a two-step strategy incorporating fuzzy multiobjective transcription and deep neural network[J]. IEEE Transactions on Industrial Electronics, 2020, 67(8): 6904-6915.
[6] CHEN J, SHUAI Z, ZHANG H, et al. Path following control of autonomous four-wheel-independent-drive electric vehicles via second-order sliding mode and nonlinear disturbance observer techniques[J]. IEEE Transactions on Industrial Electronics, 2021, 68(3): 2460-2469.
[7] HOANG H G, VO B T. Sensor management for multi-target tracking via multi-Bernoulli filtering[J]. Automatica, 2014, 50(4): 1135-1142.
[8] ROSS S M, COBB R G, BAKER W P. Stochastic real-time optimal control for bearing-only trajectory planning[J]. International Journal of Micro Air Vehicles, 2014, 6(1): 1-27.
[9] GORJI A, ADVE R. Policy gradient for observer trajectory planning with application in multi-target tracking problems[C]//2018 52nd Asilomar Conference on Signals, Systems, and Computers. Pacific Grove: IEEE, 2018: 2029-2033.
[10] HOFFMANN F, CHARLISH A, RITCHIE M, et al. Sensor path planning using reinforcement learning[C]//2020 IEEE 23rd International Conference on Information Fusion (FUSION). South Africa: IEEE, 2020: 1-8.
[11] SINGH S S, KANTAS N, VO B N, et al. Simulation-based optimal sensor scheduling with application to observer trajectory planning[J]. Automatica, 2007, 43(5): 817-830.
[12] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 2018.
[13] BAYERLEIN H, THEILE M, CACCAMO M, et al. UAV path planning for wireless data harvesting: a deep reinforcement learning approach[C]//2020 IEEE Global Communications Conference. Taipei: IEEE, 2020: 1-6.
[14] ZHU B, BEDEER E, NGUYEN H H, et al. UAV trajectory planning in wireless sensor networks for energy consumption minimization by deep reinforcement learning[J]. IEEE Transactions on Vehicular Technology, 2021, 70(9): 9540-9554.
[15] HU C, WANG Z, TAGHAVIFAR H, et al. MME-EKF-based path-tracking control of autonomous vehicles considering input saturation[J]. IEEE Transactions on Vehicular Technology, 2019, 68(6): 5246-5259.
[16] MIEHLING E, RASOULI M, TENEKETZIS D. A POMDP approach to the dynamic defense of large-scale cyber networks[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(10): 2490-2505.
[17] LIU J W, GAO F, LUO X L. Survey of deep reinforcement learning based on value function and policy gradient[J]. Chinese Journal of Computers, 2019, 42(6): 1406-1438.
[18] YAN T, HAN C Z, ZHANG G H. Survey of sensor management methods for aerial targets[J]. Acta Aeronautica et Astronautica Sinica, 2018, 39(10): 26-36.
[19] CASSANO L, YUAN K, SAYED A H. Multiagent fully decentralized value function learning with linear convergence rates[J]. IEEE Transactions on Automatic Control, 2021, 66(4): 1497-1512.
[20] ZOU D, CAO Y, ZHOU D, et al. Gradient descent optimizes over-parameterized deep ReLU networks[J]. Machine Learning, 2020, 109(3): 467-492.
[21] QIU C, HU Y, CHEN Y, et al. Deep deterministic policy gradient (DDPG)-based energy harvesting wireless communications[J]. IEEE Internet of Things Journal, 2019, 6(5): 8577-8588.
[22] EFRONI Y, DALAL G, SCHERRER B, et al. Multiple-step greedy policies in approximate and online reinforcement learning[J]. Advances in Neural Information Processing Systems, 2018, 31: 1-10.
[23] ZHANG K, KOPPEL A, ZHU H, et al. Global convergence of policy gradient methods to (almost) locally optimal policies[J]. SIAM Journal on Control and Optimization, 2020, 58(6): 3586-3612.
[24] WU J, WEI Z, LIU K, et al. Battery-involved energy management for hybrid electric bus based on expert-assistance deep deterministic policy gradient algorithm[J]. IEEE Transactions on Vehicular Technology, 2020, 69(11): 12786-12796.
[25] WANG Y, WU X F. Normalization algorithms for small mini-batches in deep neural network training[J]. Computer Science, 2019, 46(11A): 273-276.
[26] ZAHRAN E H M, KHATER M M A. Modified extended tanh-function method and its applications to the Bogoyavlenskii equation[J]. Applied Mathematical Modelling, 2016, 40(3): 1769-1775.
[27] ZHU D, LU S, WANG M, et al. Efficient precision-adjustable architecture for softmax function in deep learning[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, 67(12): 3382-3386.