TY  - JOUR
A2  - Wang, Weitian
AU  - Zeng, Fanyu
AU  - Wang, Chen
PY  - 2020
DA  - 2020/10/15
TI  - Visual Navigation with Asynchronous Proximal Policy Optimization in Artificial Agents
SP  - 8702962
AB  - Policy gradient methods suffer from high variance, leading to unstable policies during training, where the policy's performance fluctuates drastically between iterations. To address this issue, we analyze the policy optimization process of the navigation method based on deep reinforcement learning (DRL) that uses asynchronous gradient descent for optimization. A variant navigation method (asynchronous proximal policy optimization navigation, appoNav) is proposed that guarantees monotonic policy improvement during policy optimization. Our experiments were conducted in DeepMind Lab, and the results show that agents with appoNav perform better than the compared algorithm.
SN  - 1687-9600
UR  - https://doi.org/10.1155/2020/8702962
DO  - 10.1155/2020/8702962
JF  - Journal of Robotics
PB  - Hindawi
ER  - 