Indoor Assisted Rescue by UAV Group Based on Multi-agent Reinforcement Learning

doi:10.15888/j.cnki.csa.008302


Computer Systems & Applications, 2022, Vol. 31, No. 2, pp. 88-95.

Authors: GUO Tian-Hao (郭天昊), ZHANG Gang (张钢), YUE Wen-Yuan (岳文渊), WANG Qian (王倩), GUO Da-Bo (郭大波)

Affiliation: College of Physics and Electronic Engineering, Shanxi University, Taiyuan 030006, China

Fund project: Shanxi Province Basic Research Program (201801D121118)



Abstract:

This work studies cooperative search for victims in indoor scenes using multiple unmanned aerial vehicles (UAVs). Indoors, victim locations obtained from the global positioning system may be unreliable. To this end, this study proposes a multi-agent reinforcement learning (MARL) based solution that focuses on path planning for a UAV team assisting a rescue. Compared with traditional solutions, the proposed solution has advantages in large-scale indoor rescue scenes, for example when multiple rescue UAVs are deployed and multiple victims must be rescued. The solution also takes UAV charging into account so that the UAVs always have sufficient power. Specifically, because the depth parameters of the rescue scene in the model change continuously, the search path planning problem is modeled as a decentralized partially observable Markov decision process (Dec-POMDP). To optimize the UAV control policy, a double deep Q-network (Double DQN) is trained. Finally, the Monte Carlo method is used to verify that the solution enables multiple UAVs to cooperate effectively in a large-scale indoor environment and to maximize the collection of location information stored in the victims' mobile phones.

Key words: unmanned aerial vehicle (UAV); indoor rescue; path planning; Markov decision; Monte Carlo
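
As an illustration of the Double DQN idea named in the abstract, the sketch below computes the bootstrapped training targets for a small batch of transitions. It is a generic, minimal sketch and not the authors' implementation: the function name `double_dqn_targets`, the discount factor, and the toy Q-values are all assumptions, since this page gives no network, reward, or hyperparameter details. The defining step is that the online network selects the next action while the target network evaluates it.

```python
def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.95):
    """Return Double DQN targets for a batch of transitions.

    q_online_next[i] and q_target_next[i] are the Q-values of every action in
    the next state of transition i, from the online and target networks.
    All names and the default gamma are illustrative assumptions.
    """
    targets = []
    for reward, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        # Action selection: greedy action according to the online network.
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # Action evaluation: value of that action according to the target network.
        # Terminal transitions do not bootstrap.
        bootstrap = 0.0 if done else gamma * q_tg[best_action]
        targets.append(reward + bootstrap)
    return targets


# Toy batch with two transitions and two actions (made-up numbers).
example_targets = double_dqn_targets(
    rewards=[1.0, 0.0],
    dones=[False, True],
    q_online_next=[[0.2, 0.8], [0.5, 0.1]],
    q_target_next=[[1.0, 2.0], [3.0, 4.0]],
    gamma=0.5,
)
print(example_targets)
```

Separating selection from evaluation is what distinguishes Double DQN from a standard DQN target, which takes the maximum over the target network's own values and tends to overestimate them.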


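Cite this article

GUO Tian-Hao, ZHANG Gang, YUE Wen-Yuan, WANG Qian, GUO Da-Bo. Indoor Assisted Rescue by UAV Group Based on Multi-agent Reinforcement Learning. Computer Systems & Applications (计算机系统应用), 2022, 31(2): 88-95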


History

Received: 2021-04-14
Last revised: 2021-05-11
Published online: 2022-01-28
