大规模时空轨迹数据连接查询效率优化实践
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

公安部应用创新计划(2019YYCXYNST019)


Practice of Improving Join Query Efficiency for Large Scale Spatiotemporal Trajectory Data
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    本文提出一种低集群计算资源条件下, 大规模轨迹类数据同时空关系的快速连接查询算法DPCP-CROSS-JOIN. 该算法通过对轨迹数据时间字段进行分段交叉编码和位置网格化等方式对连续的轨迹数据离散化, 并以日期和网格区域编码进行两级分区存储. 通过交叉“等值”连接查询, 实现时空连接查询的三级索引、四级加速, 将$n\cdot n $对象间同时空关系连接查询时间复杂度从O(n2)降为O(nlogn). 在Hadoop集群上使用Hive和TEZ等进行大规模轨迹数据连接查询时能将连接查询效率最高提升到30.66倍. 该算法以时间段编码作为关联条件, 巧妙绕开连接过程中复杂表达式的实时计算, 以“等值”替代复杂表达式计算连接, 提高MapReduce任务并行度, 提升集群存储和计算资源利用率. 在面对仅使用一般优化已几乎无法完成的, 更大规模类似任务, 仍能在数分钟内完成. 实验表明, 该算法具有高效和稳定等特性, 尤其适用低“算力”资源条件下大规模轨迹数据的同时空关系连接查询. 此方法还可作为时空轨迹伴随查找, 对象间关系亲密度判定等的原子算法, 可广泛应用于维护国家安全、社会治安秩序, 预防和打击犯罪, 辅助城乡规划统筹等领域.

    Abstract:

    This study proposes an algorithm named DPCP-CROSS-JOIN for fast co-spatiotemporal relationship join queries of large-scale trajectory data in insufficient cluster computing resource environments. The proposed algorithm discretizes continuous trajectory data by segmenting and cross-coding the temporal fields of trajectory data and conducting spatiality gridded coding and then stores the data in two-level partitions using date and grid region coding. It achieves 3-level indexing and 4-level acceleration for spatiotemporal join queries through cross “equivalent” join queries. As a result, the time complexity of the co-spatiotemporal relationship join queries among n$\cdot $n objects is reduced from O(n2) to O(nlogn). It can improve the efficiency of join queries by up to 30.66 times when Hive and TEZ are used on a Hadoop cluster for join queries of large-scale trajectory data. This algorithm uses time-slice and gridding coding as the join condition, thereby cleverly bypassing the real-time calculation of complex expressions during the join process. Moreover, complex expression calculation join is replaced with “equivalent” join to improve the parallelism of MapReduce tasks and enhance the utilization rates of cluster storage and computing resources. Similar tasks of larger scales of trajectory data that are almost impossible to accomplish using general optimization methods can still be completed by the proposed algorithm within a few minutes. The experimental results suggest that the proposed algorithm is efficient and stable, and it is especially suitable for the co-spatiotemporal relationship join queries of large-scale trajectory data under insufficient computing resource conditions. It can also be used as an atomic algorithm for searching accompanying spatiotemporal trajectories and determining the intimacy of relationships among objects. It can be widely applied in fields such as national security and social order maintenance, crime prevention and combat, and urban and rural planning support.

    参考文献
    相似文献
    引证文献
引用本文

丁强龙,叶惠珠,袁弘强,李志新.大规模时空轨迹数据连接查询效率优化实践.计算机系统应用,2024,33(5):1-14

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2023-11-15
  • 最后修改日期:2023-12-20
  • 录用日期:
  • 在线发布日期: 2024-04-07
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京海淀区中关村南四街4号 中科院软件园区 7号楼305房间,邮政编码:100190
电话:010-62661041 传真: Email:csa (a) iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号