Hardware-software Co-design and Implementation for Large Scale Hash Tables
CSTR:
Author:
  • Article
  • | |
  • Metrics
  • |
  • Reference [26]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    Hash tables play an important role in network message processing, especially in the processing of messages with states. With the rapid growth of network traffic, the hash tables of traditional software can hardly meet the needs of network performance, and search is one of the key factors affecting the performance of hash tables. In addition, the improvement in the search rate of hash tables has always been a difficult problem. The research reveals that the existing network traffic presents the characteristics of Pareto distribution, namely that there is a small number of massive traffic data—elephant flow. On the basis of the computing mode of software-hardware co-design used in the current data center, a large-scale hash table architecture with software-hardware co-design is proposed on the basis of DPDK+FPGA. According to the characteristics of existing network traffic, this method divides the traffic into elephant flow and background flow, and meanwhile, the hash table is divided into a hardware table and a software table. A small-scale hardware table is constructed in FPGA to unload the hash calculation of all messages and the hash search of elephant flow. In the software, a large-scale software table is constructed on the basis of DPDK, and the hash calculation is unloaded by FPGA to speed up the search of background flow. As the software has all the flow information, the sampling method is used to identify the elephant flow and update the key-value pair of the elephant flow to the hardware table of FPGA, so as to accelerate the search rate of the large-scale software table in the software. The Xilinx U200 accelerator card and general server are employed as the hardware platform to realize the large-scale hash table with software-hardware co-design, and the traffic data in line with the current network characteristics is constructed by the tester. The accurate forwarding of DPDK is used as an example to verify the performance of the hash table with hardware-software co-design. The results reveal that when the hash search of elephant flow is completely unloaded, its performance is 64%–75% higher than the original accurate forwarding of DPDK; when the elephant flow is not unloaded, its performance is improved by 5%–48%.

    Reference
    [1] Colbourn CJ, Ling ACH. Linear hash families and forbidden configurations. Designs, Codes and Cryptography, 2009, 52(1):25-55.[doi:10.1007/s10623-008-9266-7
    [2] Panigrahy R. Efficient hashing with lookups in two memory accesses. Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms. Vancouver:Society for Industrial and Applied Mathematics, 2005. 830-839.
    [3] Le Scouarnec N. Cuckoo++ hash tables:High-performance hash tables for networking applications. Proceedings of the 2018 Symposium on Architectures for Networking and Communications Systems. Ithaca:ACM, 2018. 41-54.
    [4] Tian H, Chen YX, Chang CC, et al. Dynamic-hash-table based public auditing for secure cloud storage. IEEE Transactions on Services Computing, 2017, 10(5):701-714.[doi:10.1109/TSC.2015.2512589
    [5] Devroye L, Morin P. Cuckoo hashing:Further analysis. Information Processing Letters, 2003, 86(4):215-219.[doi:10.1016/S0020-0190(02)00500-8
    [6] Kelly R, Pearlmutter BA, Maguire P. Lock-free hopscotch hashing. Proceedings of the 1st Symposium on Algorithmic Principles of Computer Systems. Salt Lake City:Society for Industrial and Applied Mathematics, 2020. 45-59.
    [7] Ahmed DT, Shirmohammadi S. Multi-level hashing for peer-to-peer system in wireless ad hoc environment. Proceedings of the 5th Annual IEEE International Conference on Pervasive Computing and Communications Workshops (PerComW 2007). White Plains:IEEE, 2007. 126-131.
    [8] Kumar S, Crowley P. Segmented hash. Proceedings of the 2005 Symposium on Architectures for Networking and Communications Systems (ANCS). Princeton:IEEE, 2005. 91-103.
    [9] Kirsch A, Mitzenmacher M. The power of one move:Hashing schemes for hardware. IEEE/ACM Transactions on Networking, 2010, 18(6):1752-1765.[doi:10.1109/TNET.2010.2047868
    [10] Yang X, Sezer S, McCanny J, et al. DDR3 based lookup circuit for high-performance network processing. Proceedings of the 2009 IEEE International SOC Conference. Belfast:IEEE, 2009. 351-354.
    [11] István Z, Alonso G, Blott M, et al. A hash table for line-rate data processing. ACM Transactions on Reconfigurable Technology and Systems, 2015, 8(2):13
    [12] Chen H, Madaminov S, Ferdman M, et al. FPGA-accelerated samplesort for large data sets. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-programmable Gate Arrays. Seaside:ACM, 2020. 222-232.
    [13] Ioannou L, Michail HE, Voyiatzis AG. High performance pipelined FPGA implementation of the SHA-3 hash algorithm. Proceedings of the 2015 4th Mediterranean Conference on Embedded Computing. Budva:IEEE, 2015. 68-71.
    [14] Panait O, Dumitriu L, Susnea I. Hardware and software architecture for accelerating hash functions based on SoC. Proceedings of the 2019 22nd International Conference on Control Systems and Computer Science (CSCS). Bucharest:IEEE, 2019. 136-139.
    [15] Fairouz A, Khatri SP. An FPGA-based coprocessor for hash unit acceleration. Proceedings of the 2017 IEEE International Conference on Computer Design. Boston:IEEE, 2017. 301-304.
    [16] DPDK. DPDK 21.08 Intel NIC performance report. https://core.dpdk.org/perf-reports/. (2021-10-19)[2022-04-10].
    [17] Shi J, Pesavento D, Benmohamed L. NDN-DPDK:NDN forwarding at 100 Gbps on commodity hardware. Proceedings of the 7th ACM Conference on Information-Centric Networking. Montreal:ACM, 2020. 30-40.
    [18] Sasaki K, Hirofuchi T, Yamaguchi S, et al. An accurate packet loss emulation on a DPDK-based network emulator. Proceedings of the Asian Internet Engineering Conference. Phuket:ACM, 2019. 1-8.
    [19] Furukawa M, Matsutani H. A DPDK-based acceleration method for experience sampling of distributed reinforcement learning. arXiv:2110.13506, 2021.
    [20] Hamdan M, Mohammed B, Humayun U, et al. Flow-aware elephant flow detection for software-defined networks. IEEE Access, 2020, 8:72585-72597.[doi:10.1109/ACCESS.2020.2987977
    [21] 刘朝晖, 窦晓光. 基于FPGA实现的报文分类智能网卡. 信息安全与技术, 2013, 4(6):62-65
    [22] 陈宝远, 张秀芝, 梁状. 基于FPGA的高效数据过滤技术研究. 微电子学与计算机, 2017, 34(12):130-133, 137.[doi:10.19304/j.cnki.issn1000-7180.2017.12.027
    [23] Ben Basat R, Einziger G, Friedman R. Space efficient elephant flow detection. Proceedings of the 11th ACM International Systems and Storage Conference. Haifa:ACM, 2018. 115.
    [24] 严军荣, 叶景畅, 潘鹏. 一种大象流两级识别方法. 电信科学, 2017, 33(3):36-43
    [25] Estan C, Varghese G. New directions in traffic measurement and accounting. ACM SIGCOMM Computer Communication Review, 2002, 32(1):75.[doi:10.1145/510726.510749
    [26] Qi JH, Li WJ, Yang T, et al. Cuckoo counter:A novel framework for accurate per-flow frequency estimation in network measurement. Proceedings of the 2019 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS). Cambridge:IEEE, 2019. 1-7.
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

杨文韬,张士军,张进,唐寅,于洪涛.大规模软硬协同哈希表设计与实现.计算机系统应用,2023,32(1):61-74

Copy
Share
Article Metrics
  • Abstract:821
  • PDF: 2432
  • HTML: 1767
  • Cited by: 0
History
  • Received:February 21,2022
  • Revised:March 23,2022
  • Online: October 28,2022
Article QR Code
You are the first990414Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address:4# South Fourth Street, Zhongguancun,Haidian, Beijing,Postal Code:100190
Phone:010-62661041 Fax: Email:csa (a) iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063