Abstract: With the rapid development of e-commerce platforms, the logistics industry is growing quickly. The access logs of a logistics service platform reflect user behavior, so mining the information hidden in them can help the platform optimize its business. Large-scale log processing now also faces stricter real-time requirements. This study compares candidate stream processing frameworks for real-time computing, databases for large-scale storage, and log collection tools. It selects Flume for log collection, Kafka as the message queue, Flink for real-time computation on streaming data, and HBase for large-scale data storage. The platform design also includes data deduplication, anomaly alarms, a fault tolerance strategy, and load scheduling. Experimental tests show that the platform processes the log data of the logistics service platform efficiently, and the design offers both novel ideas and practical value.