Abstract: Mining the useful information hidden in massive taxi GPS trajectory data reveals the travel characteristics of taxi passengers, which helps urban traffic managers and taxi industry managers make decisions on urban transportation planning, traffic flow balancing, and vehicle scheduling. This paper mines taxi GPS traces on the Spark big data analysis platform, with YARN for resource management and HDFS for distributed storage, and extracts a variety of taxi-related information. Spark-based mining algorithms are given for the distance distribution of taxi passenger trips, the time distribution of taxi usage, and taxi travel demand. Experimental results show that the proposed Spark-based method can analyze the travel characteristics of taxi passengers quickly and accurately.