Abstract: By using the resilient distributed dataset (RDD), Spark is well suited to iterative algorithms, which are common in data mining and machine learning jobs. However, developing Spark applications is complicated for data analysts: it requires learning Scala, demands substantial experience in code optimization and system deployment, and involves much duplicated work because of low code reuse. We design and develop a machine learning tool with a visual workflow style based on Spark. We model the stages of machine learning as workflow modules, including data preprocessing, feature processing, model training, and validation. Meanwhile, a user-friendly interface is provided to accelerate the design of machine learning workflow models by analysts, with the server end automatically parsing the modules into Spark jobs. This tool greatly improves the efficiency of machine learning development on the Spark platform. In this paper, we introduce the underlying methods and key techniques, and demonstrate the tool's validity with a real-world example.
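The module-to-job idea above can be sketched in miniature. The following is a minimal illustration in plain Python (no Spark dependency), assuming a hypothetical `Module`/`Workflow` design; the real tool's server end would instead translate each module into a Spark job:

```python
# Minimal sketch of the workflow-module abstraction: each machine learning
# stage (preprocessing, feature processing, training, ...) is a module, and
# a workflow chains modules in order. All names here are hypothetical.

class Module:
    """One stage of the pipeline, wrapping a data-transforming function."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def run(self, data):
        return self.fn(data)


class Workflow:
    """An ordered chain of modules, analogous to a visual workflow model."""
    def __init__(self, modules):
        self.modules = modules

    def execute(self, data):
        for m in self.modules:
            # In the real tool, each module would be parsed into a Spark job.
            data = m.run(data)
        return data


# Example: preprocess -> feature processing -> a toy "training" step.
wf = Workflow([
    Module("preprocess", lambda xs: [x for x in xs if x is not None]),
    Module("features",   lambda xs: [(x, x * x) for x in xs]),
    Module("train",      lambda rows: sum(f for _, f in rows) / len(rows)),
])
result = wf.execute([1, 2, None, 3])
```

The point of the sketch is separation of concerns: an analyst composes modules visually, while execution details (here a simple loop, in the tool a Spark job graph) stay behind the `Workflow` interface.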