Computer Systems Applications, 2016, 25(12): 162-168
Method of Implement Machine Learning Analysis with Workflow Based on Spark Platform
(1. University of Chinese Academy of Sciences, Beijing 100190, China; 2. Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
Received:March 21, 2016    Revised:April 11, 2016
Keywords: machine learning; data analysis; distributed; big data; Spark
Abstract: Spark keeps distributed datasets in memory (resilient distributed datasets), which makes it well suited to iteration-heavy workloads such as data mining and machine learning. For data analysts, however, developing directly on Spark is difficult: Scala has a steep learning curve, code optimization and system deployment demand considerable experience, and low code reuse leads to much duplicated work. We design and implement a visual, workflow-based machine learning method on Spark. First, a component model captures the basic stages of machine learning: data preprocessing, feature processing, model training, and validation and evaluation. Second, a visual workflow modeling tool lets analysts design machine learning workflows, which the server then translates automatically into Spark jobs for efficient execution. The tool can greatly improve the efficiency of developing machine learning applications on the Spark platform. We present the underlying method and key techniques, and demonstrate the tool's effectiveness with a real case.
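The component-model idea in the abstract can be illustrated with a minimal sketch. It is not the authors' tool and does not use Spark: the names `Component` and `Workflow`, the toy data, and the four stage functions are all hypothetical, chosen only to show how a workflow of preprocessing, feature processing, training, and evaluation components might be described as a graph and then executed in dependency order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List


@dataclass
class Component:
    """One workflow node (hypothetical); `stage` names its machine learning step."""
    name: str
    stage: str
    run: Callable[..., Any]  # called with the outputs of its upstream components
    inputs: List[str] = field(default_factory=list)


class Workflow:
    """A set of components connected by their declared inputs (hypothetical)."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}

    def add(self, component: Component) -> "Workflow":
        self.components[component.name] = component
        return self

    def execute(self) -> Dict[str, Any]:
        # Run every component after all of its upstream components.
        graph = {c.name: set(c.inputs) for c in self.components.values()}
        results: Dict[str, Any] = {}
        for name in TopologicalSorter(graph).static_order():
            comp = self.components[name]
            results[name] = comp.run(*(results[i] for i in comp.inputs))
        return results


# Toy (x, y) records; one row has a missing feature value.
RAW = [(1.0, 2.0), (2.0, 4.0), (None, 1.0), (4.0, 8.0)]


def load():
    return RAW


def drop_missing(rows):
    return [(x, y) for x, y in rows if x is not None]


def scale(rows):
    # Divide each feature by the largest feature value.
    top = max(x for x, _ in rows)
    return [(x / top, y) for x, y in rows]


def fit(rows):
    # Least-squares slope for y = w * x (line through the origin).
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def mse(rows, w):
    return sum((y - w * x) ** 2 for x, y in rows) / len(rows)


workflow = (
    Workflow()
    .add(Component("load", "data source", load))
    .add(Component("clean", "data preprocessing", drop_missing, ["load"]))
    .add(Component("scale", "feature processing", scale, ["clean"]))
    .add(Component("train", "model training", fit, ["scale"]))
    .add(Component("evaluate", "validation and evaluation", mse, ["scale", "train"]))
)

results = workflow.execute()
print("slope:", results["train"], "mse:", results["evaluate"])
```

In a Spark-based tool each component would presumably map to generated Spark code rather than a local Python function; the sketch only shows the graph-of-components structure and dependency-ordered execution.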
Funding: National Natural Science Foundation of China (U1435220)
Citation:
ZHAO Ling-Ling, LIU Jie, WANG Wei. Method of Implement Machine Learning Analysis with Workflow Based on Spark Platform. Computer Systems Applications, 2016, 25(12): 162-168