University of Chinese Academy of Sciences, Beijing 100049, China; Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China
Spark is widely used as a computing platform for large-scale data processing, and the reasonable allocation of cluster resources plays an important role in optimizing Spark performance. Performance prediction is the basis and key of cluster resource allocation optimization, so this paper puts forward a Spark performance prediction model. We select job execution time as the measure of Spark performance and introduce the concept of the key Stage of a Spark job. We then build the model by analyzing the relationship between the execution time of the key Stages and the amount of input data, obtained by running the job on a small quantity of data. The experimental results show that the model is effective.
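To illustrate the general idea, the sketch below fits a simple linear model per key Stage from small-data runs and sums the per-Stage predictions to estimate total job time at a larger input size. This is a minimal illustration of the approach described in the abstract, not the paper's actual model; the function names, the linear-fit choice, and the sample numbers are all assumptions for demonstration.

```python
# Hedged sketch: relate each key Stage's execution time to input data size
# using measurements from small runs, then extrapolate to a larger input.
# A plain ordinary-least-squares line is assumed here for simplicity.

def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def predict_job_time(stage_samples, target_size):
    """stage_samples: {stage_name: [(input_size_gb, seconds), ...]}
    measured on small inputs. Returns the predicted total job time at
    target_size by summing the per-Stage linear extrapolations."""
    total = 0.0
    for samples in stage_samples.values():
        xs = [size for size, _ in samples]
        ys = [secs for _, secs in samples]
        a, b = fit_linear(xs, ys)
        total += a * target_size + b
    return total

# Illustrative measurements for two hypothetical key Stages.
samples = {
    "Stage0": [(1, 10.0), (2, 18.0), (4, 34.0)],  # ~8 s/GB + 2 s overhead
    "Stage2": [(1, 5.0), (2, 9.0), (4, 17.0)],    # ~4 s/GB + 1 s overhead
}
print(predict_job_time(samples, 100))  # → 1203.0
```

In practice a model of this kind would also need to account for Stages whose cost grows non-linearly (e.g. shuffle-heavy Stages) and for the cluster resource configuration, which is the focus of the paper.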