Abstract: Iterative computation is an important class of big data analysis applications. When an iterative algorithm is implemented on the distributed computation framework MapReduce, the program is divided into multiple jobs that run in an order determined by the dependencies between them. This leads to many interactions between the program and the distributed file system (DFS), which increase the program's execution time. Caching the data involved in these interactions reduces the interaction time and hence improves the overall performance of the application. Considering that a large amount of memory on cluster nodes is unused most of the time, this paper proposes MemLoop, a programming framework that uses memory caching for iterative applications. MemLoop exploits the free memory on cluster nodes to cache data by implementing memory cache management in three modules: a job-submission API, a task scheduling algorithm, and cache management. The cached data falls into two categories: inter-iteration resident data and intra-iteration dependent data. We compare MemLoop with previous related frameworks; the results show that MemLoop improves the performance of iterative programs.
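The central idea, keeping inter-iteration resident data (input that does not change across iterations) in node memory instead of re-reading it from the DFS on every iteration, can be illustrated with a minimal sketch. All names here (`FakeDFS`, `MemCache`, `iterative_job*`) are hypothetical stand-ins for illustration, not MemLoop's actual API:

```python
class FakeDFS:
    """Stands in for a distributed file system; counts expensive reads."""
    def __init__(self, data):
        self.data = data
        self.reads = 0
    def read(self, path):
        self.reads += 1          # each read models a costly program-DFS interaction
        return self.data[path]

class MemCache:
    """Node-local in-memory cache for data reused across iterations."""
    def __init__(self, dfs):
        self.dfs = dfs
        self.store = {}
    def get(self, path):
        if path not in self.store:   # go to the DFS only on a cache miss
            self.store[path] = self.dfs.read(path)
        return self.store[path]

def iterative_job(dfs, iterations):
    """Naive version: re-reads the invariant input every iteration."""
    rank = 1.0
    for _ in range(iterations):
        links = dfs.read("graph")    # inter-iteration resident data
        rank = 0.85 * rank + 0.15 * len(links)
    return rank

def iterative_job_cached(cache, iterations):
    """Cached version: the invariant input is fetched from the DFS once."""
    rank = 1.0
    for _ in range(iterations):
        links = cache.get("graph")   # served from memory after iteration 1
        rank = 0.85 * rank + 0.15 * len(links)
    return rank

dfs1 = FakeDFS({"graph": [(0, 1), (1, 2), (2, 0)]})
iterative_job(dfs1, 10)
print(dfs1.reads)                    # 10 DFS reads without caching

dfs2 = FakeDFS({"graph": [(0, 1), (1, 2), (2, 0)]})
iterative_job_cached(MemCache(dfs2), 10)
print(dfs2.reads)                    # 1 DFS read with the memory cache
```

The ten-fold reduction in DFS reads is what the memory cache buys; intra-iteration dependent data (output of one job consumed by the next job in the same iteration) would be cached analogously between dependent jobs.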