Abstract:Graph partitioning algorithm is part of distributed graph computing system. It divides a graph into several subgraphs to run in the distributed system and assigns the vertex and edge data and computing tasks on the subgraphs to each partition. Heterogeneous graph is a kind of graph widely existing in the real world. It is a graph with multiple vertex types or edge types. During the calculation of heterogeneous graphs, the existing graph partition algorithms do not consider the following problems. In graph calculation, different vertex and edge types may carry different data amounts. Meanwhile, different vertex and edge types may adopt different processing algorithms, with various calculation time. Aiming at the shortcomings of the existing graph partitioning methods, this study proposes an online graph partitioning algorithm for heterogeneous graphs, OGP-HG algorithm. Additionally, the existing GraphX graph computing engine is improved to implement the proposed algorithm in the improved graph computing engine. The proposed OGP-HG algorithm calculates the load balance score of vertices divided into different partitions and the data balance score of edges divided into different partitions, thus obtaining the division results of balancing the load and memory occupation of heterogeneous graphs. Experiments show that compared with traditional graph partitioning algorithms, this algorithm improves the computing efficiency of heterogeneous graphs by 1.05–1.4 times.