Abstract: Starting from protein sequence data, this paper constructs a protein network by cyclic sequence similarity matching and proposes a novel function prediction method based on ranking the importance of network nodes. To account for the importance of protein nodes in the network, the node importance algorithm PageRank (PR) is used to compute each node's PR value. The method is also implemented on the Hadoop platform, whose parallel computing makes it efficient and better suited to huge genome databases. Finally, the method is validated against the traditional function prediction method using accuracy rate, recall rate and F1-measure; the results show that it is feasible and valuable for practical use.
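To make the node-ranking step concrete, the following is a minimal single-machine sketch of PageRank by power iteration on a small undirected similarity network. It is an illustration under stated assumptions, not the paper's Hadoop implementation: the function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the toy protein identifiers `P1` to `P4` are all hypothetical, and an edge is assumed to mean that two sequences were matched as similar.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected graph given as (u, v) pairs.

    Assumption: every node appears in at least one edge, so no node is
    dangling and the PR values always sum to 1.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # uniform starting values

    for _ in range(max_iter):
        # Every node receives the same teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Each node splits its damped PR value evenly among its neighbors.
        for node in nodes:
            share = damping * rank[node] / len(neighbors[node])
            for nb in neighbors[node]:
                new_rank[nb] += share
        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network: P1 is similar to three other proteins.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
ranks = pagerank(toy_edges)
for protein in sorted(ranks, key=ranks.get, reverse=True):
    print(protein, round(ranks[protein], 4))
```

In this toy network the best-connected node receives the largest PR value, which is the sense in which PR serves as a measure of node importance. In a MapReduce setting, the inner loop that distributes each node's share would correspond to the map phase and the per-node summation to the reduce phase; that mapping is a common design and is assumed here rather than taken from the paper.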