Abstract: This study proposes an improved random forest node-splitting algorithm to increase the accuracy of image classification. The splitting criteria of two independent decision tree algorithms, ID3 (information gain) and CART (Gini index), are recombined, and new splitting rules are obtained through adaptive parameter selection. For feature extraction, a spatial pyramid model is introduced on top of the bag-of-words model: the image is divided into grids at different scales, and the k-means algorithm is then used to cluster the extracted features. Finally, the algorithm is verified on a large number of images on Spark. The results show that the algorithm can be applied to distributed systems and substantially improves classification accuracy while maintaining its efficiency.
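The abstract does not state the exact form of the recombined splitting rule. As a minimal sketch, assuming the rule is a convex blend of ID3's information gain and CART's Gini decrease weighted by a hypothetical parameter `alpha`, a single node split over one numeric feature could be scored as below. The names `combined_score`, `best_threshold`, and `alpha` are illustrative assumptions, not the authors' method, and the adaptive selection of `alpha` is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels, as used by ID3."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels, as used by CART."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(impurity, parent, left, right):
    """Decrease in the given impurity measure for a binary split,
    weighting each child by its share of the parent's samples."""
    n = len(parent)
    child = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - child


def combined_score(parent, left, right, alpha):
    """Hypothetical blended criterion (an assumption, not the paper's rule):
    alpha * information gain + (1 - alpha) * Gini decrease."""
    ig = impurity_decrease(entropy, parent, left, right)
    gg = impurity_decrease(gini, parent, left, right)
    return alpha * ig + (1.0 - alpha) * gg


def best_threshold(values, labels, alpha):
    """Try midpoints between consecutive distinct feature values and
    return (threshold, score) for the split with the highest blended score."""
    distinct = sorted(set(values))
    best = (None, -math.inf)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = combined_score(labels, left, right, alpha)
        if score > best[1]:
            best = (t, score)
    return best


if __name__ == "__main__":
    # Toy feature column: the split at 2.5 separates the two classes perfectly.
    print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"], alpha=0.5))
```

With `alpha = 1.0` the score reduces to ID3-style information gain and with `alpha = 0.0` to CART's Gini decrease. An adaptive scheme might, for example, choose `alpha` per node or by validation, but that choice is likewise an assumption here.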