Abstract: To relieve the burden that large matrix multiplication places on a single node, a distributed matrix multiplication algorithm based on Hadoop is proposed. The algorithm reads its input from binary-format files, applies an optimal split, and removes the unnecessary Reduce phase. As a result, it greatly reduces the amount of input data, has a simple flow, and scales well. Experimental results demonstrate that the algorithm greatly reduces the computation time of matrix multiplication and achieves good speedup when the amount of input data is close to the optimal load of the cluster.
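
The three ideas named in the abstract (binary input, input splitting, and a map-only job) can be illustrated with a minimal single-process sketch. It is not the proposed Hadoop implementation. The binary layout, the row-wise splitting of the left matrix, the assumption that every map task can read the whole right matrix, and all function names (`encode_matrix`, `make_splits`, `map_task`, `multiply`) are illustrative assumptions and are not taken from the paper.

```python
import struct


def encode_matrix(rows):
    """Pack a dense matrix as binary: two int32 dimensions, then float64 values.

    Assumed layout for illustration only; the paper's file format may differ.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    flat = [v for row in rows for v in row]
    return struct.pack(f"<ii{n_rows * n_cols}d", n_rows, n_cols, *flat)


def decode_matrix(blob):
    """Inverse of encode_matrix: rebuild the list-of-rows matrix."""
    n_rows, n_cols = struct.unpack_from("<ii", blob, 0)
    flat = struct.unpack_from(f"<{n_rows * n_cols}d", blob, 8)
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


def make_splits(a_rows, num_splits):
    """Partition the rows of A into contiguous, nearly equal splits.

    Each split carries its starting row index so the output can be placed
    without any shuffle or Reduce step.
    """
    base, extra = divmod(len(a_rows), num_splits)
    splits, start = [], 0
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        if size:
            splits.append((start, encode_matrix(a_rows[start:start + size])))
        start += size
    return splits


def map_task(row_offset, a_blob, b_blob):
    """One map task: multiply a row block of A by all of B.

    Assumes B is small enough to be available to every task.
    """
    a_block = decode_matrix(a_blob)
    b_cols = list(zip(*decode_matrix(b_blob)))
    block = [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
             for row in a_block]
    return row_offset, block


def multiply(a_rows, b_rows, num_splits):
    """Run every map task in turn and write each block straight to the result."""
    b_blob = encode_matrix(b_rows)
    result = [None] * len(a_rows)
    for offset, a_blob in make_splits(a_rows, num_splits):
        start, block = map_task(offset, a_blob, b_blob)
        result[start:start + len(block)] = block
    return result
```

Because each map task produces complete rows of the product, the blocks need no aggregation, which is why a Reduce phase can be omitted under these assumptions. The number of splits is a free parameter in this sketch; how the optimal split is actually chosen is not described in the abstract.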