Abstract: As living standards continue to improve, the incidence of cancer is rising, and lung cancer has become a major disease that seriously endangers human health in the 21st century. This paper presents a decision tree method for lung cancer diagnosis based on electronic medical records. First, the characteristics of lung cancer electronic medical records are analyzed, together with the instability and over-fitting to which decision tree models are prone. To address these problems, an optimized decision tree model is constructed by combining principal component analysis with the C5.0 algorithm. Two feature dimension reduction schemes are established: one retains the principal components whose eigenvalues are greater than 1, and the other retains principal components until their cumulative contribution rate exceeds 85%. The decision tree is then built and pruned with the C5.0 algorithm. Finally, the data preprocessing procedure and the resulting model are given. The experimental results show that the improved algorithm achieves better accuracy and good scalability, indicating that it is of great significance for the clinical trial of lung cancer.
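The two component-selection rules named in the abstract can be illustrated with a minimal sketch. It assumes the principal component eigenvalues have already been computed from the correlation matrix of the record features. The function names and the example eigenvalues are hypothetical and are not taken from the paper's data or implementation.

```python
from itertools import accumulate


def components_by_eigenvalue(eigenvalues, threshold=1.0):
    """Count the principal components whose eigenvalue exceeds the threshold."""
    return sum(1 for value in eigenvalues if value > threshold)


def components_by_cumulative_contribution(eigenvalues, target=0.85):
    """Return the smallest number of leading components whose cumulative
    contribution rate (share of the total eigenvalue sum) exceeds the target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    for count, running_sum in enumerate(accumulate(ordered), start=1):
        if running_sum / total > target:
            return count
    return len(ordered)


# Hypothetical eigenvalues for six features (illustration only).
example_eigenvalues = [3.2, 1.8, 1.1, 0.9, 0.6, 0.4]

kept_by_eigenvalue = components_by_eigenvalue(example_eigenvalues)
kept_by_contribution = components_by_cumulative_contribution(example_eigenvalues)
print(kept_by_eigenvalue, kept_by_contribution)
```

On these illustrative values the eigenvalue rule keeps fewer components than the cumulative contribution rule, which shows why the two schemes can feed different reduced feature sets to the decision tree. The C5.0 training and pruning step is not sketched here.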