Abstract: Diabetes is an increasingly serious health challenge worldwide. Its prevalence rises every year, especially in developing countries, and the vast majority of cases are type 2 diabetes. Research indicates that about 80% of type 2 diabetes complications can be prevented or delayed by timely detection. In this study, we propose an ensemble model to diagnose diabetes accurately on a large-scale, imbalanced dataset. The dataset covers millions of people from one province in China over the period 2009 to 2015 and is highly skewed. Results on this real-world dataset show that our method is promising for diabetes diagnosis, achieving a sensitivity of 91.00%, an F3 score of 58.24%, and a G-mean of 86.69%.
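The three reported metrics can be computed from a binary confusion matrix. The abstract does not define them, so the minimal sketch below assumes the standard definitions: sensitivity as recall on the positive (diabetic) class, F3 as the F-beta score with beta = 3 (which weights recall more heavily than precision), and G-mean as the geometric mean of sensitivity and specificity. The function name `imbalance_metrics` and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def _safe_div(numerator, denominator):
    # Return 0.0 instead of raising when a class has no samples.
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(tp, fn, fp, tn, beta=3.0):
    """Compute sensitivity, specificity, F-beta and G-mean from counts.

    Assumed (standard) definitions, not confirmed by the source:
      sensitivity = TP / (TP + FN)
      specificity = TN / (TN + FP)
      F_beta      = (1 + beta^2) * P * R / (beta^2 * P + R)
      G-mean      = sqrt(sensitivity * specificity)
    """
    sensitivity = _safe_div(tp, tp + fn)   # recall on the positive class
    specificity = _safe_div(tn, tn + fp)   # recall on the negative class
    precision = _safe_div(tp, tp + fp)
    b2 = beta ** 2
    f_beta = _safe_div((1 + b2) * precision * sensitivity,
                       b2 * precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_beta": f_beta,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = imbalance_metrics(tp=91, fn=9, fp=200, tn=1800)
print(metrics)
```

Under these definitions, a high sensitivity combined with a much lower F3 score, as in the reported results, is consistent with a model that finds most positive cases at the cost of a substantial number of false positives, which is a common trade-off on highly skewed data.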