This study proposes E2E-DRNet, a model that addresses the difficulties of manual diabetic retinopathy (DR) diagnosis: poor classification performance, laborious workflows, subtle differences between the grades of retinal images, and inconspicuous lesions. The model is built on EfficientNetV2 and incorporates the efficient channel attention (ECA) module. The DR dataset is processed and optimized, and the Focal Loss function is introduced to address sample imbalance. The model performs refined DR classification in two stages. Experimental results show that the proposed model performs well on both public and clinical datasets. It also enhances the interpretability of lesion regions in fundus images, which improves the efficiency of DR lesion screening and overcomes the limitations of manual diagnosis.
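To illustrate how Focal Loss counters sample imbalance, the following is a minimal sketch of the standard multi-class form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), for a single sample. It is not the E2E-DRNet training code: the function name `focal_loss`, the default `gamma=2.0`, and the optional per-class weights `alpha` are assumptions made for illustration, since the settings used in this study are not stated here.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one sample (illustrative sketch, hypothetical helper).

    probs  -- predicted class probabilities, assumed to sum to 1
    target -- index of the true class (e.g. a DR grade)
    gamma  -- focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  -- optional per-class weights (assumed, not from the study)
    """
    # Clamp the true-class probability to avoid log(0).
    p_t = min(max(probs[target], 1e-12), 1.0)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy sample (true class predicted at 0.9) versus a hard one (0.2).
easy = focal_loss([0.9, 0.05, 0.05], target=0)
hard = focal_loss([0.2, 0.5, 0.3], target=0)
```

With `gamma=2.0`, the easy sample's loss is scaled by (1 - 0.9)^2 = 0.01 relative to cross-entropy, while the hard sample keeps a factor of (1 - 0.2)^2 = 0.64. Training is therefore driven more by hard, typically under-represented samples than by the abundant easy ones.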