Abstract: With the upgrading of intelligent equipment and the growth of data storage capacity, manufacturing companies have accumulated large amounts of production pipeline data during the manufacturing of their products. How to make use of these data remains a difficult problem in the industry. Using actual production data from manufacturing enterprises, this study builds a product failure identification model based on the FTRL (with logistic regression) and XGBoost algorithms, informed by detailed exploratory data analysis, and then tunes it by cross-validation against the Matthews correlation coefficient (MCC), a metric suited to imbalanced datasets. The experimental results show that the model predicts faults efficiently and accurately on large-scale (large in both sample count and feature count), imbalanced production pipeline datasets. Based on this model, a smarter product fault detection system can be built, which can effectively reduce the enterprise's operating costs and spur profit growth.
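To illustrate why MCC is preferred over plain accuracy on imbalanced data, the following is a minimal sketch that computes MCC from confusion-matrix counts. The function name `mcc` and the counts are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), e.g. when the classifier predicts a single class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced case: 990 good parts, 10 faulty parts,
# and a classifier that labels every part as good.
tp, tn, fp, fn = 0, 990, 0, 10
accuracy = (tp + tn) / (tp + tn + fp + fn)
score = mcc(tp, tn, fp, fn)

print(f"accuracy = {accuracy:.2f}, MCC = {score:.2f}")
```

In this made-up case the accuracy looks high even though no faulty part is detected, while MCC stays at zero, which is the kind of behavior that makes it a reasonable tuning target for rare-failure data.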