Abstract: Mini-programs have been widely adopted in recent years, and because they carry large amounts of sensitive user data, they have raised widespread privacy and security concerns. Existing privacy and security analysis techniques for traditional mobile applications cannot be applied directly to mini-programs. On the one hand, existing methods struggle to analyze the privacy transfer that passes through the closed-source mini-program framework and the cross-scope privacy transfer caused by JavaScript closures, so some privacy flows are missed. On the other hand, the dynamic sub-package loading mechanism leaves the analysis scope incomplete, so further flows are missed. This study proposes a hybrid dynamic/static method for analyzing privacy collection behaviors in mini-programs. First, the method constructs data propagation paths, based on either control flow or data dependency, across the different unit boundaries in a mini-program; together these paths form the mini-program privacy propagation flow graph. Second, the method explores the mini-program UI by transferring UI design knowledge learned from traditional mobile applications and by using the control-flow association between UI events and page transition information as a guide, thereby triggering the sub-package loading process. The corresponding sub-package code is then analyzed and integrated with the existing analysis results to form a more comprehensive privacy propagation flow graph, through which sensitive data in the mini-program is tracked. Based on this method, the study implements MiniSafe, a privacy collection behavior analysis tool for mini-programs. The evaluation results show that MiniSafe achieves a precision of 90.4% and a recall of 87.4%, both of which outperform existing work. MiniSafe detects an average of 7 sensitive data collection behaviors in each mini-program. 
When sensitive data collection behaviors in mini-program sub-packages are also considered, the total number of detected behaviors increases by 42.9%, demonstrating good detection performance and practical usability.
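
As a rough illustration of why integrating sub-package code matters, the following minimal sketch models a privacy propagation flow graph as a directed graph and checks whether a sensitive source can reach a collection sink. It is not MiniSafe's implementation: the class `PropagationGraph`, its methods, and the intermediate node labels are hypothetical names chosen for this example; only `wx.getLocation` and `wx.request` are real mini-program APIs, used here purely as node labels.

```python
from collections import defaultdict, deque


class PropagationGraph:
    """Toy privacy propagation flow graph (hypothetical, for illustration only).

    Nodes are code units; a directed edge means data may flow from one
    unit to the next via control flow or data dependency.
    """

    def __init__(self):
        self.edges = defaultdict(set)

    def add_flow(self, src, dst):
        self.edges[src].add(dst)

    def merge(self, other):
        # Integrate flows recovered from a dynamically loaded sub-package.
        for src, dsts in other.edges.items():
            self.edges[src] |= dsts

    def reachable_sinks(self, source, sinks):
        # Breadth-first search from a sensitive source to collection sinks.
        seen, queue = {source}, deque([source])
        while queue:
            node = queue.popleft()
            for nxt in self.edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen & set(sinks))


# Main package: location data is captured in a closure and handed to a
# function that lives in a sub-package (all intermediate labels are assumed).
main = PropagationGraph()
main.add_flow("wx.getLocation", "page.onLoad.closure")
main.add_flow("page.onLoad.closure", "subpkg.upload")

# Sub-package graph, available only after the sub-package has been loaded.
sub = PropagationGraph()
sub.add_flow("subpkg.upload", "wx.request")

before = main.reachable_sinks("wx.getLocation", ["wx.request"])
main.merge(sub)
after = main.reachable_sinks("wx.getLocation", ["wx.request"])
```

In this toy setting the flow to `wx.request` is visible only after the sub-package graph is merged, which mirrors the kind of collection behavior that an analysis restricted to the main package would miss.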