Abstract: In passive network device identification based on network traffic analysis, the traffic data are often high-dimensional, and some features contribute little to device identification or can even seriously degrade classification results and performance. This study therefore proposes FSSA, a network traffic feature selection algorithm that combines the Filter and Wrapper approaches and is based on symmetric uncertainty (SU) and the approximate Markov blanket (AMB). The method first uses SU to select, for each category, the features that contribute to classification and to remove irrelevant features. It then applies the AMB criterion to delete redundant features from the candidate subset. Finally, a Wrapper approach based on the C4.5 classification algorithm determines the final feature selection. Experimental results show that, for identifying the operating system type of network devices, the features selected by this method yield higher accuracy than those chosen by classical feature selection methods, and that recall on small classes also improves.
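The Filter stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and the approximate Markov blanket criterion commonly used with SU (a kept feature F_i covers F_j when SU(F_i, C) >= SU(F_j, C) and SU(F_i, F_j) >= SU(F_j, C)). The function names, the `threshold` parameter, and the toy data are hypothetical. The per-category selection and the C4.5 Wrapper stage are omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def su_amb_filter(features, labels, threshold=0.0):
    """Drop irrelevant features by SU, then redundant ones by the AMB test.

    features:  dict mapping a feature name to its list of discrete values.
    threshold: hypothetical relevance cutoff, not taken from the source.
    """
    relevance = {
        name: symmetric_uncertainty(col, labels) for name, col in features.items()
    }
    # Keep relevant features only, ranked by decreasing SU with the class.
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Every kept feature already has SU with the class >= relevance[name],
        # so the AMB test reduces to SU(kept, name) >= SU(name, class).
        redundant = any(
            symmetric_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data (made up for illustration): two OS classes, three traffic features.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "ttl": [64, 64, 128, 128, 64, 64, 128, 128],        # fully informative
    "ttl_copy": ["a", "a", "b", "b", "a", "a", "b", "b"],  # redundant with ttl
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],                   # independent of the class
}
selected = su_amb_filter(features, labels, threshold=0.1)
print(selected)  # ['ttl']
```

In this toy case `noise` is removed as irrelevant (its SU with the class is 0) and `ttl_copy` is removed as redundant, because `ttl` forms an approximate Markov blanket for it. A wrapper stage would then evaluate candidate subsets of the surviving features with a classifier.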