Abstract:Macro malware is widely used in advanced persistent threats. Macro obfuscation is low-cost and flexible, rendering traditional rule-based anti-malware systems insufficient. A gradient-boosting-decision-tree-based approach to detecting obfuscated macro malware is proposed. The approach performs large-scale feature engineering guided by the expertise of malware specialists, with fine-grained modeling for obfuscated macro malware carried out on top of lexical analysis, and massive samples are used to train the model. Experimental results show that the approach is able to precisely detect real-world obfuscated macro malware found in the network of enterprise customers, as well as those variants generated by mainstream obfuscation tools; 10-fold cross validation is carried out for a total of 4000 000 macro programs, giving a precision of 99.41% and a recall of 97.34%, which outperforms existing works.