Abstract: Malicious Web pages are a new kind of Web-based attack. In drive-by-download exploits, attackers embed malicious code into a Web page; when a victim visits the page, the code attempts to download and execute malware by exploiting vulnerabilities in the browser or its plugins. To address the problem of extracting static features from malicious Web pages, this paper selects 14 static features based on information gain theory and proposes 8 new static features derived from an analysis of obfuscated scripts. In addition, two improvements to the original feature extraction process are proposed: preprocessing the original Web page according to its code format, and reprocessing HTML code that is dynamically generated by JavaScript so that further HTML features can be extracted. Experimental results on both an unbalanced and a balanced data set show that the proposed static feature system is effective to a certain degree.
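To illustrate what selecting features "based on information gain" involves, the following Python sketch ranks discrete features by IG(Y; X) = H(Y) − Σ_v P(X = v)·H(Y | X = v). It is a generic illustration, not the authors' implementation. It assumes each feature has already been discretized (continuous features such as script length would first need binning), and the feature names `has_eval` and `has_iframe` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    # Partition the labels by the value the feature takes on each page.
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, k):
    """Return the names of the k features with the highest information gain.

    rows: list of dicts mapping a (hypothetical) feature name to a discrete value.
    labels: class label per row, e.g. 1 = malicious, 0 = benign.
    """
    names = list(rows[0].keys())
    scores = {
        name: information_gain([r[name] for r in rows], labels) for name in names
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: four pages with two hypothetical binary features.
    rows = [
        {"has_eval": 1, "has_iframe": 1},
        {"has_eval": 1, "has_iframe": 0},
        {"has_eval": 0, "has_iframe": 1},
        {"has_eval": 0, "has_iframe": 0},
    ]
    labels = [1, 1, 0, 0]
    print(rank_features(rows, labels, k=1))
```

On this toy data `has_eval` separates the classes perfectly (IG of 1 bit), while `has_iframe` carries no information about the label (IG of 0), so only `has_eval` would be kept for k = 1.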