Abstract: The increase in cyber-attacks is becoming a serious problem. Malicious URLs often play an important role in these attacks and have been widely used to mount phishing, spamming, and malware campaigns. Detecting malicious URLs is therefore critical to thwarting such attacks. Numerous detection techniques have been developed, and machine learning techniques have received increasing attention in recent years. However, traditional machine learning methods require tedious and time-consuming feature preprocessing. In this study, we propose a detection method based solely on the lexical features of URLs. First, we obtain a distributed representation of the characters in URLs by training a 2-layer neural network (NN). Then we train a convolutional neural network (CNN) to classify feature images, which are generated by mapping each character of a URL to its distributed representation. In our experiments, we obtained an accuracy of 97.3% and an F1 score of 91.8% on a real-world data set.
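As a rough illustration of how a URL can be mapped to a feature image, the sketch below looks up one vector per character and stacks the vectors into a fixed-size matrix. It is a minimal sketch under stated assumptions, not the implementation used in this study: the names `EMBED_DIM`, `MAX_LEN`, `ALPHABET`, `build_embedding_table`, and `url_to_feature_image` are hypothetical, the dimensions are arbitrary, and random vectors stand in for the representations that the 2-layer NN would learn.

```python
import random
import string

# Assumed hyperparameters; the abstract does not specify them.
EMBED_DIM = 8   # length of each character vector
MAX_LEN = 32    # URLs are truncated or zero-padded to this many characters

# Assumed character vocabulary: lowercase letters, digits, punctuation.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation


def build_embedding_table(dim=EMBED_DIM, seed=0):
    """Stand-in for learned embeddings: one random vector per character."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in ALPHABET}


def url_to_feature_image(url, table, max_len=MAX_LEN, dim=EMBED_DIM):
    """Return a max_len x dim matrix with one row per URL character."""
    zero = [0.0] * dim
    # Unknown characters map to the zero vector.
    rows = [list(table.get(ch, zero)) for ch in url.lower()[:max_len]]
    # Pad short URLs with zero rows so every image has the same shape.
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows


table = build_embedding_table()
image = url_to_feature_image("http://example.com/login", table)
print(len(image), len(image[0]))
```

The resulting matrix has a fixed shape regardless of URL length, which is what lets a CNN treat it like a single-channel image.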