Abstract: Web users access sites almost entirely anonymously. The main goal of weblog mining is to extract users' behavior patterns from weblogs and, by analyzing the mining results, to understand that behavior and improve the structure of the site. The first step of weblog mining is data preprocessing, which is also the most time-consuming stage of the analysis. This paper first studies the data preprocessing process, which comprises data cleaning, user identification, session identification, and path completion, and proposes a path completion algorithm. It then poses the hypothesis that path completion has a significant impact on the quantity and quality of the extracted rules, and tests this hypothesis experimentally to assess the effect of path completion in weblog mining. The experimental results also provide an empirical basis for deciding to what extent data preparation should be carried out.
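To make the path completion step concrete, the following is a minimal sketch of one common referrer-based heuristic. It is not the algorithm proposed in this paper, which the abstract does not specify. It assumes each session is already available as a time-ordered list of (referrer, page) pairs, and that a referrer of "-" or an empty string means the field is missing. The function name `complete_path` and the page labels are hypothetical. When a request's referrer is not the page visited immediately before it, the sketch assumes the user pressed the back button through cached pages and re-inserts those unlogged pages.

```python
from typing import List, Tuple


def complete_path(requests: List[Tuple[str, str]]) -> List[str]:
    """Rebuild the full click path of one session.

    requests: time-ordered (referrer, page) pairs for a single session.
    Assumption: backward moves were served from the browser cache and are
    therefore missing from the log; they are inferred from the referrer.
    """
    path: List[str] = []
    for referrer, page in requests:
        has_referrer = referrer not in ("", "-")
        if path and has_referrer and referrer != path[-1] and referrer in path:
            # Index of the most recent visit to the referrer page.
            idx = len(path) - 1 - path[::-1].index(referrer)
            # Pages stepped through on the way back, ending at the referrer.
            backtrack = path[idx:len(path) - 1][::-1]
            path.extend(backtrack)
        # If the referrer was never seen in this session, nothing can be
        # inferred, so the page is simply appended.
        path.append(page)
    return path


if __name__ == "__main__":
    session = [("-", "A"), ("A", "B"), ("B", "C"), ("A", "D")]
    print(complete_path(session))
```

In this example the log records A, B, C, D, but D was reached from A, so the sketch inserts the backward steps B and A between C and D. Using the most recent occurrence of the referrer is a design choice that keeps the inferred detour as short as possible.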