Abstract: This article proposes a Web information extraction method based on multidimensional semantics for extracting medicine information from the Web. By building a semantic dictionary that describes knowledge about medicine information on the Web, the method overcomes the heterogeneity of Web pages from different sources and captures the characteristics they share. It uses an approach based on structural semantic entropy to detect data-rich sections of Web pages and extracts the information of interest from them; it then verifies and supplements the extracted information by generating XPath extraction rules. The method obtains information from heterogeneous sources automatically and effectively. Experiments show that it achieves high precision and recall, and can therefore provide sufficient information for the government to strengthen supervision of the online medicine market.
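
To make the pipeline summarized in the abstract more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not define structural semantic entropy, so the sketch assumes it can be approximated by the Shannon entropy of the semantic roles found in a node's subtree, with the highest-entropy node treated as the data-rich section. The dictionary entries, the sample page, and the function names (`semantic_roles`, `entropy`, `richest_section`, `extract_fields`) are all hypothetical, and the XPath rule is hand-written here rather than generated.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical semantic dictionary: keyword -> semantic role (assumption).
SEMANTIC_DICT = {
    "name": "drug_name",
    "manufacturer": "manufacturer",
    "price": "price",
    "approval": "approval_number",
}

# Hypothetical, well-formed sample page (real pages would need an HTML parser).
PAGE = """
<html><body>
  <div id="nav">
    <a>Product name list</a>
    <a>Brand name index</a>
    <a>Search by name</a>
  </div>
  <div id="detail">
    <ul>
      <li class="field">Name: Aspirin</li>
      <li class="field">Manufacturer: Acme Pharma</li>
      <li class="field">Price: 5.00</li>
      <li class="field">Approval number: H12345678</li>
    </ul>
  </div>
</body></html>
"""

def semantic_roles(node):
    """Collect the semantic roles matched by any text in the node's subtree."""
    roles = []
    for text in node.itertext():
        lowered = text.lower()
        for keyword, role in SEMANTIC_DICT.items():
            if keyword in lowered:
                roles.append(role)
    return roles

def entropy(labels):
    """Shannon entropy, in bits, of a list of labels (0.0 for an empty list)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def richest_section(root):
    """Return the first node, in document order, with the highest entropy."""
    return max(root.iter(), key=lambda n: entropy(semantic_roles(n)))

def extract_fields(section):
    """Apply a hand-written XPath rule and split each item into label/value."""
    fields = {}
    for item in section.findall(".//li[@class='field']"):
        label, _, value = (item.text or "").partition(":")
        fields[label.strip()] = value.strip()
    return fields

section = richest_section(ET.fromstring(PAGE.strip()))
fields = extract_fields(section)
print(section.get("id"), fields)
```

In this toy page the navigation block repeats a single role, so its entropy is zero, while the detail block mixes four roles evenly and is selected. A real system would need a far richer dictionary and a way to generalize the XPath rule across sites.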