Volume 4 Number 5 (Jul. 2009)
JSW 2009 Vol.4(5): 436-443 ISSN: 1796-217X
doi: 10.4304/jsw.4.5.436-443

Semantic Focused Crawling for Retrieving ECommerce Information

Wei Huang (1, 2), Liyi Zhang (1), Jidong Zhang (2), and Mingzhu Zhu (1)

(1) School of Information Management, Wuhan University, Wuhan, P.R. China
(2) School of Management, Hubei University of Technology, Wuhan, P.R. China

Abstract: Focused crawling selectively seeks out pages relevant to a predefined set of topics without downloading every page on the Web. With the rapid growth of e-commerce, discovering specific information about buyers, sellers, products, and similar entities for online business users has become a key issue for search engines. We present a novel semantic approach to building an intelligent focused crawler that evaluates a page's content relevance to the e-commerce topic using a domain ontology, and evaluates its hyperlink connections to commercial web pages using link analysis. During crawling, the domain ontology evolves automatically through machine learning based on statistics and rules. Experimental results show that our approach is more effective than traditional crawling algorithms and prevents topic drift while achieving a higher harvest rate.

Index Terms: Focused crawling, Information retrieval, E-commerce, Semantic, Machine learning


Cite: Wei Huang, Liyi Zhang, Jidong Zhang, and Mingzhu Zhu, "Semantic Focused Crawling for Retrieving ECommerce Information," Journal of Software, vol. 4, no. 5, pp. 436-443, 2009.
