Volume 9 Number 12 (Dec. 2014)
JSW 2014 Vol.9(12): 3057-3062 ISSN: 1796-217X
doi: 10.4304/jsw.9.12.3057-3062

Research on Deep Web Query Interface Clustering Based on Hadoop

Baohua Qiang1, Rui Zhang2, Yufeng Wang3, Qian He1, Wei Li1, and Sai Wang1

1Guangxi Key Lab of Trusted Software, Guilin University of Electronic Technology, Guilin 541000, China
2North China University of Water Resources and Electric Power, Zhengzhou 450045, China
3The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang 050000, China

Abstract—Clustering query interfaces effectively is one of the core issues in generating an integrated query interface in the Deep Web integration domain. However, with the rapid development of Internet technology, the number of Deep Web query interfaces is growing explosively. As a result, traditional stand-alone Deep Web query interface clustering approaches encounter bottlenecks in both time complexity and space complexity. Building on a study of the Hadoop distributed platform and the Map Reduce programming model, a Deep Web query interface clustering algorithm based on the Hadoop platform is designed and implemented, in which the Vector Space Model (VSM) and Latent Semantic Analysis (LSA) are employed to represent “Query Interfaces-Attributes” relationships. The experimental results show that the proposed algorithm achieves better scalability and speedup ratio by using the Hadoop architecture.

Index Terms—Hadoop, Map Reduce, Deep Web, LSA, Query Interface Clustering
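
The abstract names VSM and LSA as the representation of "Query Interfaces-Attributes" relationships. The following is a minimal single-machine sketch of that idea, not the authors' Hadoop implementation: the toy interfaces, the attribute labels, and the function names (build_vsm, lsa_embed, cosine_similarity) are all hypothetical, and raw attribute counts are assumed as the VSM weights.

```python
import numpy as np

# Hypothetical toy data: each query interface is a list of attribute labels.
interfaces = {
    "books_a": ["title", "author", "isbn"],
    "books_b": ["title", "author", "publisher"],
    "flights_a": ["from", "to", "departure_date"],
    "flights_b": ["from", "to", "return_date"],
}


def build_vsm(interfaces):
    """Build an interface-by-attribute count matrix (one row per interface)."""
    names = sorted(interfaces)
    vocab = sorted({a for attrs in interfaces.values() for a in attrs})
    index = {a: j for j, a in enumerate(vocab)}
    matrix = np.zeros((len(names), len(vocab)))
    for i, name in enumerate(names):
        for attr in interfaces[name]:
            matrix[i, index[attr]] += 1.0
    return names, vocab, matrix


def lsa_embed(matrix, k):
    """Project each row into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine_similarity(embedded):
    """Pairwise cosine similarity between the rows of the embedded matrix."""
    norms = np.linalg.norm(embedded, axis=1, keepdims=True)
    unit = embedded / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


names, vocab, vsm = build_vsm(interfaces)
sim = cosine_similarity(lsa_embed(vsm, k=2))
```

With k=2, the two book interfaces end up with a similarity close to 1 and a similarity close to 0 to either flight interface, even though they do not share every attribute. A clustering step (for example k-means over the embedded rows) could then group the interfaces; how the paper distributes these steps across Map Reduce jobs is not stated on this page.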


Cite: Baohua Qiang, Rui Zhang, Yufeng Wang, Qian He, Wei Li, and Sai Wang, "Research on Deep Web Query Interface Clustering Based on Hadoop," Journal of Software, vol. 9, no. 12, pp. 3057-3062, 2014.
