Volume 6 Number 12 (Dec. 2011)
JSW 2011 Vol.6(12): 2421-2425 ISSN: 1796-217X
doi: 10.4304/jsw.6.12.2421-2425

Segmenting Webpage with Gomory-Hu Tree Based Clustering

Xinyue Liu1, 2, Hongfei Lin1, Ye Tian2

1School of Computer Science and Technology, Dalian University of Technology, Dalian, China
2School of Software, Dalian University of Technology, Dalian, China


Abstract—We propose a novel web page segmentation algorithm based on finding the Gomory-Hu tree in a planar graph. The algorithm first extracts visual and structural information from a web page to construct a weighted undirected graph whose vertices are the leaf nodes of the DOM tree and whose edges represent the visible position relationships between vertices. It then partitions the graph with a Gomory-Hu tree based clustering algorithm. Experimental results show that our algorithm achieves much higher precision and recall than both VIPS and Chakrabarti et al.'s graph-theoretic algorithm, and that its running time is far lower than that of Chakrabarti et al.'s algorithm.
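
The abstract does not spell out the clustering step, so the following is only a minimal sketch of the general idea, under stated assumptions. It builds a Gomory-Hu tree for a small weighted undirected graph using Gusfield's construction on top of an Edmonds-Karp max-flow routine, then splits the tree by dropping its lightest edges. The function names (`build_capacity`, `min_cut`, `gomory_hu_tree`, `cluster`), the "drop the k-1 lightest tree edges" rule, and the toy graph are all illustrative assumptions and are not taken from the paper; in particular, the graph here is not derived from a DOM tree.

```python
from collections import deque

def build_capacity(n, weighted_edges):
    """Symmetric capacity matrix for an undirected weighted graph (hypothetical helper)."""
    cap = [[0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        cap[u][v] += w
        cap[v][u] += w
    return cap

def min_cut(cap, s, t):
    """Edmonds-Karp max flow. Returns (cut value, set of vertices on s's side)."""
    n = len(cap)
    res = [row[:] for row in cap]          # residual capacities
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:   # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path: everything the BFS reached is on s's side.
            return flow, {v for v in range(n) if parent[v] != -1}
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck

def gomory_hu_tree(cap):
    """Gusfield's construction: n-1 min-cut computations on the original graph.
    Returns tree edges as (vertex, parent, min-cut weight)."""
    n = len(cap)
    parent = [0] * n
    weight = [0] * n
    for s in range(1, n):
        t = parent[s]
        f, side = min_cut(cap, s, t)
        weight[s] = f
        for v in range(n):
            if v != s and v in side and parent[v] == t:
                parent[v] = s
        if parent[t] in side:              # re-hang s above t to keep the cut property
            parent[s] = parent[t]
            parent[t] = s
            weight[s] = weight[t]
            weight[t] = f
    return [(v, parent[v], weight[v]) for v in range(1, n)]

def cluster(cap, k):
    """Assumed rule: drop the k-1 lightest tree edges, return connected components."""
    n = len(cap)
    kept = sorted(gomory_hu_tree(cap), key=lambda e: e[2])[k - 1:]
    label = list(range(n))                 # union-find parent array
    def find(v):
        while label[v] != v:
            label[v] = label[label[v]]
            v = label[v]
        return v
    for u, v, _ in kept:
        label[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Toy graph: two tight triangles joined by one weak edge (made-up weights).
edges = [(0, 1, 5), (0, 2, 5), (1, 2, 5), (3, 4, 5), (3, 5, 5), (4, 5, 5), (2, 3, 1)]
cap = build_capacity(6, edges)
print(cluster(cap, 2))
```

On this toy input the weak edge between the two triangles becomes the lightest tree edge, so asking for two clusters separates the triangles. A real segmenter would need edge weights computed from visual and structural cues, which this sketch does not attempt.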

Index Terms—Webpage segmentation, DOM tree, Gomory-Hu tree, Planar graph


Cite: Xinyue Liu, Hongfei Lin, Ye Tian, "Segmenting Webpage with Gomory-Hu Tree Based Clustering," Journal of Software, vol. 6, no. 12, pp. 2421-2425, 2011.
