Volume 6 Number 8 (Aug. 2011)
JSW 2011 Vol.6(8): 1409-1416 ISSN: 1796-217X
doi: 10.4304/jsw.6.8.1409-1416

Automatically Extracting Academic Papers from Web Pages Using Conditional Random Fields Model

Wei Liu, Jianxun Zeng

Institute of Scientific and Technical Information of China, China, 100038

Abstract—A large number of academic papers (including research reports) are being published on web pages. Extracting these papers in a structured form is important for many popular applications, such as scientific and technical information retrieval and digital libraries. However, few studies have addressed the problem of academic paper extraction. This paper proposes a unified approach for automatically extracting academic papers from web pages based on the Conditional Random Fields (CRF) model. In the proposed approach, academic paper extraction and semantic labeling are performed simultaneously within a single CRF model. Experimental results show that the approach can achieve significantly better extraction results.

Index Terms—Web data extraction, Web intelligence, Machine learning, Conditional Random Fields
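
The abstract does not give the paper's features, label set, or CRF variant, so the following is only a minimal sketch of how a linear-chain CRF can label the text lines of a web page once it has been trained. Everything in it is an assumption made for illustration: the labels (TITLE, AUTHOR, OTHER), the hand-set feature weights, and the function names emission_score and viterbi are hypothetical and are not taken from the paper. Training is omitted; only Viterbi decoding is shown. Lines labeled OTHER are discarded (extraction) while the remaining lines carry a semantic label, which is one way to read "simultaneously" in the abstract.

```python
# Toy linear-chain CRF decoder. All labels, features, and weights below are
# hypothetical illustrations, not values from the paper.
from typing import Dict, List, Tuple

LABELS = ["TITLE", "AUTHOR", "OTHER"]


def emission_score(label: str, line: str) -> float:
    """Hand-set state features scoring how well `label` fits one text line."""
    words = line.split()
    score = 0.0
    if label == "AUTHOR" and "," in line and len(words) <= 6:
        score += 2.0  # short comma-separated lines look like author lists
    if label == "TITLE" and len(words) >= 5 and line[:1].isupper():
        score += 1.5  # longer capitalized lines look like titles
    if label == "OTHER" and (line.startswith("[") or "News" in line):
        score += 2.0  # navigation or widget text
    return score


# Transition features: for example, an author line tends to follow a title line.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("TITLE", "AUTHOR"): 1.5,
    ("AUTHOR", "TITLE"): 0.5,
    ("OTHER", "TITLE"): 0.5,
}


def viterbi(lines: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for the given lines."""
    if not lines:
        return []
    # best[t][y] = best score of any labeling of lines[0..t] that ends in y
    best = [{y: emission_score(y, lines[0]) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for line in lines[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(
                LABELS,
                key=lambda p: best[-1][p] + TRANSITION.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev]
                + TRANSITION.get((prev, y), 0.0)
                + emission_score(y, line)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


page_lines = [
    "[Home] News",
    "Automatically Extracting Academic Papers from Web Pages",
    "Wei Liu, Jianxun Zeng",
]
labels = viterbi(page_lines)
# Keep only the lines that belong to a paper record, with their labels.
record = [(lab, text) for lab, text in zip(labels, page_lines) if lab != "OTHER"]
```

In a trained CRF the weights would be learned from labeled pages rather than set by hand, and the feature set would normally be much richer (for example, layout or markup cues), but the decoding step has the same shape.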

Cite: Wei Liu, Jianxun Zeng, "Automatically Extracting Academic Papers from Web Pages Using Conditional Random Fields Model," Journal of Software, vol. 6, no. 8, pp. 1409-1416, 2011.
