JSW 2011 Vol.6(8): 1409-1416 ISSN: 1796-217X
doi: 10.4304/jsw.6.8.1409-1416
Automatically Extracting Academic Papers from Web Pages Using Conditional Random Fields Model
Wei Liu, Jianxun Zeng
Institute of Scientific and Technical Information of China, China, 100038
Abstract—A large number of academic papers (including research reports) are being released on web pages. Extracting these papers in a structured form is important for many popular applications, such as scientific and technical information retrieval and digital libraries. However, few investigations have addressed the problem of academic paper extraction. This paper proposes a unified approach for automatically extracting academic papers from web pages based on the Conditional Random Fields (CRF) model. In the proposed approach, academic paper extraction and semantic labeling are performed simultaneously by employing the CRF model. Experimental results show that our approach can achieve significantly better extraction results.
Index Terms—Web data extraction, Web intelligence, Machine learning, Conditional Random Fields
Cite: Wei Liu, Jianxun Zeng, "Automatically Extracting Academic Papers from Web Pages Using Conditional Random Fields Model," Journal of Software, vol. 6, no. 8, pp. 1409-1416, 2011.