JSW 2013 Vol.8(1): 55-62 ISSN: 1796-217X
doi: 10.4304/jsw.8.1.55-62
A Structured Information Extraction Algorithm for Scientific Papers based on Feature Rules Learning
Jianguo Chen1, 2, Hao Chen2
1Fujian University of Technology /Fujian, Fuzhou, China
2Software School, Hunan University /Hunan, Changsha, China
Abstract—Traditional scientific papers are unstructured documents, which makes it difficult for them to meet the requirements of structured retrieval, statistical classification, association analysis, and other high-level applications. Extracting and analyzing the structured information in such papers is therefore a challenging problem. This paper proposes a structured information extraction algorithm for unstructured and/or semi-structured machine-readable documents. The scheme first analyzes the basic structure and format features of traditional scientific papers and learns extraction rules from those features. It then applies the rules to extract the title, authors, abstract, keywords, text, and other elements of a paper from the unstructured document. Finally, it exports the structured text in the format required by multi-dimensional scientific papers, which supports structured retrieval, statistical classification, and other high-level applications.
Index Terms—Information Extraction, Feature Rules, Multi-dimensional Scientific Papers.
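As a rough illustration of what rule-based field extraction can look like, the following Python sketch pulls the title, authors, abstract, and keywords out of a plain-text header. It is not the algorithm from the paper: the regular expressions, the positional rule (first line is the title, second line is the author list), and the names `ABSTRACT_RE`, `KEYWORDS_RE`, and `extract_fields` are all assumptions made for this example, and the rules are written by hand rather than learned from format features.

```python
import re

# Hypothetical hand-written rules (assumption): a label at the start of a line,
# followed by an em dash (\u2014), colon, or hyphen, introduces the field.
ABSTRACT_RE = re.compile(r"^Abstract\s*[\u2014:-]\s*(.+)$", re.IGNORECASE)
KEYWORDS_RE = re.compile(
    r"^(?:Index Terms|Keywords)\s*[\u2014:-]\s*(.+)$", re.IGNORECASE
)


def extract_fields(text):
    """Return a dict with title, authors, abstract, and keywords."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    record = {"title": None, "authors": [], "abstract": None, "keywords": []}
    abstract_idx = None

    for i, line in enumerate(lines):
        match = ABSTRACT_RE.match(line)
        if match and record["abstract"] is None:
            record["abstract"] = match.group(1)
            abstract_idx = i
            continue
        match = KEYWORDS_RE.match(line)
        if match:
            raw = match.group(1).rstrip(".")
            record["keywords"] = [k.strip() for k in raw.split(",") if k.strip()]

    # Positional rule (assumption): everything before the abstract is header
    # material, with the title on the first line and the authors on the second.
    header = lines[:abstract_idx] if abstract_idx is not None else lines
    if header:
        record["title"] = header[0]
    if len(header) > 1:
        # Split on commas that precede a capitalized name, then drop the
        # trailing affiliation markers such as "1, 2".
        parts = re.split(r",\s*(?=[A-Z])", header[1])
        record["authors"] = [re.sub(r"[\d\s,]+$", "", p) for p in parts]
    return record


sample = """A Sample Paper Title
Ada Smith1, 2, Bo Lee2
Abstract: We extract fields with rules.
Keywords: extraction, rules.
"""
fields = extract_fields(sample)
```

The resulting dictionary is one possible structured form of the header. A rule set learned from real papers would need to cover far more layout variation than these two label patterns.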
Cite: Jianguo Chen, Hao Chen, "A Structured Information Extraction Algorithm for Scientific Papers based on Feature Rules Learning," Journal of Software, vol. 8, no. 1, pp. 55-62, 2013.