doi: 10.4304/jsw.9.10.2749-2757
An Improved LDA Model for Academic Document Analysis
Abstract—Electronic documents on the Internet are typically generated with many kinds of side information. Although the volume and variety of this information make analysis difficult, models can fit and analyze the data better if they make full use of it. Building on the study of probabilistic topic models, this paper proposes an improved LDA model suited to the analysis of academic documents. By modifying the standard LDA model, the improved model can analyze documents together with their authors and references. To evaluate its generalization capability, this paper compares the new model with standard LDA and the DMR model on the widely used Rexa dataset. Experimental results show that the new model performs better at document clustering and topic extraction than standard LDA and its modifications. In addition, the new model outperforms the DMR model on the task of author discrimination.
Index Terms—academic documents; topic model; topic extraction; author discrimination
Cite: Yuyan Jiang, Yuan Shao, "An Improved LDA Model for Academic Document Analysis," Journal of Software, vol. 9, no. 10, pp. 2749-2757, 2014.