Volume 6 Number 11 (Nov. 2011)
Home > Archive > 2011 > Volume 6 Number 11 (Nov. 2011) >
JSW 2011 Vol.6(11): 2292-2299 ISSN: 1796-217X
doi: 10.4304/jsw.6.11.2292-2299

Topic Mining based on Word Posterior Probability in Spoken Document

Lei Zhang, Guo-xing Chen, Xue-zhi Xiang, Jing-xin Chang
Information and Communication Engineering College, Harbin Engineering University, Harbin, China

Abstract—For speech recognition system, there are three kinds of result representations as one-best, N-best and Lattice. Since lattice has multi-path which can reduce the effect of recognition error rate, it is widely applied nowadays. In fact, there are amount of redundancies in lattice, which leads to the increasing of complexity of latter algorithm based on it. Additionally, for the decoding algorithm, it is acted as maximum a posterior probability (MAP) which can only guarantee the posterior probability of the whole sentence is of maximum. For MAP does not mean the highest syllable recognition rate, here, confusion network is introduced in topic mining system. In the clustering during confusion network, the minimum word error rule is adopted, which is proper to topic mining system since the least meaningful unit is word in Chinese and word information is most important in topic mining. In this paper, a simplified confusion network generation algorithm is proposed to handle some problems caused by insertion error during recognition. Then based on the confusion network, a word list extraction approach is proposed, in which, the dictionary is adopted to judge whether the consecutive arc in confusion sets is a word. At this stage, the error word information produced by error recognition rate can be corrected to some extent. After the competition part in word list extraction on confusion network, a final word list with posterior probability can be obtained. Furthermore, this kind of posterior probability can be combined in topic mining system. SVD and NMF are adopted here to decompose the term-document matrix on the word list of confusion network. From the experiments, it can be drawn that the proposed approach based on confusion network can achieve better performance than that of one-best and N-best. Additionally, the modified weight which combined posterior probability into term-document matrix can further improve the system performance.

Index Terms—topic mining, spoken document, posterior probability, confusion network, modified weight

[PDF]

Cite: Lei Zhang, Guo-xing Chen, Xue-zhi Xiang, Jing-xin Chang, "Topic Mining based on Word Posterior Probability in Spoken Document," Journal of Software vol. 6, no. 11, pp. 2292-2299, 2011.

General Information

ISSN: 1796-217X (Online)
Frequency: Monthly (2006-2019); Bimonthly (Since 2020)
Editor-in-Chief: Prof. Antanas Verikas
Executive Editor: Ms. Yoyo Y. Zhou
Abstracting/ Indexing: DBLP, EBSCO, Google Scholar, ProQuest, INSPEC, ULRICH's Periodicals Directory, WorldCat, etc
E-mail: jsw@iap.org
  • Dec 06, 2019 News!

    Vol 14, No 1- Vol 14, No 4 has been indexed by EI (Inspec)   [Click]

  • Jun 22, 2020 News!

    Papers published in JSW Vol 14, No 1- Vol 15 No 4 have been indexed by DBLP     [Click]

  • Jun 22, 2020 News!

    The papers published in Vol 15, No 5 have all received dois from Crossref    [Click]

  • Aug 01, 2018 News!

    [CFP] 2020 the annual meeting of JSW Editorial Board, ICCSM 2020, will be held in Rome, Italy, July 17-19, 2020   [Click]

  • Jun 22, 2020 News!

    Vol 15, No 5 has been published with online version     [Click]