Topic Mining based on Word Posterior Probability in Spoken Document

JSW 2011 Vol.6(11): 2292-2299 ISSN: 1796-217X
doi: 10.4304/jsw.6.11.2292-2299

Lei Zhang, Guo-xing Chen, Xue-zhi Xiang, Jing-xin Chang

Information and Communication Engineering College, Harbin Engineering University, Harbin, China

Abstract—A speech recognition system can represent its output in three ways: one-best, N-best, and lattice. Because a lattice keeps multiple paths, it reduces the impact of recognition errors and is widely used. However, a lattice contains a large amount of redundancy, which increases the complexity of any later algorithm built on it. In addition, decoding follows the maximum a posteriori (MAP) criterion, which only guarantees that the posterior probability of the whole sentence is maximal. Since MAP does not imply the highest syllable recognition rate, a confusion network is introduced into the topic mining system. Clustering in the confusion network follows the minimum word error criterion, which suits topic mining because the word is the smallest meaningful unit in Chinese and word information matters most for topic mining. This paper proposes a simplified confusion network generation algorithm to handle problems caused by insertion errors during recognition. Based on the confusion network, a word list extraction approach is then proposed in which a dictionary is used to judge whether consecutive arcs in the confusion sets form a word. At this stage, erroneous words produced by recognition errors can be corrected to some extent. After the competition step of word list extraction on the confusion network, a final word list with posterior probabilities is obtained, and these posterior probabilities can be incorporated into the topic mining system. SVD and NMF are used to decompose the term-document matrix built from the confusion network word list. Experiments show that the proposed confusion network approach performs better than the one-best and N-best approaches. In addition, the modified weight, which combines the posterior probability into the term-document matrix, further improves system performance.

Index Terms—topic mining, spoken document, posterior probability, confusion network, modified weight
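
The abstract's idea of combining word posterior probabilities into the term-document matrix can be illustrated with a minimal sketch. It assumes, purely for illustration, that the modified weight is the sum of a word's posteriors in a document (an expected count); the paper's actual weighting formula is not given on this page. The function name `posterior_weighted_matrix`, the input layout, and the sample words and probabilities are all hypothetical.

```python
def posterior_weighted_matrix(docs):
    """Build a term-document matrix from posterior-scored word lists.

    docs: list of documents, each a list of (word, posterior) pairs,
          e.g. the word list extracted from a confusion network.
    Returns (vocab, matrix), where matrix[i][j] is the summed posterior
    of vocab[i] in document j (an assumed weighting, not the paper's).
    """
    # Sorted vocabulary gives a stable row order.
    vocab = sorted({word for doc in docs for word, _ in doc})
    row = {word: i for i, word in enumerate(vocab)}

    # One row per term, one column per document.
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for word, posterior in doc:
            # A raw count would add 1 here; the posterior discounts
            # words the recognizer was unsure about.
            matrix[row[word]][j] += posterior
    return vocab, matrix


# Hypothetical word lists with made-up posteriors.
docs = [
    [("weather", 0.9), ("rain", 0.6), ("train", 0.4)],
    [("train", 0.8), ("station", 0.7), ("rain", 0.2)],
]
vocab, matrix = posterior_weighted_matrix(docs)
```

A matrix built this way could then be passed to an SVD or NMF routine in place of a count-based matrix, so that low-confidence words contribute less to the extracted topics.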


Cite: Lei Zhang, Guo-xing Chen, Xue-zhi Xiang, Jing-xin Chang, "Topic Mining based on Word Posterior Probability in Spoken Document," Journal of Software, vol. 6, no. 11, pp. 2292-2299, 2011.




Volume 6 Number 11 (Nov. 2011)
