TY - GEN
T1 - Learning Topics Using Semantic Locality
AU - Zhao, Ziyi
AU - Pugdeethosapol, Krittaphat
AU - Lin, Sheng
AU - Li, Zhe
AU - Ding, Caiwen
AU - Wang, Yanzhi
AU - Qiu, Qinru
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/11/26
Y1 - 2018/11/26
N2 - Topic modeling discovers the latent topic probabilities of given text documents. To generate more meaningful topics that better represent a given document, we propose a new feature extraction technique that can be used in the data preprocessing stage. The method consists of three steps. First, it generates words and word pairs from every document. Second, it applies a two-way TF-IDF algorithm to the words and word pairs for semantic filtering. Third, it uses the K-means algorithm to merge word pairs that have similar semantic meaning. Experiments are carried out on the Open Movie Database (OMDb), the Reuters dataset, and the 20NewsGroup dataset. The mean Average Precision score is used as the evaluation metric. Compared with other state-of-the-art topic models, such as Latent Dirichlet Allocation and traditional Restricted Boltzmann Machines, our proposed data preprocessing improves the generated topic accuracy by up to 12.99%.
AB - Topic modeling discovers the latent topic probabilities of given text documents. To generate more meaningful topics that better represent a given document, we propose a new feature extraction technique that can be used in the data preprocessing stage. The method consists of three steps. First, it generates words and word pairs from every document. Second, it applies a two-way TF-IDF algorithm to the words and word pairs for semantic filtering. Third, it uses the K-means algorithm to merge word pairs that have similar semantic meaning. Experiments are carried out on the Open Movie Database (OMDb), the Reuters dataset, and the 20NewsGroup dataset. The mean Average Precision score is used as the evaluation metric. Compared with other state-of-the-art topic models, such as Latent Dirichlet Allocation and traditional Restricted Boltzmann Machines, our proposed data preprocessing improves the generated topic accuracy by up to 12.99%.
UR - http://www.scopus.com/inward/record.url?scp=85059739805&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059739805&partnerID=8YFLogxK
U2 - 10.1109/ICPR.2018.8546223
DO - 10.1109/ICPR.2018.8546223
M3 - Conference contribution
AN - SCOPUS:85059739805
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3710
EP - 3715
BT - 2018 24th International Conference on Pattern Recognition, ICPR 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th International Conference on Pattern Recognition, ICPR 2018
Y2 - 20 August 2018 through 24 August 2018
ER -