Multi-classifier for Car Review Sentiment Classification Based on ER Rule

doi:10.12005/orms.2024.0162

Abstract

With the rapid development of next-generation information technology, more and more users are accustomed to sharing personal experiences and opinions on the Internet, for example in online reviews of books, movies and products, which typically carry users’ positive or negative sentiment. Text sentiment analysis uses computer technology to detect and extract the sentiments, attitudes, opinions and other subjective information in text documents, thereby converting qualitative user expressions into quantifiable data that can serve decision-making and strategic planning. For users, product reviews provide information that helps them make informed purchasing decisions and minimize regret after consumption. For manufacturers, reviews reveal consumers’ needs in a timely manner, so that marketing strategies can be adjusted in a targeted way and the design and quality of products improved. However, because the number of review texts on the Internet is growing exponentially, traditional manual analysis can hardly keep up with rapidly changing market demand, while deep learning-based methods may suffer from weak interpretability. How to obtain users’ sentiment information automatically from large numbers of comments in a rational and intelligent way is therefore a challenging issue.
For the binary sentiment classification problem on a car review corpus, this paper proposes a text sentiment classification method based on multi-classifier fusion with the evidential reasoning (ER) rule. First, sentiment feature construction is explored by comparing the classification performance of several feature models, namely unigram, bigram and unigram+bigram, and the chi-square (CHI) test is adopted for text feature selection. This test is particularly effective in handling high-dimensional feature spaces, and supports more accurate sentiment classification by retaining the features most relevant to the analysis. Second, an improved TF-IDF method is proposed to enhance the discrimination of terms relevant to sentiment analysis. It refines the traditional TF-IDF calculation by incorporating CHI values to assess how distinctive each term is across document classes. This adjustment accounts for the distribution of terms within categories, so that sentiment-related terms have more impact on classification. Third, taking full account of the weights and reliabilities of the different classifiers, the ER rule is introduced to fuse multiple classifiers for text sentiment polarity analysis and thereby integrate their respective advantages. Specifically, each classifier is regarded as a piece of evidence, and its weight is formed dynamically from the Euclidean distance between pieces of evidence and from the difference between the judgments for different categories within that evidence. The weight of a classifier is negatively related to the difference between its results and those of all other classifiers, and positively related to the discrepancy among its own judgments for the different categories. Meanwhile, the accuracy of each classifier is taken as its reliability, in order to produce better classification results.
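The chi-square feature selection step can be illustrated with a minimal Python sketch. It uses the standard chi-square statistic for a 2x2 term/class contingency table; the function names `chi_square` and `top_terms`, the toy tokenized reviews and the cut-off `k` are assumptions made for illustration and are not taken from the paper, whose improved TF-IDF weighting is not reproduced here.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: positive documents containing the term
    b: negative documents containing the term
    c: positive documents without the term
    d: negative documents without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # The term (or a class) has no variation, so it carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_terms(docs, labels, k):
    """Rank terms by chi-square score and return the k best (term, score) pairs.

    docs: list of token lists; labels: list of 1 (positive) / 0 (negative).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    vocab = sorted({t for doc in docs for t in doc})
    scores = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if y == 1 and term in doc)
        b = sum(1 for doc, y in zip(docs, labels) if y != 1 and term in doc)
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return [(t, scores[t]) for t in ranked[:k]]


# Hypothetical toy corpus of tokenized car reviews (1 = positive, 0 = negative).
docs = [
    ["space", "comfortable"],
    ["space", "noisy"],
    ["comfortable", "smooth"],
    ["noisy", "expensive"],
]
labels = [1, 0, 1, 0]

for term, score in top_terms(docs, labels, 2):
    print(term, round(score, 3))
```

In this toy corpus "space" occurs equally often in both classes and scores zero, while "comfortable" and "noisy" each occur in only one class and rank highest, which is the behaviour that makes the statistic useful for pruning a high-dimensional n-gram vocabulary.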
To verify the effectiveness and rationality of the proposed method, an automobile review data set crawled from the web is used. The results show that the ER rule-based multi-classifier fusion method achieves better text sentiment classification results than single classification algorithms, ensemble algorithms and deep learning algorithms. In addition, to reduce the influence of chance and of relying on a single data set, the results are verified under the same experimental conditions on a public data set of hotel reviews from a different domain. The comparison shows that the ER rule-based fusion method achieves the best F1 value and Accuracy, and also performs well in Precision and Recall, so the method generalizes well to text sentiment classification tasks in different domains. Ablation experiments are also conducted on the proposed improvements in feature model selection and feature weight calculation, and the results confirm that these improvements benefit text sentiment classification performance. In summary, the ER rule takes both the weight and the reliability of each classifier into account when fusing multiple classifiers, and integrates the advantages of the different classifiers. The method can effectively reduce the classification limitations caused by different text types and topics. The final sentiment classification results are stable and balanced, which gives the method wider applicability in sentiment classification practice.
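How weight and reliability enter the fusion can be sketched for the simplest case of two classifiers. The sketch below follows the two-evidence form of the ER rule as it is commonly stated, under the assumption that each classifier assigns belief only to the single classes (positive, negative) and none to the whole frame. The function name `er_combine_two` and all numeric values are hypothetical; the weights are supplied by hand and do not reproduce the paper's dynamic, distance-based weighting.

```python
def er_combine_two(p1, p2, w1, w2, r1, r2):
    """Fuse two classifier outputs with a two-evidence ER rule sketch.

    p1, p2: dicts mapping each class to a belief degree (each sums to 1).
    w1, w2: weights of the two classifiers, in [0, 1].
    r1, r2: reliabilities of the two classifiers, in [0, 1].
    Assumes belief is assigned to single classes only.
    """
    classes = list(p1)
    # Weighted belief degrees of each piece of evidence.
    m1 = {c: w1 * p1[c] for c in classes}
    m2 = {c: w2 * p2[c] for c in classes}
    # Bounded sum of individual support plus the joint (orthogonal sum) support.
    m_hat = {
        c: (1 - r2) * m1[c] + (1 - r1) * m2[c] + m1[c] * m2[c]
        for c in classes
    }
    total = sum(m_hat.values())
    if total == 0:
        raise ValueError("The two pieces of evidence are fully conflicting.")
    # Normalize so the combined belief degrees sum to 1.
    return {c: v / total for c, v in m_hat.items()}


# Hypothetical outputs of two classifiers for one car review.
clf_a = {"pos": 0.8, "neg": 0.2}
clf_b = {"pos": 0.6, "neg": 0.4}

fused = er_combine_two(clf_a, clf_b, w1=0.7, w2=0.5, r1=0.9, r2=0.8)
label = max(fused, key=fused.get)
print(label, fused)
```

When both weights and both reliabilities equal 1, the individual-support terms vanish and the result reduces to the normalized product of the two distributions, which is a convenient sanity check on the implementation.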

Key words: ER rule, multi-classifier fusion, TFIDF weight, deep learning algorithm, ensemble learning algorithm

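Chinese abstract: For the binary sentiment classification problem on a car review corpus, this paper proposes a sentiment classification method based on multi-classifier fusion with the evidential reasoning rule. For sentiment feature construction, experiments compare the effect of different feature models on the classification results, and the traditional TFIDF weight calculation is improved. On this basis, the ER Rule is used to fuse different classifiers for text sentiment polarity analysis, taking the weight and reliability of each classifier into account. Finally, review data crawled from automobile websites are used to test the method, and a public corpus of Chinese hotel reviews is used for validation. The results show that the method effectively integrates the advantages of different classifiers: compared with traditional machine learning classification algorithms, it improves Recall, F1 value and Accuracy, and compared with currently popular deep learning and ensemble learning algorithms, its results are better overall.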

Key words (Chinese): evidential reasoning rule, multi-classifier fusion, TFIDF weight, deep learning algorithm, ensemble learning algorithm

CLC Number: TP391.43

ZHOU Mi, ZHOU Yajing, HE Yang, FANG Bihe. Multi-classifier for Car Review Sentiment Classification Based on ER Rule[J]. Operations Research and Management Science, 2024, 33(5): 161-168.

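ZHOU Mi (周谧), ZHOU Yajing (周雅婧), HE Yang (贺洋), FANG Bihe (方必和). Research on Multi-classifier Car Review Sentiment Classification Based on ER Rule[J]. 运筹与管理 (Operations Research and Management Science), 2024, 33(5): 161-168.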
