Operations Research and Management Science, 2024, Vol. 33, Issue (5): 161-168. DOI: 10.12005/orms.2024.0162

• Application Research •

Multi-classifier for Car Review Sentiment Classification Based on ER Rule

ZHOU Mi, ZHOU Yajing, HE Yang, FANG Bihe   

  1. School of Management, Hefei University of Technology, Hefei 230009, China;
    2. Engineering Research Center for Intelligent Decision-making and Information System Technologies, Ministry of Education, Hefei 230009, China
  • Received: 2021-12-18 Online: 2024-05-25 Published: 2024-07-19

基于ER Rule的多分类器汽车评论情感分类研究

周谧, 周雅婧, 贺洋, 方必和   

  1. 合肥工业大学 管理学院,安徽 合肥 230009;
    2. 智能决策与信息系统技术教育部工程研究中心,安徽 合肥 230009
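  • Corresponding author: HE Yang (b. 1994), male, from Anqing, Anhui; master's degree; research interests: decision theory and methods.
  • About the authors: ZHOU Mi (b. 1983), male, from Langxi, Anhui; professor, Ph.D.; research interests: decision theory and methods. ZHOU Yajing (b. 1999), female, from Changzhi, Shanxi; Ph.D. candidate; research interests: decision theory and methods. FANG Bihe (b. 1964), male, from Chaohu, Anhui; associate professor, master's degree; research interests: decision theory and methods.
  • Supported by:
    National Natural Science Foundation of China (71521001); NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1709215)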

Abstract: With the rapid development of next-generation information technology, more and more users share personal experiences and opinions on the Internet, for example in online reviews of books, movies and products, and these reviews usually carry positive or negative sentiment. Text sentiment analysis uses computer technology to detect and extract the sentiments, attitudes, opinions and other subjective information in text documents, converting qualitative user expressions into quantifiable data that can support decision-making and strategic planning. For users, product reviews provide information that helps them make informed purchasing decisions and reduces post-purchase regret. For manufacturers, reviews reveal consumers' needs in a timely manner, so that marketing strategies can be adjusted in a targeted way and product design and quality improved. Because the number of review texts on the Internet is growing exponentially, traditional manual analysis can hardly keep up with rapidly changing market demand, while deep learning-based methods suffer from weak interpretability. Automatically extracting users' sentiment information from large numbers of comments in a rational and intelligent way therefore remains a challenging problem.
  For binary sentiment classification on a car review corpus, this paper proposes a text sentiment classification method based on multi-classifier fusion with the evidential reasoning (ER) rule. First, sentiment feature construction is explored by comparing the classification performance of several feature models, namely unigram, bigram and unigram+bigram, and the chi-square (CHI) test is adopted for text feature extraction. This test is particularly effective in handling high-dimensional feature spaces, as it highlights the features most relevant to the analysis. Second, an improved TF-IDF method is proposed to sharpen the discrimination of sentiment-relevant terms. It incorporates the CHI values to assess how distinctive a term is across document classes and refines the traditional TF-IDF calculation accordingly. This adjustment accounts for the distribution of terms within categories, so that sentiment-related terms carry more weight in classification. Third, taking the weights and reliabilities of the different classifiers fully into account, the ER rule is introduced to fuse multiple classifiers for text sentiment polarity analysis and thus combine their respective advantages. Specifically, each classifier is regarded as a piece of evidence, and its weight is formed dynamically from the Euclidean distance between pieces of evidence and from the difference between its judgments on the different categories. A classifier's weight decreases as its results diverge from those of all other classifiers, and increases as the gap between its own judgments on the different categories widens. Meanwhile, the accuracy of each classifier is taken as its reliability, in order to produce better classification results.
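  The fusion step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it follows the recursive combination of the ER rule as commonly stated in the evidential reasoning literature, restricted to singleton classes (positive and negative). The function name `er_rule_fuse`, the class labels, and any weights and reliabilities passed to it are hypothetical. The dynamic weight computation from Euclidean distances is not reproduced here because the abstract does not give its formula.

```python
from typing import Dict, List, Tuple

Belief = Dict[str, float]  # class label -> belief degree; values sum to 1


def er_rule_fuse(evidence: List[Tuple[Belief, float, float]]) -> Belief:
    """Fuse (belief, weight, reliability) triples over singleton classes.

    Each classifier output is treated as one piece of evidence. Weights and
    reliabilities are taken as given inputs (hypothetical values in practice).
    """
    if not evidence:
        raise ValueError("at least one piece of evidence is required")
    classes = sorted(evidence[0][0])
    belief, w, r = evidence[0]
    # Weighted masses of the first classifier, plus the residual support
    # (1 - reliability) that the ER rule assigns to the power set.
    m = {c: w * belief[c] for c in classes}
    m_power = 1.0 - r
    for belief, w, r in evidence[1:]:
        m_i = {c: w * belief[c] for c in classes}
        # Bounded-sum terms plus the orthogonal-sum term. With singleton
        # classes only, the intersection sum is the product on the same class.
        m_hat = {
            c: (1.0 - r) * m[c] + m_power * m_i[c] + m[c] * m_i[c]
            for c in classes
        }
        m_power_hat = (1.0 - r) * m_power
        total = sum(m_hat.values()) + m_power_hat
        if total == 0.0:
            raise ValueError("fully reliable evidence is in complete conflict")
        m = {c: v / total for c, v in m_hat.items()}
        m_power = m_power_hat / total
    norm = sum(m.values())
    if norm == 0.0:
        raise ValueError("no support assigned to any class")
    # Final belief degrees: redistribute the residual by renormalizing.
    return {c: v / norm for c, v in m.items()}


if __name__ == "__main__":
    # Hypothetical outputs of two classifiers for one review.
    fused = er_rule_fuse([
        ({"pos": 0.8, "neg": 0.2}, 1.0, 0.9),
        ({"pos": 0.7, "neg": 0.3}, 1.0, 0.8),
    ])
    print(fused)
```

  With a single piece of evidence the function returns that classifier's belief distribution unchanged, and two classifiers that agree on a class reinforce each other, which is the behavior the fusion step relies on.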
  To verify the effectiveness and rationality of the proposed method, it is tested on an automobile review data set crawled from the web. The results show that multi-classifier fusion based on the ER rule achieves better text sentiment classification results than single classification algorithms, ensemble algorithms and deep learning algorithms. In addition, to reduce the influence of chance and of relying on a single data set, the results are further verified under the same experimental conditions on a publicly available hotel review data set from a different domain. The comparison shows that the ER rule-based fusion method achieves the best F1 value and Accuracy, and also performs well in Precision and Recall, so the method generalizes well to text sentiment classification tasks in different domains. Ablation experiments are also conducted on the proposed improvements in feature model selection and feature weight calculation, and the results confirm that the improvements benefit text sentiment classification performance. In summary, the ER rule takes both the weight and the reliability of each classifier into account when fusing multiple classifiers, and integrates the advantages of the different classifiers. The method effectively reduces the classification limitations caused by different text types and topics. The final sentiment classification results are stable and balanced, which gives the method wider applicability in sentiment classification practice.
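  The four evaluation indexes named above follow their standard definitions for binary classification, which can be sketched as below. This is an illustrative helper only: the function name `binary_metrics` and the labels "pos" and "neg" are hypothetical, and it assumes Precision, Recall and F1 are computed for the positive class, which the abstract does not specify.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "pos"
) -> Dict[str, float]:
    """Precision, Recall, F1 (for the positive class) and Accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for five reviews.
    gold = ["pos", "pos", "neg", "neg", "pos"]
    pred = ["pos", "neg", "neg", "pos", "pos"]
    print(binary_metrics(gold, pred))
```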

Key words: ER rule, multi-classifier fusion, TFIDF weight, deep learning algorithm, ensemble learning algorithm

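Abstract (translated from the Chinese): This paper addresses binary sentiment classification on a car review corpus and proposes a sentiment classification method based on multi-classifier fusion with the evidential reasoning rule. For sentiment feature construction, experiments compare the effect of different feature models on the classification results, and the traditional TFIDF weight calculation is improved. On this basis, the ER Rule is used to fuse different classifiers for text sentiment polarity analysis, taking the weight and reliability of each classifier into account. Finally, the method is tested on review data crawled from automobile websites and validated on a public Chinese hotel review corpus. The results show that the method effectively integrates the advantages of different classifiers: compared with traditional machine learning classification algorithms, it improves Recall, F1 value and Accuracy, and compared with currently popular deep learning and ensemble learning algorithms, its results are better overall.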

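Key words (translated from the Chinese): evidential reasoning rule, multi-classifier fusion, TFIDF weight, deep learning algorithm, ensemble learning algorithm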
