运筹与管理 (Operations Research and Management Science), 2022, Vol. 31, Issue (11): 167-173. DOI: 10.12005/orms.2022.0369

• Application Research •


张文1, 王强1, 唐子旭1, 秦广杰2, 李健1   

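  1. School of Economics and Management, Beijing University of Technology, Beijing 100124, China;
    2. Shandong Inspur New Infrastructure Technology Co., Ltd., Jinan, Shandong 250011, China
  • Received: 2021-04-22 Publication date: 2022-11-25 Release date: 2022-12-14
  • Corresponding author: LI Jian (李健, 1976-), male, Ph.D., professor and doctoral supervisor; research interest: supply chain finance.
  • About the author: ZHANG Wen (张文, 1981-), male, from Honghu, Hubei; Ph.D., professor and doctoral supervisor; research interest: big data analytics.
  • Funding: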

Research on Data Poorness in Online Deceptive Review Identification

ZHANG Wen1, WANG Qiang1, TANG Zi-xu1, QIN Guang-jie2, LI Jian1

  1. School of Economics and Management, Beijing University of Technology, Beijing 100124, China;
    2. Inspur New Infrastructure Technology Co., Ltd., Jinan 250011, China
  • Received: 2021-04-22 Online: 2022-11-25 Published: 2022-12-14

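Abstract: Advances in machine learning have improved the accuracy of online deceptive review identification, but current machine learning models lack sufficient labeled data for training. Based on generative adversarial networks (GAN), this paper proposes a review dataset expansion method, GAN-RDE (GAN-Review Dataset Expansion), to address the scarcity of training data in deceptive review identification. Specifically, the initial review data are first divided into a truthful review dataset and a deceptive review dataset, and a GAN is trained on each to generate vectors that conform to the feature distributions of truthful and deceptive reviews, respectively. The vectors generated by the GANs are then merged with the feature-word vector matrix of the initial review dataset to expand the training data. Finally, naive Bayes, multi-layer perceptron, and support vector machine classifiers are used as base classifiers to compare deceptive review identification before and after data expansion. The experimental results show that expanding the review dataset with GAN-RDE significantly improves the accuracy with which the machine learning models identify deceptive reviews.

Keywords: deceptive review, generative adversarial networks, multi-layer perceptron, support vector machine, machine learning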

Abstract: The development of machine learning technology has improved the accuracy of online deceptive review identification. However, current machine learning models lack enough labeled data for training. This paper proposes a review dataset expansion method called GAN-RDE (GAN-Review Dataset Expansion), based on generative adversarial networks (GAN), to address the shortage of training data in deceptive review identification. First, we divide the initial review data into a truthful review dataset and a deceptive review dataset and train a GAN on each, generating vectors that conform to the feature distributions of truthful and deceptive reviews, respectively. Second, we combine these generated vectors with the feature word vector matrix of the initial review dataset to expand the training data. Finally, naive Bayes, multi-layer perceptron, and support vector machine classifiers are used as base classifiers to compare deceptive review identification before and after data expansion. The experimental results show that classifiers trained on the dataset expanded with GAN-RDE outperform those trained on the unexpanded dataset in deceptive review identification.

Key words: deceptive review, generative adversarial networks, multi-layer perceptron, support vector machine, machine learning
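
The data flow outlined in the abstract (split the reviews by label, fit one generator per class, then merge the generated vectors with the original feature matrix) might be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the trained GAN generator is replaced by a simple per-dimension Gaussian sampler, the names `fit_standin_generator` and `expand_dataset` are hypothetical, and the toy feature vectors are made up.

```python
import random
from statistics import mean, pstdev


def fit_standin_generator(vectors, rng):
    """Stand-in for a trained GAN generator (an assumption, not the paper's model).

    Fits an independent Gaussian to each feature dimension and returns a
    function that samples new vectors from those Gaussians.
    """
    columns = list(zip(*vectors))          # transpose: one tuple per feature
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]

    def generate(n):
        return [[rng.gauss(mu, sigma) for mu, sigma in zip(mus, sigmas)]
                for _ in range(n)]

    return generate


def expand_dataset(X, y, n_new_per_class, seed=0):
    """Return copies of (X, y) with n_new_per_class synthetic vectors per label."""
    rng = random.Random(seed)
    X_out, y_out = list(X), list(y)
    for label in sorted(set(y)):
        # Split by class, so each generator sees only one kind of review.
        class_vectors = [x for x, lab in zip(X, y) if lab == label]
        generate = fit_standin_generator(class_vectors, rng)
        # Merge the generated vectors with the original feature matrix.
        X_out.extend(generate(n_new_per_class))
        y_out.extend([label] * n_new_per_class)
    return X_out, y_out


# Toy feature vectors (hypothetical): label 0 = truthful, 1 = deceptive.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]]
y = [0, 0, 1, 1]
X_big, y_big = expand_dataset(X, y, n_new_per_class=3)
print(len(X_big), y_big)
```

The expanded `(X_big, y_big)` would then be used to train each base classifier, and its accuracy compared with the same classifier trained on the original `(X, y)`. Fitting one generator per class keeps each synthetic vector tied to a known label, which is what lets the generated data serve as labeled training examples.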
