Operations Research and Management Science, 2023, Vol. 32, Issue (12): 118-123. DOI: 10.12005/orms.2023.0395

• Application Research •

Research on Credit Card Fraud Detection Based on Three-stage Ensemble Learning

RUAN Sumei1, SUN Xusheng2, GAN Zhongxin3   

  1. School of Finance, Anhui University of Finance and Economics, Bengbu 233030, China;
    2. School of Management, Hefei University of Technology, Hefei 230009, China;
    3. Solbridge International School of Business, Woosong University, Daejeon 300814, South Korea
  • Received:2022-05-06 Online:2023-12-25 Published:2024-02-06

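  • Corresponding author: GAN Zhongxin (甘中新, 1997-), male, from Anqing, Anhui, PhD candidate; research interest: machine learning.
  • About the authors: RUAN Sumei (阮素梅, 1974-), female, from Taihe, Anhui, PhD, professor; research interest: financial technology. SUN Xusheng (孙旭升, 1997-), male, from Harbin, Heilongjiang, PhD candidate; research interest: credit risk management.
  • Supported by:
    General Program of the Anhui Provincial Natural Science Foundation (2008085MG234); 2020 Academic Funding Program for Top Talents in University Disciplines (Majors) (gxbjZD2020004)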

Abstract: Credit card fraud is a serious problem for the banking industry, causing billions of dollars in losses worldwide every year, and it calls for more effective detection methods. Credit card transaction data, however, suffer from feature redundancy and sample imbalance, both of which make it harder for a model to detect the small minority of fraudulent transactions. This research is significant on both theoretical and practical grounds. Theoretically, it contributes to credit card fraud detection by proposing a novel three-stage ensemble learning model, “FS-IFKK-Stacking”, which combines feature selection, imbalanced data processing, and heterogeneous model ensembling to overcome feature redundancy and sample imbalance. By addressing the overfitting that these two problems cause, the model identifies fraudulent transactions more precisely and confirms the utility of ensemble learning for credit card fraud detection. Practically, accurate identification of fraudulent transactions can reduce the substantial financial losses borne by financial institutions and their customers, protect customers from financial harm, and preserve the integrity of banking services. The proposed model also offers insights for building more robust fraud detection systems and contributes to global efforts to combat credit card fraud.
The proposed “FS-IFKK-Stacking” model comprises three stages, each targeting feature redundancy or sample imbalance. In the first stage, a feature selection (FS) method identifies and removes redundant features from the credit card transaction data. This reduces the dimensionality of the data and keeps the most informative features, improving the model’s ability to distinguish fraudulent from legitimate transactions. In the second stage, the IFKK method handles the class imbalance. It applies a group of undersampling techniques, including methods based on Isolation Forest, K-Means++ clustering and KNN, to rebalance the dataset. Because fraudulent instances then make up a sufficient share of the training data, the model is better able to learn and detect fraudulent patterns. In the third stage, heterogeneous models are combined with the Stacking method, which merges the predictions of multiple models, each trained on different subsets of data or with different algorithms. By drawing on the complementary strengths of the individual models, the ensemble achieves better fraud detection accuracy. The dataset used in this article, provided on Kaggle, consists of credit card transactions made by European cardholders over two days in September 2013. It contains 284,807 samples in total and has been widely used in the field of fraud detection. The model’s performance is evaluated with statistical measures including the Area Under the Curve (AUC) and the recall of fraudulent transactions.
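The undersampling idea in the second stage can be illustrated with a minimal sketch. The abstract does not specify how IFKK combines Isolation Forest, K-Means++ and KNN, so the code below is an assumption-laden illustration rather than the authors' procedure: it clusters the majority (legitimate) class with K-Means++ seeding and keeps the real sample nearest to each cluster centre. The Isolation Forest step is not covered, and the function names (`kmeans_pp_init`, `kmeans`, `undersample_majority`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-Means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centre; fall back to a uniform draw.
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd iterations starting from K-Means++ seeds."""
    centres = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centres[i]))
            clusters[j].append(p)
        new_centres = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centres.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dim))
                )
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def undersample_majority(majority, n_keep, seed=0):
    """Hypothetical helper: reduce the majority class to at most n_keep
    real samples, one nearest neighbour per cluster centre."""
    rng = random.Random(seed)
    n_keep = min(n_keep, len(majority))
    centres = kmeans(majority, n_keep, rng)
    kept = []
    for c in centres:
        nearest = min(majority, key=lambda p: sq_dist(p, c))
        if nearest not in kept:  # two centres may share a nearest sample
            kept.append(nearest)
    return kept


# Toy usage: 200 synthetic "legitimate" points reduced to at most 10.
toy_rng = random.Random(1)
majority = [(toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)) for _ in range(200)]
kept = undersample_majority(majority, n_keep=10)
```

Keeping real samples rather than synthetic centroids means the retained majority points are actual transactions. Whether the paper makes the same choice is not stated in the abstract.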
The theoretical and empirical results demonstrate the effectiveness of the “FS-IFKK-Stacking” model in detecting credit card fraud. Experiments on the public Kaggle dataset show clear improvements over a single-class model trained on the original samples: AUC, a widely used measure of classification performance, increases by 0.44%, and the recall of fraudulent transactions improves by 3.27%. These results support the model’s ability to identify fraudulent credit card transactions accurately, which helps reduce financial losses and strengthen security in the banking industry.
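For readers less familiar with the two reported metrics, the following is a minimal sketch of how AUC and recall can be computed from labels and model scores. The data are made up for illustration and do not reproduce the paper's figures. The function names `auc` and `recall` and the 0.5 decision threshold are assumptions. AUC is computed here by direct pairwise comparison, which is fine for a toy example but too slow for a dataset of several hundred thousand samples, where a rank-based formula would be preferable.

```python
def recall(y_true, y_pred):
    """Share of actual fraud cases (label 1) that are predicted as fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = fraudulent, 0 = legitimate; scores are hypothetical model outputs.
y_true = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

toy_auc = auc(y_true, scores)        # 8 of 12 positive/negative pairs ranked correctly
toy_recall = recall(y_true, y_pred)  # 2 of 3 fraud cases caught
```

Recall depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason imbalanced fraud data are often evaluated with both.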
The “FS-IFKK-Stacking” model is applicable beyond this study. By reducing false negatives and improving overall detection accuracy, it can help banks protect their customers, preserve their reputation, and maintain trust within the financial ecosystem. Other financial institutions can use the model to strengthen their own fraud detection systems and to identify and prevent fraudulent activity promptly. The research also contributes to the academic literature on credit card fraud detection: the proposed model shows that ensemble learning combined with advanced data processing can address feature redundancy and sample imbalance, and it provides a basis for further advances in the field. In conclusion, this research offers a comprehensive approach to credit card fraud detection through the “FS-IFKK-Stacking” model, whose effectiveness is supported by both theoretical and empirical results and which has potential for practical application in the banking industry. Future work will focus on more advanced techniques for each of the three stages of the learning process.

Key words: credit card fraud detection; ensemble learning; feature engineering

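Abstract: Credit card fraud is a serious problem facing the banking industry, with global losses from credit card fraud reaching billions of dollars every year. However, credit card transaction data suffer from feature redundancy and sample imbalance, which makes it harder for models to detect the minority of fraudulent transactions. To address these problems, this paper proposes a three-stage ensemble learning model, “FS-IFKK-Stacking”: feature selection based on the FS method, imbalanced data processing based on the IFKK method, and heterogeneous model ensembling based on the Stacking method. The model simultaneously addresses the overfitting caused by feature redundancy and sample imbalance, and detects fraudulent credit card transactions more accurately. Experiments on the Kaggle European credit card transaction data show that the proposed “FS-IFKK-Stacking” model detects credit card fraud markedly better than a single-class model trained on the original samples: AUC increases by 0.44%, while the recall of fraudulent transactions increases by 3.27%.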

