Evaluating the Impact of Synthetic Data on Financial Machine Learning Models: A Comprehensive Study of AI Techniques for Data Augmentation and Model Training

Authors

  • Debasish Paul, JPMorgan Chase & Co, USA
  • Praveen Sivathapandi, Citi, USA
  • Rajalakshmi Soundarapandiyan, Elementalent Technologies, USA

Keywords:

synthetic data, financial machine learning

Abstract

Financial machine learning increasingly turns to synthetic data to address data availability, model robustness, and privacy. We examine how synthetic data affects financial machine learning systems through AI-driven data augmentation and model training. Driven by access constraints, privacy concerns, and regulation, banks and researchers are replacing real-world data with synthetic data. Advanced generative techniques, including generative adversarial networks (GANs) and variational autoencoders (VAEs), can produce high-quality financial datasets. These methods may augment sensitive, incomplete, or biased real-world data for credit risk assessment, fraud detection, and investment strategy optimization.
We assess the advantages and disadvantages of generating synthetic financial data in terms of generalization, interpretability, and machine learning model performance. The work recommends balancing synthetic and real data to limit problems with model dependability, data leakage, and errors. If models trained on synthetic data are to be applied to real financial data, transfer learning, model drift, and domain adaptation must be addressed.
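
As an illustration of the recommendation to balance synthetic and real data, the following minimal Python sketch builds a training set in which synthetic rows make up a chosen share of the total. It is not the authors' procedure: the function name mix_real_and_synthetic, the synthetic_fraction parameter, and the choice to keep every real row are assumptions made for this example only.

```python
import random


def mix_real_and_synthetic(real_rows, synthetic_rows, synthetic_fraction, seed=0):
    """Return a shuffled list of (row, source) pairs.

    All real rows are kept. Synthetic rows are sampled without replacement so
    that they form roughly `synthetic_fraction` of the combined set, capped by
    the number of synthetic rows available. (Hypothetical helper for
    illustration; not taken from the paper.)
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # Solve n_syn / (n_real + n_syn) = fraction for n_syn.
    n_synthetic = round(len(real_rows) * synthetic_fraction / (1.0 - synthetic_fraction))
    n_synthetic = min(n_synthetic, len(synthetic_rows))

    sampled = rng.sample(synthetic_rows, n_synthetic)

    # Tag each row with its source so later evaluation can be split by origin.
    mixed = [(row, "real") for row in real_rows]
    mixed += [(row, "synthetic") for row in sampled]
    rng.shuffle(mixed)
    return mixed
```

Tagging each row with its source makes it straightforward to hold out only real rows for evaluation, which is one simple way to check whether a model trained on the mixture still generalizes to real data.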

Published

27-08-2022

How to Cite

Evaluating the Impact of Synthetic Data on Financial Machine Learning Models: A Comprehensive Study of AI Techniques for Data Augmentation and Model Training. (2022). Journal of Artificial Intelligence Research and Applications, 2(2), 303-340. https://jairajournal.org/index.php/publication/article/view/21