Features Engineering: AI-based extraction and selection of features in big datasets

Authors

  • Muneer Ahmed Salamkar, Senior Associate at JP Morgan Chase, USA

Keywords:

Automated Feature Extraction, Big Data Analytics, AI

Abstract

Feature engineering is crucial to data analysis and machine learning (ML) because it directly affects the performance of predictive models. For large datasets, AI-driven automation of feature extraction and selection has made the process faster and more efficient. Deep learning for feature extraction, genetic algorithms for feature selection, and clustering for unsupervised learning can uncover patterns and correlations that manual methods overlook. By concentrating on the most important features and reducing redundancy and noise, these automated methods save time, reduce the need for specialist expertise, and improve model accuracy. Optimization methods streamline the selection of essential features, while neural networks automatically derive abstract features from raw data. This is especially useful for huge datasets, where manual feature engineering is not feasible. These methods help sectors such as banking, healthcare, and e-commerce gain insights and make better decisions. Despite these benefits, interpretability, overfitting, and computational cost remain challenges. AI-enabled feature engineering helps organizations build more accurate, scalable, and efficient models and extract more value from their data.
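
As a rough illustration of what automated selection that reduces redundancy and noise can look like, the sketch below implements a simple correlation-based filter in plain Python. It is not the method used in the article: the function names, the column layout (a dict of feature name to values), the two thresholds, and the toy data are all assumptions made for this example. A genetic algorithm or a learned representation would replace the greedy loop in a more realistic pipeline.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.95):
    """Greedy filter: rank features by |correlation with target|, then skip
    features that are weakly relevant or nearly duplicate a feature already kept.

    columns: dict mapping a feature name to its list of values (assumed layout).
    min_relevance, max_redundancy: illustrative thresholds, not tuned values.
    """
    relevance = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(columns, key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in ranked:
        if relevance[name] < min_relevance:
            continue  # too weakly related to the target: treated as noise
        if any(abs(pearson(columns[name], columns[k])) > max_redundancy for k in kept):
            continue  # nearly a copy of a feature that was already kept
        kept.append(name)
    return kept


# Toy data (made up for illustration only).
target = [1, 2, 3, 4, 5, 6]
columns = {
    "amount": [10, 20, 30, 40, 50, 60],
    "amount_cents": [1000, 2000, 3000, 4000, 5000, 6000],  # duplicate of "amount"
    "tenure": [1, 3, 2, 5, 4, 6],
    "noise": [2, 1, 1, 2, 2, 1],
}
kept = select_features(columns, target)
print(kept)
```

On this toy data the filter keeps only one of the two rescaled copies of the same quantity and discards the column that barely correlates with the target. The same idea scales poorly to very wide datasets because of the pairwise comparisons, which is one reason heuristic search and learned representations are attractive there.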

Published

15-12-2023

How to Cite

Features Engineering: AI-based extraction and selection of features in big datasets. (2023). Journal of Artificial Intelligence Research and Applications, 3(2), 1130-1148. https://jairajournal.org/index.php/publication/article/view/51