A Domain Driven Data Architecture to Improvement of Data Quality in Distributed Data Systems
Keywords:
Domain-Driven Design, Data Architecture, Data Quality, Distributed Datasets
Abstract
Organizations struggle to manage vast amounts of data, which are often scattered across several systems and departments. As data grows in volume and complexity, especially when datasets are distributed across many platforms with different formats and architectures, ensuring consistent quality becomes increasingly difficult. A domain-driven data architecture offers a solution by breaking complex data systems down into smaller, manageable components, each under the control of its respective domain. By applying domain-driven design (DDD) principles, organizations can improve data management through clearly defined ownership, data validation and transformation techniques, and assured synchronization across systems. This approach helps manage data quality in distributed environments by providing a more ordered, coherent structure. Domain-specific data validation and transformation is a fundamental element of this architecture, since it ensures that every dataset conforms to quality requirements before it is processed or distributed across systems. Combining this domain-centric approach with existing technologies such as data warehouses, data lakes, and governance systems can improve data quality management at every stage of the data lifecycle. Using real-world case studies from several fields, this paper shows how domain-driven design improves data quality and thereby increases dependability, accessibility, and consistency across organizations.
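As an illustration of the domain-owned validation and transformation step described in the abstract, the following is a minimal Python sketch and not the paper's implementation. The `DataDomain` class, the `billing` domain, its owner, the field names, and the individual rules are all hypothetical and chosen only for the example. The idea shown is that each domain carries its own quality rules and its own transformation, and only records that pass the rules are transformed and released to other systems.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
# A rule returns an error message for a bad record, or None if the record passes.
Rule = Callable[[Record], Optional[str]]


def _identity(record: Record) -> Record:
    """Default transformation: leave the record unchanged."""
    return record


@dataclass
class DataDomain:
    """Hypothetical domain that owns its quality rules and its transformation."""
    name: str
    owner: str
    rules: List[Rule] = field(default_factory=list)
    transform: Callable[[Record], Record] = _identity

    def validate(self, record: Record) -> List[str]:
        """Run every domain rule and collect the error messages."""
        errors = []
        for rule in self.rules:
            message = rule(record)
            if message is not None:
                errors.append(message)
        return errors

    def publish(self, records: List[Record]) -> Tuple[list, list]:
        """Split records into transformed, accepted ones and rejected ones."""
        accepted, rejected = [], []
        for record in records:
            errors = self.validate(record)
            if errors:
                rejected.append((record, errors))
            else:
                accepted.append(self.transform(record))
        return accepted, rejected


# Hypothetical rules and transformation for an assumed "billing" domain.
def require_customer_id(record: Record) -> Optional[str]:
    if not record.get("customer_id"):
        return "customer_id is missing"
    return None


def require_non_negative_amount(record: Record) -> Optional[str]:
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        return "amount must be a non-negative number"
    return None


def normalize_currency(record: Record) -> Record:
    cleaned = dict(record)  # copy so the source record is not mutated
    cleaned["currency"] = str(record.get("currency", "")).strip().upper()
    return cleaned


billing = DataDomain(
    name="billing",
    owner="billing-team",
    rules=[require_customer_id, require_non_negative_amount],
    transform=normalize_currency,
)

accepted, rejected = billing.publish([
    {"customer_id": "C-1", "amount": 12.5, "currency": " usd "},
    {"customer_id": "", "amount": -3, "currency": "eur"},
])
```

In this sketch the first record is accepted and its currency is normalized, while the second is rejected together with the list of rules it violated, so the owning team can correct it at the source before it reaches a warehouse or lake.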