Many cleansing tools can help improve data quality by automating the
activities involved in cleansing: adjustment, transformation, data parsing and
standardization. Without validation of the data models, the entities cannot be
developed further. Entities are linked to one another through the relationships
expressed in the ERD. It is not meaningful to devise a single, unique concept
of data quality. There are many techniques for improving data quality, each
suited to a specific type of data, and the analytics to be run over the data
should be considered carefully when choosing among them. An RDBMS represents
data in tabular format, with relations between the tables. Data quality
dimensions are usually defined qualitatively, and each dimension refers to a
separate, general aspect of the data. When people ask about data quality, they
often mean data accuracy (Chen, Meyer, Ganapathi, Liu, & Cirella, 2011).
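To make the parsing and standardizing activities mentioned above more concrete,
the following is a minimal Python sketch. The field names (name, phone), the
helper functions and the 7 to 15 digit plausibility rule are assumptions made
for illustration only; they are not taken from any of the tools or papers
discussed here.

```python
import re
from typing import Optional


def standardize_phone(raw: str) -> Optional[str]:
    """Parse a free-text phone number; return digits only, or None if implausible."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    # Assumed plausibility rule for this sketch: 7 to 15 digits.
    return digits if 7 <= len(digits) <= 15 else None


def standardize_name(raw: str) -> str:
    """Collapse repeated whitespace and apply title case."""
    return " ".join(raw.split()).title()


def cleanse(record: dict) -> dict:
    """Apply the standardization rules to one record (hypothetical schema)."""
    return {
        "name": standardize_name(record["name"]),
        "phone": standardize_phone(record["phone"]),
    }


rows = [
    {"name": "  ada   LOVELACE ", "phone": "(555) 010-9999"},
    {"name": "alan turing", "phone": "n/a"},
]
cleaned = [cleanse(r) for r in rows]
```

Values that cannot be parsed are set to None rather than guessed, so that a
later step can count them as missing instead of treating them as accurate.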
Data quality management research has discussed multiple techniques for
assessing the quality of data along multiple dimensions, but in recent years
practitioners and researchers have underscored the importance of contextual
quality assessment and highlighted its contribution to decision-making. The
paper argues that, given this persistent attention to contextual aspects,
current data quality measurement methods need to be revised and that
alternatives which better reflect contextual evaluation should be considered
(Even & Shankaranarayanan, 2005).
Literature Review of Data Quality Improvement Strategies
Chen et al. (2011) examined how to improve data quality in relational
databases by overcoming functional entanglements. According to the authors, the
traditional vertical decomposition methods of relational database normalization
fail to prevent common data anomalies. Moreover, data quality may still
deteriorate even after a database has been highly normalized, and potential
data anomalies are the cause of this deterioration. The authors argue that
practitioners need to improve a database further after applying traditional
normalization because of the existence of functional entanglement, a phenomenon
that the authors define. The paper outlines two methods for identifying
functional entanglements in a normalized database, which is the first step
towards improving data quality.
Furthermore, the researchers analyzed several practical methods for
preventing common data anomalies by restricting or eliminating the effects of
functional entanglements. The authors reveal the shortcomings of traditional
database normalization with respect to preventing common data anomalies and
offer practitioners useful techniques for improving data quality. Horizontal
decomposition and field-level disentanglement are the two methods the authors
examine for eliminating functional entanglements at the design level of a
normalized database. The authors suggest that practitioners evaluate their
requirements carefully in order to apply the most suitable method when dealing
with potential data anomalies (Chen et al., 2011).
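As an illustration of the general idea behind horizontal decomposition
(splitting the rows of one table into several tables according to a condition),
the following sketch uses Python's built-in sqlite3 module. The table and
column names (contact, kind, value) are hypothetical, and the example is not
taken from Chen et al. (2011). It only shows how rows whose meaning depends on
another field can be separated so that each resulting table carries its own
constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical mixed table: the meaning of `value` depends on `kind`.
cur.execute("CREATE TABLE contact (person_id INTEGER, kind TEXT, value TEXT)")
cur.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [
        (1, "email", "ada@example.org"),
        (1, "phone", "5550109999"),
        (2, "email", "alan@example.org"),
    ],
)

# Horizontal decomposition: one table per kind, each with its own CHECK rule.
cur.execute(
    "CREATE TABLE contact_email ("
    "person_id INTEGER, value TEXT CHECK (value LIKE '%@%'))"
)
cur.execute(
    "CREATE TABLE contact_phone ("
    "person_id INTEGER, value TEXT CHECK (value NOT GLOB '*[^0-9]*'))"
)
cur.execute(
    "INSERT INTO contact_email "
    "SELECT person_id, value FROM contact WHERE kind = 'email'"
)
cur.execute(
    "INSERT INTO contact_phone "
    "SELECT person_id, value FROM contact WHERE kind = 'phone'"
)
conn.commit()

email_count = cur.execute("SELECT COUNT(*) FROM contact_email").fetchone()[0]
phone_count = cur.execute("SELECT COUNT(*) FROM contact_phone").fetchone()[0]

# After the split, a phone number cannot be stored where an email is expected.
try:
    cur.execute("INSERT INTO contact_email VALUES (3, '5550100000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

In the single mixed table, a rule such as "the value must contain @" could only
be written as a condition on two columns at once. After the split, each table
enforces a simple rule on one column.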
Barateiro and Galhardas (2005) conducted a survey of data quality tools.
According to the authors, data quality tools transform data with problems into
good quality data for a specific application domain of the organization. The
paper classifies commercial and research data quality tools according to three
perspectives: a taxonomy of data quality problems that are not addressed by
current RDBMS technology, a listing of generic functionalities, and a division
of the tools into groups. Data profiling, analysis, transformation, cleaning,
duplicate elimination and data enrichment are the six categories of tools
identified.
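Two of the categories named above, data profiling and duplicate elimination,
can be sketched in a few lines of Python. The records, the column names and the
matching rule (names compared after lower-casing and whitespace normalization)
are assumptions chosen for illustration; real tools use far richer profiling
statistics and matching strategies.

```python
records = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.org"},
    {"id": 2, "name": "ada  lovelace ", "email": "ADA@example.org"},
    {"id": 3, "name": "Alan Turing", "email": None},
]


def profile(rows, column):
    """Data profiling: count missing and distinct values in one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {"missing": len(values) - len(present), "distinct": len(set(present))}


def dedupe_key(row):
    """Normalized key used to decide whether two records match (assumed rule)."""
    return " ".join(row["name"].lower().split())


def eliminate_duplicates(rows):
    """Duplicate elimination: keep the first record seen for each key."""
    seen = {}
    for row in rows:
        seen.setdefault(dedupe_key(row), row)
    return list(seen.values())


email_profile = profile(records, "email")
unique_records = eliminate_duplicates(records)
```

The profile treats the two differently cased email addresses as distinct
values, which is the kind of finding that would prompt a standardization step
before duplicates are eliminated.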
Singh and Singh (2010) descriptively classified the causes of data quality
problems in data warehousing. The authors approach the issue from the
perspective of organizations: record keeping is gaining prominence because
organizations have become aware of the benefits of decision-oriented and
business-intelligence-oriented databases. Populating such a database is a
difficult task, especially with quality data. Other authors have contributed
individual data quality issues over time, but this research gathers the causes
of data quality problems collectively across the data phases: data sources,
data integration and data profiling, ETL and data staging, and database
modeling (schema design). The authors identify the reasons for data
deficiencies, non-availability and reachability problems at these stages, such
as insufficient data profiling and content analysis and the inappropriate
selection of automated profiling tools (Singh & Singh, 2010).
Batini et al. (2007) presented a framework and a methodology for assessing
and monitoring data quality. According to the authors, data quality is emerging
as a new area for improving the effectiveness of organizations. Despite its
negative consequences, poor quality data is frequently encountered in the
everyday life of enterprises, and very few organizations use specific
methodologies for assessing and monitoring the quality of their data. The
authors present the first results of an Italian project whose objective is to
produce an enhanced version of well-known approaches to Basel II operational
risk evaluation, with significant attention to information and data quality and
its impact on operational risk. The paper focuses on the definition of the
assessment methodology and of the supporting tool for data quality. The authors
explain the phases of the methodology: data quality risk prioritization, risk
identification, risk measurement and risk monitoring. The methodology is based
on the notion of a loss event caused by low data quality (Batini et al., 2007).
Sadiq and Duckham (n.d.) examined storing and querying spatially varying
data quality information using an integrated spatial RDBMS. According to the
authors, current representations of SDQ (Spatial Data Quality) do not
adequately represent its spatial variation. For instance, a user who wishes to
know the positional accuracy of a feature normally has to rely on metadata
statements that refer to the entire data set. The authors stress that in
reality SDQ varies spatially: quality may be higher in some locations and lower
in others, perhaps because of different data collection procedures and
acquisition methods. SDQ therefore needs to be stored in a way that represents
quality varying spatially across individual features, and across parts of a
feature, in the dataset.
The authors propose a new, flexible data model for the efficient storage and
retrieval of spatially varying quality information in a spatial database. The
model stores quality information in several ways, according to the requirements
of the data set. The authors report on an extension to the Oracle Spatial RDBMS
that is used to implement the model of spatially varying SDQ. They also
investigate several querying mechanisms needed to support the SDQ model. The
investigations show that the model allows a flexible representation of
spatially varying quality, including sub-feature variation in quality (Sadiq
& Duckham, n.d.).
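The idea of storing quality at more than one level can be sketched with a small
in-memory structure in Python. This is not the Oracle Spatial implementation
described by the authors. The class names, the accuracy_m attribute and the
fallback order (part, then feature, then data set) are assumptions made only to
illustrate sub-feature variation in quality.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Feature:
    name: str
    # Feature-level positional accuracy in metres (None means not recorded).
    accuracy_m: Optional[float] = None
    # Sub-feature accuracy, keyed by a hypothetical part identifier.
    part_accuracy_m: Dict[str, float] = field(default_factory=dict)


@dataclass
class Dataset:
    # Dataset-level accuracy, as a metadata statement would give it.
    accuracy_m: float
    features: Dict[str, Feature] = field(default_factory=dict)

    def positional_accuracy(self, feature_name: str,
                            part: Optional[str] = None) -> float:
        """Return the most specific accuracy stored: part, feature, then data set."""
        feature = self.features[feature_name]
        if part is not None and part in feature.part_accuracy_m:
            return feature.part_accuracy_m[part]
        if feature.accuracy_m is not None:
            return feature.accuracy_m
        return self.accuracy_m


roads = Dataset(accuracy_m=10.0)
roads.features["main_st"] = Feature(
    "main_st", accuracy_m=5.0, part_accuracy_m={"segment_2": 1.5}
)
roads.features["back_ln"] = Feature("back_ln")

most_specific = roads.positional_accuracy("main_st", "segment_2")
feature_level = roads.positional_accuracy("main_st", "segment_1")
dataset_level = roads.positional_accuracy("back_ln")
```

A query for a part with its own recorded accuracy returns that value, while a
feature with no recorded quality falls back to the statement for the whole data
set, which is the situation the authors describe as inadequate.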
Conclusion of Data Quality Improvement Strategies
Summing up the discussion, there are many techniques for improving data
quality, each suited to a specific type of data, and the analytics to be run
over the data should be considered carefully. Traditional database
normalization has shortcomings in preventing common data anomalies, and Chen et
al. (2011) offer practitioners useful techniques for improving data quality.
Commercial and research data quality tools have been classified according to
three perspectives: a taxonomy of data quality problems, a listing of generic
functionalities and a division of the tools into groups. Populating a database
is a difficult task, especially with quality data, and authors have contributed
data quality issues over time. Despite its negative consequences, poor quality
data is frequently encountered in the everyday life of enterprises, and very
few organizations use specific methodologies for assessing and monitoring
quality. SDQ needs to be stored in a way that represents quality varying
spatially across individual features and parts of features in the dataset. The
model proposed by Sadiq and Duckham stores quality information in several ways
according to the requirements of the data set.
References of Data Quality Improvement Strategies
Barateiro, J., &
Galhardas, H. (2005). A survey of data quality tools. Datenbank-Spektrum,
15-21.
Batini, C., Barone, D.,
Mastrella, M., Maurino, A., & Ruffini, C. (2007). A Framework and a
Methodology for Data Quality Assessment and Monitoring. Università di
Milano Bicocca, Italy.
Chen, T. X., Meyer, M.
D., Ganapathi, N., Liu, S., & Cirella, J. M. (2011). Improving Data
Quality in Relational Databases: Overcoming Functional Entanglements. RTI
International. RTI Press publication OP-0004-1105.
Even, A., &
Shankaranarayanan, G. (2005). Value-Driven Data Quality Assessment.
Boston: Boston University School of Management, IS Department.
Sadiq, Z., &
Duckham, M. (n.d.). Storing and Querying Spatially Varying Data Quality
Information Using an Integrated Spatial RDBMS. The University of Melbourne,
Victoria. Cooperative Research Centre for Spatial Information, Dept. of
Geomatics.
Scannapieco, M. (2016).
On the Meaningfulness of “Big Data Quality” (Invited Paper). Journal of
Science and technology, 20.
Singh, R., & Singh,
K. (2010). A Descriptive Classification of Causes of Data Quality Problems in
Data Warehousing. IJCSI International Journal of Computer Science Issues, 7(3),
41-50.