1.0 Introduction of Machine Learning-Based Prediction for Chronic Kidney Disease
For individuals aged 60 or more, kidney disease is recognized as a significant health issue. Kidney degeneration is the main cause of a decline in the glomerular filtration rate (GFR). When this decline lasts for more than three months, the condition is termed chronic kidney disease (CKD) [1]. CKD is ranked as the tenth leading cause of death worldwide. Aging, diabetes, and hypertension are recognized as its leading causes. Other contributing factors include anemia and coronary artery disease. Detecting the disease in its early stages makes it feasible to preserve kidney function and prolong the patient's survival. Early diagnosis also facilitates CKD treatment and can help avoid costly procedures such as dialysis and kidney transplantation.
Machine learning techniques can analyze laboratory records and other patient information for early CKD detection [23]. Knowledge discovery in databases (KDD) converts low-level data into useful, high-level knowledge [2]. This transformation can help practitioners understand the patterns of CKD for diagnosis.
This study analyzes CKD with several machine learning techniques applied to a CKD dataset from the UCI Machine Learning Repository. CKD is identified in the dataset's four hundred patient instances using the Apriori association method together with several classification algorithms, namely IBk, J48, naïve Bayes, OneR, and ZeroR. The dataset is pre-processed by normalizing the attributes and filling in missing values. Relevant features are then selected to improve accuracy and reduce the training time of the machine learning methods. Experiments on CKD detection are performed with the machine learning methods available in WEKA, using the UCI CKD dataset [21]. The techniques are then compared in terms of detection accuracy.
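The pre-processing step can be illustrated with a minimal plain-Python sketch. It assumes that missing numeric values are filled with the attribute mean and missing nominal values with the attribute mode, and that numeric attributes are then min-max scaled to [0, 1]. This is similar in spirit to WEKA's ReplaceMissingValues and Normalize filters, but it is not necessarily the exact procedure used in this study. The column names and values are hypothetical.

```python
from collections import Counter
from statistics import mean

def impute(column):
    """Replace None with the column mean (numeric) or mode (nominal).

    Assumes the column has at least one non-missing value.
    """
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]

def min_max(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical toy columns in the style of the UCI CKD attributes.
blood_urea = [36.0, None, 53.0, 56.0]        # numeric, one value missing
appetite = ["good", "poor", None, "good"]    # nominal, one value missing

print([round(v, 3) for v in min_max(impute(blood_urea))])  # [0.0, 0.617, 0.85, 1.0]
print(impute(appetite))                                   # ['good', 'poor', 'good', 'good']
```

In a real experiment, the means, modes, minima, and maxima would be computed on the training data only and then reused on the test data.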
The remainder of this paper is organized as follows. Section 2 describes the machine learning methods. Section 3 reviews previous work on CKD detection. Section 4 presents the present study of machine learning methods. Section 5 reports the results, and Section 6 concludes the paper and outlines opportunities for future research.
3.0 Literature Survey of Machine Learning-Based Prediction for Chronic Kidney Disease
Various studies have analyzed chronic diseases with different methods aimed at early diagnosis. Patil [13] surveyed the detection accuracy of various data mining techniques, including sequential minimal optimization (SMO), k-nearest neighbors, naïve Bayes, radial basis function (RBF) networks, decision tables, artificial neural networks (ANN), multilayer perceptrons, and logistic regression. The accuracy of these techniques varied with the type of dataset, and no single technique gave the best result in every case.
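Naïve Bayes, one of the surveyed techniques, is simple enough to sketch directly. The following is a minimal categorical naïve Bayes classifier with add-one (Laplace) smoothing. It assumes that all attributes are nominal; numeric attributes would need a density estimate or discretization, which this sketch omits. The attribute values and the class labels "ckd" and "notckd" are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class priors and per-class attribute-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    values = defaultdict(set)       # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(y, j)][v] += 1
            values[j].add(v)
    return priors, counts, values

def predict_nb(model, row):
    """Return the class with the highest log-posterior score."""
    priors, counts, values = model
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for y, n_y in priors.items():
        score = math.log(n_y / total)
        for j, v in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((counts[(y, j)][v] + 1) / (n_y + len(values[j])))
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical toy records: (hypertension, appetite) -> class
rows = [("yes", "poor"), ("yes", "good"), ("no", "good"),
        ("no", "good"), ("yes", "poor")]
labels = ["ckd", "ckd", "notckd", "notckd", "ckd"]
model = train_nb(rows, labels)
print(predict_nb(model, ("yes", "poor")))  # ckd
print(predict_nb(model, ("no", "good")))   # notckd
```

Working in log space avoids numerical underflow when many attributes are multiplied together.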
Dulhare and Ayesha [1] used a naïve Bayes classifier with OneR as the attribute selector to predict CKD. Their dataset came from the UCI repository and had twenty-five attributes: thirteen nominal, eleven numeric, and one class attribute. Using OneR, they reduced the number of attributes by eighty percent, which increased detection accuracy by 12.5 percent.
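OneR works attribute by attribute. For each attribute, it builds a rule that maps every attribute value to its majority class, and it keeps the attribute whose rule makes the fewest training errors. Ranking attributes by that rule accuracy is one way OneR can act as an attribute selector. ZeroR, also used in this study, is the trivial baseline that always predicts the majority class. The sketch below assumes nominal attributes only (full OneR also discretizes numeric attributes) and uses hypothetical toy records.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR baseline: always predict the majority class."""
    return Counter(labels).most_common(1)[0][0]

def one_r_rule(rows, labels, j):
    """Build the one-attribute rule for attribute j; return (rule, accuracy)."""
    by_value = defaultdict(Counter)
    for row, y in zip(rows, labels):
        by_value[row[j]][y] += 1
    rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    correct = sum(c[rule[v]] for v, c in by_value.items())
    return rule, correct / len(labels)

def rank_attributes(rows, labels):
    """Rank attribute indices by OneR training accuracy, best first."""
    scores = {j: one_r_rule(rows, labels, j)[1] for j in range(len(rows[0]))}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical records: (hypertension, appetite, pedal_edema)
rows = [("yes", "poor", "yes"), ("yes", "good", "no"), ("no", "good", "no"),
        ("no", "good", "yes"), ("yes", "poor", "no"), ("no", "poor", "no"),
        ("yes", "poor", "yes")]
labels = ["ckd", "ckd", "notckd", "notckd", "ckd", "notckd", "ckd"]

print(zero_r(labels))                 # ckd
print(one_r_rule(rows, labels, 0))    # ({'yes': 'ckd', 'no': 'notckd'}, 1.0)
print(rank_attributes(rows, labels))  # [0, 1, 2]
```

Keeping only the top-ranked attributes gives the kind of attribute reduction described above.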
Gopika and Vanitha [2] employed clustering to detect CKD accurately and reduce diagnosis time. They applied k-medoids, k-means, and fuzzy c-means clustering. Fuzzy c-means achieved an accuracy of eighty-seven percent on a dataset obtained from the UCI Machine Learning Repository.
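Fuzzy c-means differs from k-means because each point receives a membership degree in every cluster rather than a single hard label. A minimal sketch, assuming the common fuzzifier m = 2 and Euclidean distance, alternates the two standard updates. Centers are recomputed as membership-weighted means, and memberships are recomputed from relative distances to the centers. The one-dimensional values are hypothetical.

```python
import math
import random

def fuzzy_c_means(points, c=2, m=2.0, iters=100, seed=0):
    """Fuzzy c-means clustering; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        w = [rng.random() for _ in range(c)]
        s = sum(w)
        u.append([x / s for x in w])
    for _ in range(iters):
        # Center update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            wts = [u[i][j] ** m for i in range(n)]
            tot = sum(wts)
            centers.append([sum(w * p[d] for w, p in zip(wts, points)) / tot
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        for i, p in enumerate(points):
            dists = [max(math.dist(p, cj), 1e-12) for cj in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((dists[j] / dk) ** (2 / (m - 1)) for dk in dists)
    return centers, u

# Hypothetical one-dimensional measurements forming two groups.
pts = [(1.0,), (1.2,), (0.8,), (8.0,), (8.2,), (7.8,)]
centers, u = fuzzy_c_means(pts)
print(sorted(round(cj[0], 2) for cj in centers))
```

To turn the result into a classification, each point can be assigned to the cluster in which it has the highest membership.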
Charleonnan et al. [5] employed k-nearest neighbors, support vector machine (SVM), logistic regression, and decision tree classifiers to detect CKD. They used a CKD dataset from the UCI Machine Learning Repository with twenty-four attributes, four hundred instances, and two classes. Their results showed SVM to be the best of these techniques in both detection accuracy and sensitivity.
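k-nearest neighbors, also available in WEKA as IBk, classifies a record by the classes of its closest training records. The sketch below shows the core idea. It assumes Euclidean distance on numeric features that are already normalized, and a plain majority vote among the k nearest records. It also includes leave-one-out accuracy as a simple evaluation. The two feature values per record are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training records nearest to query."""
    nearest = sorted(zip(train_x, train_y), key=lambda xy: math.dist(xy[0], query))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def loo_accuracy(xs, ys, k=3):
    """Leave-one-out accuracy: predict each record from all the others."""
    hits = 0
    for i in range(len(xs)):
        pred = knn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], k)
        hits += pred == ys[i]
    return hits / len(xs)

# Hypothetical records with two features already scaled to [0, 1].
xs = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25), (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
ys = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
print(knn_predict(xs, ys, (0.7, 0.3)))  # ckd
print(loo_accuracy(xs, ys))             # 1.0
```

Normalizing the features first matters here, because an attribute with a large numeric range would otherwise dominate the distance.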
Ramya and Radha [7] examined kidney failure using classification algorithms. They classified the stages of kidney disease according to case severity using a back-propagation neural network (BPNN), a radial basis function (RBF) network, and random forest. They evaluated these techniques on several performance metrics, such as sensitivity, specificity, and kappa. Their dataset covered approximately one thousand patients from Coimbatore, with fifteen attributes. They concluded that the RBF network was the most promising classifier, with a detection accuracy of 85.3 percent.
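The evaluation metrics named here all follow from a binary confusion matrix. The small sketch below treats CKD as the positive class. It also computes precision and F-measure, which other studies in this survey report. Cohen's kappa compares the observed accuracy with the agreement p_e expected by chance from the marginal totals. The counts are hypothetical and are not taken from any cited study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Common metrics from a 2x2 confusion matrix with CKD as the positive class."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall: share of CKD cases detected
    specificity = tn / (tn + fp)   # share of non-CKD cases correctly rejected
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Chance agreement from actual (row) and predicted (column) marginals.
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f_measure": f_measure, "kappa": kappa}

# Hypothetical counts: 50 CKD and 50 non-CKD test records.
print(binary_metrics(tp=45, fn=5, fp=10, tn=40))
```

With these counts, accuracy is 0.85 but chance agreement is 0.5, so kappa is 0.7. This shows why kappa is reported alongside accuracy.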
Iqbal et al. [9] applied texture analysis techniques to ultrasound images to distinguish patients with kidney disease from normal patients. They used mathematical operations such as Fourier analysis to calculate the root mean square (RMS), together with the gray-level co-occurrence matrix (GLCM), average values, and homogeneity. Using ultrasound images of approximately thirty-two patients, they distinguished kidney disease from normal cases on the basis of the cortex region, with reported RMS values of 0.0049 and 0.3.
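Texture features of this kind can be sketched on a tiny gray-level image. The code below builds a normalized gray-level co-occurrence matrix for a single horizontal pixel offset. It computes homogeneity with a 1 / (1 + |i - j|) weighting; some tools use (i - j)^2 in the denominator instead. It computes the RMS directly from raw pixel intensities rather than through Fourier analysis. The image is a hypothetical 4 x 4 array with four gray levels, and this is not the feature pipeline of Iqbal et al.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def homogeneity(p):
    """GLCM homogeneity: sum over i, j of P(i, j) / (1 + |i - j|)."""
    size = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(size) for j in range(size))

def rms(values):
    """Root mean square of a flat sequence of pixel intensities."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical 4x4 image with gray levels 0..3.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
print(round(homogeneity(glcm(image, levels=4)), 3))
print(round(rms([v for row in image for v in row]), 3))
```

Homogeneity is close to 1 when neighboring pixels mostly share the same gray level, so smoother tissue regions score higher.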
Kayaalp et al. [10] employed hybrid classification techniques to analyze kidney disease using a UCI Machine Learning Repository dataset of approximately four hundred patients. They used KNN and SVM classifiers. They performed feature selection with the gain ratio and Relief algorithms to identify the most relevant features of the dataset. They concluded that, on the selected features, KNN outperformed the other algorithms in terms of the confusion matrix, precision, and F-measure.
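Gain ratio, one of the feature-ranking criteria mentioned above, divides an attribute's information gain about the class by the entropy of the attribute's own value distribution. This penalizes attributes with many distinct values. The minimal sketch below handles nominal attributes only, and the attribute names and values are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(column, labels):
    """Information gain of a nominal attribute divided by its split information."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(column, labels):
        groups[v].append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(column)  # entropy of the attribute's own values
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical nominal attributes and class labels.
labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
attributes = {
    "hypertension": ["yes", "yes", "yes", "no", "no", "no"],
    "appetite":     ["poor", "good", "poor", "good", "good", "poor"],
}
ranking = sorted(attributes, key=lambda a: gain_ratio(attributes[a], labels), reverse=True)
print(ranking)  # ['hypertension', 'appetite']
```

A classifier such as KNN would then be trained on only the top-ranked attributes.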
Wibawa et al. [11] combined feature selection with a boosted classifier to diagnose CKD. They used correlation-based feature selection (CFS) to choose features and AdaBoost for ensemble learning. They applied SVM, KNN, and naïve Bayes classifiers. They concluded that combining CFS with AdaBoost was a reliable and promising approach to CKD detection compared with the naïve Bayes and KNN classifiers alone. This approach achieved an F-measure of 0.98, an accuracy of 0.981, and a recall of 0.98.
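CFS scores a whole feature subset rather than individual features. Hall's merit rewards features that correlate with the class and penalizes features that correlate with each other. The minimal sketch below assumes numeric features and a class encoded as 0/1, and it uses absolute Pearson correlation as the correlation measure. The subset search that CFS normally performs is omitted, and the feature values are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def cfs_merit(features, target):
    """Hall's CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Hypothetical data: class encoded as 0 (notckd) / 1 (ckd).
target = [0, 0, 0, 1, 1, 1]
creatinine = [1.0, 1.2, 0.9, 3.1, 2.8, 3.5]
creatinine_x2 = [2 * v for v in creatinine]   # fully redundant copy
print(round(cfs_merit([creatinine], target), 3))
print(round(cfs_merit([creatinine, creatinine_x2], target), 3))  # same merit: redundancy adds nothing
```

Because a redundant copy leaves the merit unchanged, a search guided by this score has no reason to keep duplicated information.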
Wickramasinghe et al. [12] addressed CKD management by using classification to recommend suitable diet plans to patients. The recommended diet plan was based on the patient's blood potassium level. They applied multiclass decision forest, logistic regression, decision jungle, and neural network classifiers. The decision forest algorithm achieved an accuracy of 99.17 percent.
Previous research suggests that machine learning provides significant insights into data and can help classify data into multiple classes. The findings also indicate that machine learning methods produce more precise classification results when they are combined with feature selection methods. Hence, to retain these advantages, this study combines a set of well-known machine learning techniques with feature selection techniques to classify patients as normal or as having kidney disease.