Abstract of Data Mining and Big Data
This research paper explores methods for extracting hidden and previously unknown information from data so that it can be put to other uses. It focuses on data mining and big data as they are applied in the market, according to market conditions. The challenges of data mining and big data that arise when modern technology is used to explore these resources are discussed in detail. Data mining and big data analysis are not the same in nature, but they share similarities in data collection, and both are widely adopted in the business world. The paper presents a discussion supported by a literature review with tables to explain data mining. Using algorithms and association rules, many variables in large datasets can provide strong support and help detect important information alongside other data mining techniques. In the reviewed research, the researchers focused primarily on empirical work relating various contributing variables to customer relationship management (CRM). The paper concludes with the results obtained from this research on big data and data mining.
I. INTRODUCTION of Data Mining and Big Data
Advanced technologies and data science have changed information systems around the world. Data on customer reviews and product sales are readily available to companies in the form of big data, enabling them to make sound decisions about future business operations and policy development. Big data also supports the educational system by providing information about the behavior of society and the consumption of available facilities. A relevant example of big data is a database containing large-scale information about the consumption of electricity or other natural resources in a state. Associations between items are uncovered through machine learning and association analysis. Association analysis is a useful but somewhat overlooked technique that attempts to find common patterns of items in large data sets. One specific application is known as market basket analysis. The present work addresses market basket analysis with attention to big data and data mining systems. It also elaborates on association rule mining and the Apriori algorithm in data mining.
II. LITERATURE REVIEW of Data Mining and Big Data
According to the literature, the Apriori algorithm is used in data management for association rule mining. The literature review covers what is meant by association rule mining and market basket analysis in data mining. Association rule mining is a technique that identifies associations between different sets of items by using frequent patterns. A simple analogy is two positively correlated variables, where an increase in one is accompanied by an increase in the other. In data science, a widely used technique for association rule mining is the Apriori algorithm, which relies on the property that every subset of a frequent itemset must itself be frequent. It works on the key concept of the frequency with which two or more events occur together.
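The frequency-based idea described above can be illustrated with a small sketch. This is a minimal, unoptimized version of Apriori written for illustration only; the function names (`apriori`, `confidence`), the sample baskets, and the 0.5 support threshold are assumptions and do not come from the reviewed studies.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from stored supports."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return frequent[both] / frequent[a]


# Hypothetical baskets, used only to exercise the sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
rule_confidence = confidence(frequent_itemsets, {"butter"}, {"bread"})
```

With these sample baskets, the pair of milk and butter appears in only one of four transactions, so it falls below the threshold and no three-item candidate survives the prune step.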
According to Reyes and Valenzuela (2019), big data strongly supports organizations in making effective decisions and in applying a practical fuzzy analytic network process to customer relationship management. In this study, the k-means clustering algorithm partitions observations into k clusters, assigning each observation to the cluster with the nearest mean. With this kind of algorithm, data can be grouped into clusters along various dimensions and related data sets. Using this algorithm together with association rules, many variables in large datasets can provide strong support and help detect important information alongside other data mining techniques. The researchers focused primarily on empirical work relating various contributing variables to customer relationship management (CRM). They used k-means clustering and association rules to identify the data techniques most suitable for improving CRM in organizations. In conclusion, the main aim of this study was to investigate the use of these data mining techniques in business practice (Reyes & Valenzuela, 2019).
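The nearest-mean partitioning described above can be sketched as follows. This is a plain-Python illustration of the standard k-means loop, not the implementation used in the cited study; the function name `kmeans`, the sample customer points, and the choice to initialize centroids from the first k points (made here so the result is deterministic) are all assumptions.

```python
def kmeans(points, k, iterations=100):
    """Partition points into k clusters by repeatedly assigning each point
    to its nearest centroid and recomputing each centroid as a mean."""
    # Assumption: initialize from the first k points to keep the run repeatable.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the partition is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical customers described by (spend score, visit score).
customers = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.2, 0.8), (8.5, 9.0)]
centroids, labels = kmeans(customers, k=2)
```

On this sample the loop settles after a few passes, with the three low-scoring customers in one cluster and the three high-scoring customers in the other. In practice the initial centroids are usually chosen at random or with a seeding heuristic, and the result can depend on that choice.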
According to Galiano et al. (2016), data mining techniques are used in market basket analysis for business intelligence activities. These techniques support data processing in machine-to-machine (M2M) and other open databases. The research examines market basket analysis using various algorithms and data processing techniques. The paper concluded that massive data distribution channels provide opportunities for real-time data analysis that support conclusive arguments. According to the researchers, data analysis and mining processes require standard formats to interface with the data provided by electronic sources such as local open databases, external open databases, local proprietary databases, and point of sale (POS) systems. In this work, Magento serves as a local proprietary database for market basket analysis, containing large-scale data on the variables of interest to the researchers. The researchers also present Cassandra DB as an example of a big data system into which massive data are imported for research purposes. According to their findings, market basket analysis advises organizational management on the correct positioning and sale of products in the target market by using real-time consumer market data (Galiano et al., 2016).
Vairagade, Shah, Chavan, and Bhatt (2016) conclude that market basket analysis can be implemented in organizations using the Hadoop framework. In their study, market basket analysis is treated as a data mining technique for identifying items that are likely to be purchased together. In simple terms, market basket analysis focuses on items that consumers frequently purchase along with another product or service. The researchers focused mainly on identifying associations between different pairs of items available in a store, based on transaction records. The prime objective of market basket analysis is to provide real-time support to retailers and business owners in understanding the behavior of their customers. Based on their findings, the researchers agree that market basket analysis based on big data can help retailers and business organizations improve decision-making about launching or promoting a new product or service (Vairagade, Shah, Chavan, & Bhatt, 2016).
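Identifying item pairs that are purchased together comes down to counting pair co-occurrences across transaction records. The sketch below splits that counting into a map step and a reduce step to mirror the MapReduce style that Hadoop uses, but it runs in a single Python process and is not the authors' implementation; the function names and sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_pairs(transaction):
    """Map step: emit every unordered pair of distinct items in one transaction."""
    # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
    return list(combinations(sorted(set(transaction)), 2))


def reduce_counts(emitted_pairs):
    """Reduce step: sum the occurrences of each emitted pair."""
    return Counter(emitted_pairs)


# Hypothetical transaction records from a store.
transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
]

emitted = [pair for t in transactions for pair in map_pairs(t)]
pair_counts = reduce_counts(emitted)
top_pair, top_count = pair_counts.most_common(1)[0]
```

Here bread and milk appear together in three of the five transactions, more often than any other pair. In a distributed setting the map step would run on separate slices of the transaction log and the reduce step would merge the partial counts.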
According to an empirical study by Thanmayee (2017), the datasets generated in various fields of life support decision-making. Collecting these datasets creates big data that assists research and development in organizations such as medical institutes, educational institutes, and the retail sector. For instance, information collected from market centers and the consumer market provides a database for business experts. Businesses use such big data, processed with different algorithms, to decide which customer services are required and how to satisfy customers (Thanmayee, 2017).
With the advance of computer systems for analysis and their underlying capacity, the digitization of medical examinations and medical reports in the healthcare system has become a common and widely accepted practice. Techniques for information management and analysis are continually being developed, particularly for real-time data streaming, capture, aggregation, analysis, and visualization solutions that can help integrate better use of electronic medical records (EMRs) with healthcare. At this level, healthcare professionals are responsible for recording different kinds of data, such as the medical history and clinical information of patients and others connected with the medical field (Dimitrov, 2016).
Big data processing is applied to large data sets to examine the commodities involved and to solve customers' problems with the help of data management techniques. Current data processing frameworks, such as Spark, have been very effective at reducing the amount of code needed to build a particular application. Future data-intensive framework APIs are expected to keep improving in four key areas: exposing more optimal routines to users, allowing simple access to distinct data sources, providing graphical user interfaces (GUIs), and allowing interoperation between heterogeneous hardware resources.
It is commonly known that in any field of work, and mainly in the business industry, experts generate enormous volumes of information that are stored in a prescribed manner. This information has potential for a wide range of business and market benefits. The digitalization of such information is called big data. The aggregate information associated with the day-to-day business of satisfying customers and generating wealth amounts to huge volumes of data. In 201, a McKinsey report estimated that the healthcare industry could realize $300 billion in annual value by making use of big data (Baro, Degoul, Beuscart, & Chazard, 2015) and by making better use of data mining in the market.
Big data helps different business sectors manage market conditions in different situations with effective measures. It is also used to keep records of project work over long periods. All details about a company or business, as well as customer records and experiments, are kept in a big data collection supervised by capable software. This helps reduce the chances of fraud and provides efficient data security, which helps experts detect fraud if any data in the business is misplaced. Storing large volumes of data efficiently also makes the data easier to examine. Business experts make decisions on various matters by analyzing large volumes of market data with the help of data mining.
III. Open issues of Data Mining and Big Data
Techniques for information management and analysis are continually being developed, particularly for real-time data streaming, capture, aggregation, analysis, and visualization solutions that can help integrate better use of EMRs with healthcare. It could be said that healthcare services have entered a post-EMR deployment stage. Currently, the most important goal is to gain meaningful insights from the vast amounts of information gathered as EMRs (Feldman, Martin, & Skotnes, 2012). Here, we examine some of the difficulties of bringing this data together on a single platform. The following are the challenges that marketing experts may face when using big data and rule mining to examine problems in the market.
IV. Storage capacity of Data Mining and Big Data
Many organizations prefer to store data on their own premises to guard against data theft, unauthorized sharing, and hacking, but this requires very large storage capacity, which is difficult to provide for big data. As the volume of collected data grows, handling it properly becomes a problem. With decreasing costs and increasing reliability, cloud-based storage is now used in IT systems and is the preferred choice for the vast majority of healthcare organizations.
· Clean-up of Data Mining and Big Data
Collected data must be cleaned to ensure accuracy, correctness, consistency, relevance, and the desired quality. Cleaning may be automatic or manual, applying well-defined rules to ensure high levels of accuracy and integrity. Increasingly refined and precise tools use AI techniques to reduce effort and cost and to keep bad data from corrupting large volumes of market data.
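A rule-based automatic clean-up of the kind described above might look like the following sketch. The record layout (`order_id`, `item`, `quantity`) and the three rules are assumptions chosen for illustration; real cleaning rules depend on the data source.

```python
def clean_records(records):
    """Apply simple cleaning rules to raw sales records:
    normalize item names, drop invalid rows, and remove duplicates."""
    cleaned = []
    seen = set()
    for rec in records:
        # Rule 1: normalize the item name (trim whitespace, lower-case).
        item = (rec.get("item") or "").strip().lower()
        qty = rec.get("quantity")
        if not item:
            continue  # drop rows with a missing item name
        # Rule 2: quantity must be a positive integer.
        if not isinstance(qty, int) or qty <= 0:
            continue
        # Rule 3: keep only the first row for each (order, item) combination.
        key = (rec.get("order_id"), item)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"order_id": rec.get("order_id"), "item": item, "quantity": qty})
    return cleaned


# Hypothetical raw records with typical defects.
raw = [
    {"order_id": 1, "item": " Milk ", "quantity": 2},
    {"order_id": 1, "item": "milk", "quantity": 2},    # duplicate after normalization
    {"order_id": 2, "item": "", "quantity": 1},        # missing item name
    {"order_id": 3, "item": "Bread", "quantity": -1},  # invalid quantity
    {"order_id": 4, "item": "Eggs", "quantity": 12},
]
cleaned = clean_records(raw)
```

Of the five sample rows, only the first milk record and the eggs record pass all three rules.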
· Format of Data Mining and Big Data
The market produces a huge amount of data that experts cannot easily interpret with ordinary data collection methods, because those methods are not effective enough. Processing such data is a complex task, especially when it concerns a business or the services of any field. Market information must be organized into relevant figures with a specific use case in mind, and the appropriate format depends on the particulars of each case.
· Accuracy of Data Mining and Big Data
Several studies have discussed how difficult it is to examine huge volumes of information properly and why doing so is essential. Each participant in the process can add value in resolving problems with the data that arise at difficult points in its lifecycle. Big data and data mining are used to obtain better data quality and communication, so that the recorded history captures disparities in specific conditions in detail.
· Processing of data of Data Mining and Big Data
Various studies have discussed elements that can be altered to meet data efficiency requirements and that can lead to false interpretation of business reports. Examples include records of disorders and payment histories, as well as changes in the complexity and quality of acquired images, any of which may alter the picture of what actually drives profit. Because the volume of data is so large, processing such images is difficult.
· Safety issues of Data Mining and Big Data
Data collection faces several security issues, such as hacking, attacks, and theft, so information security is a necessity for healthcare organizations. After observing numerous cases of mishandling and poor professional practice, rules were established to protect personal data held in bulk. These principles, known as Security Rules, help guide organizations in applying procedures for integrity and auditing. Even common security measures, such as up-to-date antivirus software, firewalls, encryption of sensitive data, and authentication, can be difficult to put in place.
· Competency of Data Mining and Big Data
A useful data plan requires complete and up-to-date information about the stored data. The data should be collected with a record of who created it and when, so that its use in research can be accounted for. This allows analysts to reproduce earlier queries and supports later research on the final results. It increases the usefulness of the data and prevents the creation of "data dumpsters" that see little use.
· Visualization of Data Mining and Big Data
Clean representations of data through charts and diagrams help overcome a lack of order in the data. Common techniques include bar charts, pie charts, and scatter plots, each suited to particular kinds of data. In business and marketing, data visualization is used to present the market situation.
V. Conclusion of Data Mining and Big Data
It is concluded that data on customer reviews and product sales are readily available to companies in the form of big data, enabling them to make sound decisions about future business operations and policy development. Association rule mining works on the key concept of the frequency with which two or more events occur together. With clustering algorithms, data can be grouped into clusters along various dimensions and related data sets. Using these algorithms and association rules, many variables in large datasets can provide strong support and help detect important information alongside other data mining techniques. The prime objective of market basket analysis is to provide real-time support to retailers and business owners in understanding the behavior of their customers. Big data processing is applied to large data sets to examine the commodities involved and to solve customers' problems with the help of data management techniques. Storage capacity, accuracy, presentation, and other challenges may stand in the way of meeting data requirements, but applying big data and data mining across different kinds of data analysis remains worthwhile.
VI. References of Data Mining and Big Data
[1] Reina Reyes and Sheena Valenzuela, "Shopping for Politicians: Insights from Market Basket Analysis of Senatoriables," Building Inclusive Democracies in ASEAN, pp. 333-345, 2019.
[2] Angelo Galiano et al., "Machine to Machine (M2M) Open Data System for Business Intelligence in Products Massive Distribution oriented on Big Data," International Journal of Computer Science and Information Technologies (IJCSIT), vol. 7, no. 3, pp. 1332-1336, 2016.
[3] Rupali S. Vairagade, Tejas Shah, Tejas Chavan, and Rohan Bhatt, "Survey on Implementation of Market Basket Analysis using Hadoop Framework," International Journal of Computer Applications, vol. 134, no. 10, pp. 0975-8887, 2016.
[4] Manjunath Prasad Thanmayee, "Revamped Market-Basket Analysis using In-Memory Computation framework," IEEE, pp. 65-70, 2017.
[5] Dimiter V. Dimitrov, "Medical internet of things and big data in healthcare," Healthcare Informatics Research, vol. 22, no. 3, pp. 156-163, 2016.
[6] Emilie Baro, Samuel Degoul, Régis Beuscart, and Emmanuel Chazard, "Toward a literature-driven definition of big data in healthcare," BioMed Research International, 2015.
[7] Bonnie Feldman, Ellen M. Martin, and Tobi Skotnes, "Big data in healthcare hype and hope," Dr. Bonnie, vol. 360, pp. 122-125, 2012.
[8] Wullianallur Raghupathi and Viju Raghupathi, "Big data analytics in healthcare: promise and potential," Health Information Science and Systems, vol. 2, no. 1, p. 3, 2014.
[9] Min Chen, Yixue Hao, Kai Hwang, Lu Wang, and Lin Wang, "Disease prediction by machine learning over big data from healthcare communities," IEEE Access, vol. 5, pp. 8869-8879, 2017.
[10] Galit Shmueli and Otto R. Koppius, "Predictive analytics in information systems research," MIS Quarterly, pp. 553-572, 2011.
[11] Eric Siegel, Predictive analytics: The power to predict who will click, buy, lie, or die. John Wiley & Sons, 2013.
[12] Nishita Mehta and Anil Pandit, "Concurrence of big data analytics and healthcare: A systematic review," International Journal of Medical Informatics, vol. 114, pp. 57-65, 2018.
[13] D. P. Acharjya and Ahmed P Kauser, "A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 2, 2016.
[14] Chaowei Yang, Qunying Huang, Zhenlong Li, Kai Liu, and Fie Hu, "Big Data and cloud computing: innovation opportunities and challenges," International Journal of Digital Earth, vol. 10, no. 1, pp. 13-53, 2017.
[15] Changqing Ji, Yu Li, Wenming Qiu, and Yingwei Jin, "Big data processing: Big challenges," Journal of Interconnection Networks, vol. 13, no. 3, 2013.
[16] Manish Kumar Kakhani, Sweeti Kakhani, and S. R. Biradar, "Research Issues in Big Data Analytics," International Journal of Application or Innovation in Engineering & Management, vol. 2, no. 8, 2013.
[17] Xiaolong Jin, Benjamin W Wah, Xueqi Cheng, and Yuanzhuo Wang, "Significance and Challenges of Big Data Research," Big Data Research, pp. 1-6, 2015.
[18] Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, and Rajkumar Buyya, "Big Data computing and clouds: Trends and future directions," Journal of Parallel and Distributed Computing, vol. 79-80, pp. 3-15, 2015.
[19] Geetika Chawla, Savita Bamal, and Rekha Khatana, "Big Data Analytics for Data Visualization: Review of Techniques," International Journal of Computer Applications, vol. 182, no. 21, 2018.
[20] Roberta Pastorino et al., "Benefits and challenges of Big Data in healthcare: an overview of the European initiatives," European Journal of Public Health, vol. 29, no. 3, pp. 23–27, 2019.