In the digital era, mobile network operators (MNOs) face new kinds of challenges
driven by high demand. Operators are a source and hub of big data, and
traditional methods are no longer effective in the era of fifth generation (5G)
networks, the internet of things, and big data: with data growing steadily and
networks moving from long-term evolution (LTE) to 5G, managing diverse datasets
effectively has become a necessary task for operators. There is therefore a
significant need for big data analytics that predict future traffic and network
performance in order to meet the requirements of 5G. This paper introduces data
science techniques that use machine learning and deep learning algorithms. A
recurrent neural network (RNN), the autoregressive integrated moving average
(ARIMA) model, and Bayesian-based curve fitting are used to introduce
data-driven applications to mobile network operators. The modeling framework
comprises model identification, parameter estimation, prediction, and the
data-driven application of that prediction to business and network performance
use cases. The models are applied to call detail record (CDR) datasets from
Telecom Italia. Their capability is assessed with a well-known evaluation
criterion. The ARIMA model is found to be a more precise predictive model than
the RNN.
Keywords: Fifth generation, machine learning, big data analytics.
I. INTRODUCTION of Big Data in Telecom Industry Effective Predictive Techniques on CDRs
Mobile network operators have begun to move from fourth generation to fifth
generation networks, an upcoming and promising solution for meeting the
requirements of wireless broadband. They have also started looking for
innovative solutions that address these challenges and deliver a satisfactory
customer experience while managing complex networks through efficient backhaul
resource handling [1]. Telecom organizations and researchers have been studying
a variety of techniques for managing big data effectively, so as to discover
unknown patterns and knowledge in the information collected from operators and
to help organizations provide smart services with reduced expenditure and
resources.
The main objective of this paper is to investigate big data-driven analysis and
applications in the telecommunication industry, with respect to the business
and operational aspects of mobile network operators in current and fifth
generation networks. To that end, different big data techniques are applied to
data gathered from a telecommunication network, and different prediction models
are used to predict traffic. Finally, the paper examines how the results and
applications produced by big data analytics compare with those of traditional
methods. It also discusses how they benefit companies in their business and
operational activities, and how and in which types of applications they can be
used.
II. ANALYTIC TOOLS AND DATA SOURCES FOR TELECOMS
A. Telecom Data Sources
Mobile network operators have become a source and carrier of big data because
the penetration of mobile users has increased significantly [2]. Before the
transition to big data analytics, organizations relied on traditional
techniques. These techniques pay little attention to operational data and do
not concentrate significantly on transactional data either. Big data analytics
matter in several ways in comparison with traditional methods. For instance,
big data analytics can compress the transmitted data and identify the useful
data [4]. In many applications, a benefit of big data analytics is real-time
decision-making based on monitoring network performance, infrastructure, and
development. By analyzing the sources and types of data, mobile network
operators (MNOs) will be able to support and provide a number of smart services.
Data sources for telecoms have been classified as subscriber and operator data
[3], as internal and external data sources [4], and, in a deeper classification
for different networks, by subscriber, cell, and core network levels together
with KPIs [5]. As for analytic tools, the main ones identified in previous
studies are statistical modeling, data mining, and machine learning modeling
[6]. With the current progress in data analytics, big data-based networks have
become an attractive research area for researchers around the globe [7], [8].
In industry, researchers have also recently developed and studied frameworks
for managing big data efficiently in mobile networks.
B. Contribution of Call Detail Records (CDRs)
For mobile operators, CDRs were traditionally considered important for
financial purposes. In the era of big data, however, CDR-driven applications
are attracting attention from both industrial and scientific researchers,
because CDR datasets are rich in information about communication among large
numbers of users: how, when, and with whom they communicate.
The analysis of CDR datasets has become a significant and interesting research
area [9], because these datasets serve many research purposes and have led to
better dataset management techniques, new analytic techniques, and analyses
from a number of perspectives using big data methods. Among telecom operators,
Orange is one of the largest, and it launched its first challenge, the "D4D
Challenge," in 2013. Through this challenge, it invited participants from
around the globe and gave them access to massive CDR datasets, with the aim of
improving customer satisfaction and infrastructure as a source of additional
revenue. The successful outcomes of this scientific work encouraged the
organization to launch a second challenge during the NetMob conference in April
2015 [9]. In Europe, Telecom Italia is another well-known mobile operator
facing the same big data challenges, and in 2014 it launched the first edition
of its Big Data Challenge [10].
III. TECHNIQUES AND METHODOLOGY
Different techniques and methods are used in the analysis of these datasets.
Those used in this work include data visualization, clustering, and prediction.
We followed a framework designed to obtain the best outcomes from the datasets.
The first step is preprocessing, an important step when working with massive
data and when trying to understand the hidden patterns in the data. The next
step defines the type of analysis and the tools it requires, the type of
application it drives, and the information it needs. Finally, the best
applications for the analysis are determined on the basis of the results.
A. Data Set
The dataset contains millions of records collected between November and
December 2013. In 2014, these datasets formed part of Telecom Italia's Big Data
Challenge. The collection was rich and combined telecommunications data with
other types of data, including electricity data, weather data, news, and social
network data. Telecom Italia produced the original dataset in collaboration
with several institutions:
- Fondazione Bruno Kessler.
- EIT ICT Labs.
- University of Trento and Trento RISE.
- Polytechnic University of Milan.
- MIT Media Lab.
The first release of the datasets was intended for the challenge participants.
Demand for the datasets nevertheless kept growing after the competition ended,
which turned the release into an initiative toward "Open Big Data." According
to [10], the datasets were then published freely to broaden their use in
society.
This dataset is the result of a computation over the call detail records
generated by Telecom Italia in the city of Milan. CDRs record user activity for
billing and network management purposes, but our research focuses on using the
dataset for other applications rather than for these traditional activities.
B. Methods
This section explains the adopted methods:
· Clustering: Clustering procedures are important methods in the data mining
field [11] because of their strong ability to infer connections among data
objects. Researchers have used them widely to investigate mobile trace
datasets. K-means is the algorithm most often applied to data acquired from
mobile networks, and in other works, including [12] and [13], it provides
satisfactory results.
· Prediction: Prediction is part of machine learning and is important for
mobile operators when making decisions about network optimization. The ARIMA
model is one of the best-known prediction algorithms, as explained in [14]. It
is well established for time series data in both statistical and practical
terms. The RNN is another prediction model; a variant with several layers built
on long short-term memory is referred to as LSTM. It consists of memory blocks
and can be trained with backpropagation. This model mitigates the vanishing
gradient problem [15]. Both ARIMA and RNN perform better than other methods for
time series prediction [16].
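As an illustration of the clustering step, the following is a minimal sketch of K-means (Lloyd's algorithm) using only the Python standard library. The two-feature cell profiles, the function name `kmeans`, and the choice of k = 2 are assumptions made for this example; they are not the features or settings used in this work.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return labels, centroids


# Hypothetical per-cell features: (share of traffic by day, share by night).
cell_profiles = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),  # daytime-heavy cells
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),  # night-heavy cells
]
labels, centroids = kmeans(cell_profiles, k=2)
print(labels)
```

On real CDR data, the feature vector per cell would more plausibly be a full daily or weekly activity profile, and the number of clusters would have to be chosen from the data.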
C. Analysis of Data
Our analysis follows a data-intensive approach: different machine learning
techniques are applied to the CDR datasets, because this contributes value from
both a business and a scientific standpoint. Three analyses were performed in
our work:
First analysis: This analysis identifies the highest daily activity on a
specific day, together with the peak hours within a day. The results were
derived from total activity over time: the peak hours are 9, 10, and 11 AM,
while 3 AM is an off-peak hour.
This result is beneficial for business and network development because it helps
identify which areas need to be developed or require more resources. It also
helps determine which square grid or country code generates more traffic, so
that companies can gain more revenue by targeting customers according to their
geolocation. In addition, better resource management reduces costs and
expenses.
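The aggregation behind this kind of analysis can be sketched as follows. The record layout `(square_id, epoch_ms, activity)`, the helper names, and the sample values are assumptions for illustration; timestamps are read as UTC here, so local Milan time would need an offset.

```python
from collections import defaultdict
from datetime import datetime, timezone


def epoch_ms(year, month, day, hour):
    """Helper for building sample timestamps (milliseconds since the epoch, UTC)."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)


def total_by(records, key):
    """Sum the activity column of (square_id, epoch_ms, activity) records per key."""
    totals = defaultdict(float)
    for record in records:
        totals[key(record)] += record[2]
    return dict(totals)


def hour_of_day(record):
    """Hour of day (0-23, UTC) of a record's timestamp."""
    return datetime.fromtimestamp(record[1] / 1000, tz=timezone.utc).hour


def top_keys(totals, n):
    """Keys with the largest totals, highest first."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


# Made-up records for illustration only; not values from the dataset.
sample = [
    (4456, epoch_ms(2013, 11, 1, 3), 1.0),
    (4456, epoch_ms(2013, 11, 1, 9), 8.0),
    (4456, epoch_ms(2013, 11, 1, 10), 9.5),
    (5060, epoch_ms(2013, 11, 1, 10), 4.0),
    (5060, epoch_ms(2013, 11, 1, 11), 9.0),
    (5060, epoch_ms(2013, 11, 2, 15), 2.0),
]
peak_hours = top_keys(total_by(sample, hour_of_day), 3)
busiest_square = top_keys(total_by(sample, lambda record: record[0]), 1)[0]
print(peak_hours, busiest_square)  # [10, 11, 9] 4456
```

The same `total_by` helper can group by any other field, such as a country code, if the records carry one.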
Second analysis: This analysis illustrates and compares weekly internet usage
in November for three cell IDs that represent different categories of area in
the city of Milan: a downtown area, a nightlife area, and a university area.
The results indicate that the downtown area peaks earlier than the nightlife
area, and that the university area has fewer phone calls on weekends, when call
volume decreases.
These observations support optimization and resource allocation by showing
which area is fully loaded and at what time, and they can help define temporary
solutions for different peak hours, such as deploying picocells.
Third analysis: In this analysis, three methods are applied for modeling and
prediction based on internet usage. The first is the ARIMA model, the second is
LSTM, and the third is based on a model that was used in a Kaggle competition.
The third model was validated on different data on a weekly basis to determine
whether modeling one week is sufficient to obtain similar results, and whether
it can be applied to datasets collected over different time intervals.
· ARIMA
For the one-week datasets, the applied model is ARIMA(2, 1, 0). Three cell IDs
representing the main regions are considered first, and the results are shown
in the figures below.
Figure 1: Hourly traffic prediction using ARIMA for cell ID 4456
Figure 2: Hourly internet traffic prediction using ARIMA for cell ID 5060
Next, 9998 cells were targeted, as illustrated in Figure 3.
Figure 3: Hourly internet traffic prediction using ARIMA for all cells
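To make the ARIMA(2, 1, 0) structure concrete, the sketch below differences the series once (d = 1), fits two autoregressive coefficients (p = 2) to the differences by ordinary least squares, and integrates the forecasts back to the original scale. This is a hand-rolled conditional least-squares fit without an intercept, written only for illustration; it is an assumption of this example and not the estimation routine used in this work, and the sample values are made up.

```python
def difference(series):
    """First-order differencing: the d = 1 part of ARIMA(2, 1, 0)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar2(w):
    """Least-squares fit of w[t] = phi1 * w[t-1] + phi2 * w[t-2] (no intercept)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(w)):
        x1, x2, y = w[t - 1], w[t - 2], w[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("series too short or degenerate for an AR(2) fit")
    # Solve the 2x2 normal equations by Cramer's rule.
    phi1 = (b1 * s22 - b2 * s12) / det
    phi2 = (b2 * s11 - b1 * s12) / det
    return phi1, phi2


def forecast_arima_210(series, steps):
    """Forecast `steps` values ahead on the original (undifferenced) scale."""
    w = difference(series)
    phi1, phi2 = fit_ar2(w)
    history = list(w)
    level = series[-1]
    forecasts = []
    for _ in range(steps):
        next_diff = phi1 * history[-1] + phi2 * history[-2]
        history.append(next_diff)
        level += next_diff  # undo the differencing
        forecasts.append(level)
    return forecasts


# Made-up hourly traffic values for a single cell.
hourly_traffic = [120, 132, 150, 171, 160, 148, 155, 170, 190, 205, 198, 185]
print(forecast_arima_210(hourly_traffic, steps=3))
```

A statistical library would normally estimate the same model by maximum likelihood and report diagnostics; the least-squares version is kept here because it exposes each step of the model.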
· LSTM
This model has a visible layer with one input, a hidden layer with four LSTM
blocks, and an output layer that produces a single value. Figure 4 shows the
internet traffic prediction for cell ID 4456 on a weekly basis.
Figure 4: Hourly internet traffic prediction using LSTM for cell ID 4456
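The shape of such a network, with one input, four memory blocks, and a single output value, can be sketched as a forward pass in plain Python. The weights below are random and untrained, and the names `init_params`, `lstm_step`, and `predict_next` are hypothetical; the sketch only shows how the gates and cell state of the memory blocks combine, not the trained model of this work.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_blocks=4, seed=0):
    """Small random weights; each gate of each block holds (w_x, w_h, bias)."""
    rng = random.Random(seed)

    def gate_weights():
        return (
            rng.uniform(-0.5, 0.5),
            [rng.uniform(-0.5, 0.5) for _ in range(n_blocks)],
            0.0,
        )

    return {
        "gates": {name: [gate_weights() for _ in range(n_blocks)] for name in GATES},
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_blocks)],
        "b_out": 0.0,
    }


def lstm_step(x, h_prev, c_prev, gates):
    """One time step for a scalar input x; h and c are lists with one entry per block."""
    h, c = [], []
    for j in range(len(h_prev)):
        def pre(name):
            w_x, w_h, bias = gates[name][j]
            return w_x * x + sum(w * hp for w, hp in zip(w_h, h_prev)) + bias

        i_gate = sigmoid(pre("input"))
        f_gate = sigmoid(pre("forget"))
        o_gate = sigmoid(pre("output"))
        candidate = math.tanh(pre("candidate"))
        c_j = f_gate * c_prev[j] + i_gate * candidate  # update the memory cell
        c.append(c_j)
        h.append(o_gate * math.tanh(c_j))
    return h, c


def predict_next(sequence, params):
    """Run the sequence through the LSTM layer, then apply the single linear output."""
    n_blocks = len(params["w_out"])
    h, c = [0.0] * n_blocks, [0.0] * n_blocks
    for x in sequence:
        h, c = lstm_step(x, h, c, params["gates"])
    return sum(w * hj for w, hj in zip(params["w_out"], h)) + params["b_out"]


# Scaled, made-up traffic values; the output is meaningless until the weights are trained.
print(predict_next([0.2, 0.4, 0.6], init_params()))
```

Training such a network with backpropagation through time is normally left to a deep learning library; only the forward computation is shown here.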
· Third Prediction Model
This model was used in a Kaggle competition, where it was applied to time
periods different from those in our data. It relies on data that repeat
periodically every twenty-four hours, and internet traffic exhibits sinusoidal
behavior, as shown in Figure 5.
Figure 5: Downtown Area Results of Internet Traffic
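The idea of fitting a curve with a twenty-four-hour period can be sketched with a single sinusoid, y(t) = a + b sin(2πt/24) + c cos(2πt/24). The sketch uses plain least squares rather than a Bayesian fit, and it assumes hourly samples covering a whole number of days, in which case the least-squares solution has the closed form used below. Function names and values are hypothetical.

```python
import math

PERIOD_HOURS = 24
OMEGA = 2 * math.pi / PERIOD_HOURS


def fit_daily_sinusoid(hourly_values):
    """Least-squares fit of a + b*sin(OMEGA*t) + c*cos(OMEGA*t) to hourly samples.

    With a whole number of days of evenly spaced samples, the three basis
    functions are orthogonal, so each coefficient is a simple average.
    """
    n = len(hourly_values)
    if n == 0 or n % PERIOD_HOURS != 0:
        raise ValueError("expects a whole number of days of hourly samples")
    a = sum(hourly_values) / n
    b = 2 / n * sum(y * math.sin(OMEGA * t) for t, y in enumerate(hourly_values))
    c = 2 / n * sum(y * math.cos(OMEGA * t) for t, y in enumerate(hourly_values))
    return a, b, c


def predict_sinusoid(params, t):
    """Evaluate the fitted curve at hour index t (which may lie beyond the data)."""
    a, b, c = params
    return a + b * math.sin(OMEGA * t) + c * math.cos(OMEGA * t)


# Two days of synthetic traffic generated from a known curve.
observed = [10 + 3 * math.sin(OMEGA * t) - 2 * math.cos(OMEGA * t) for t in range(48)]
params = fit_daily_sinusoid(observed)
print(params, predict_sinusoid(params, 50))
```

Real traffic also carries weekly patterns and noise, so a single daily sinusoid is at best a first approximation of the behavior described here.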
Next, this model is applied to the three areas categorized in our analysis. The
prediction results for the nightlife and downtown areas are shown in Figure 6,
and those for the university area in Figure 7.
Figure 6: Nightlife and Downtown Areas and Internet Traffic Data
Figure 7: Universities Area and Internet Traffic Data
Three models were applied to predict internet traffic from hourly and weekly
data. The results show that the ARIMA prediction model is precise for the
selected cells with a 70 percent training set and a 30 percent test set. A
split of 69 percent training data and 21 percent test data was found
insufficient for the cell IDs and data. For the third model, the results
indicate that it is accurate and suitable for all the selected datasets except
the university area. That area still shows some issues, which might be
associated with the mobility patterns of its community. The same conclusion as
in previous works was obtained for different dataset periods. It was therefore
determined that this model is suitable for all datasets.
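A chronological 70/30 split and the scoring of a forecast can be sketched as follows. The evaluation criterion is not named in the text, so the use of RMSE and MAE here is an assumption, as are the made-up values and the naive baseline forecast.

```python
import math


def chronological_split(series, train_fraction=0.7):
    """Split a time series in order, without shuffling, into training and test parts."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up hourly values, scored against a naive "repeat the last training value" forecast.
traffic = [10.0, 12.0, 15.0, 14.0, 13.0, 16.0, 18.0, 17.0, 19.0, 20.0]
train, test = chronological_split(traffic)
baseline = [train[-1]] * len(test)
print(len(train), len(test), rmse(test, baseline), mae(test, baseline))
```

Keeping the split chronological matters for time series: shuffling before splitting would let a model see values from the period it is later tested on.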
The results indicate that predictive models and intelligent data analysis for
traffic prediction play a significant and vital role for mobile operators and
are useful for traffic routing. They can also provide yearly predictions to
support network optimization, resource allocation, self-organizing networks,
and investment planning.
IV. DISCUSSION
This research is dedicated to the efficient management of big data for MNOs in
the telecommunication sector and in data-driven applications. For any service
provider to harvest the best results from its data, it is important to
understand the available data, which analytic tools are suitable and must be
applied, and which type of information should be collected.
Big data techniques were selected and applied in this work, and machine
learning techniques contribute significantly to both the industrial and
academic sectors. This practical work shows how applying big data techniques
effectively, instead of traditional techniques, can yield benefits in the
business and operational aspects of the telecommunication industry. Models such
as LSTM and ARIMA were applied to traffic prediction, and the results proved
beneficial for the operator's short-term and strategic plans. For the practical
part, the CDR database was selected because of its significance for the MNO:
our results indicate that CDR analysis is highly significant, now and in the
future, in areas such as resource allocation, network optimization, traffic
prediction, fault detection, and investment planning based on network
optimization.
ACKNOWLEDGMENT
This research is based on a master's thesis. I would like to thank my parents,
my academic supervisors, and my lecturers and professors for their support in
conducting this research.
References
[1] Zeng, D., Gu, L., & Guo, S. (2015). Cost
minimization for big data processing in geo-distributed data centers. In Cloud
networking for big data (pp. 59-78). Springer, Cham.
[2] Bi, S., Zhang, R., Ding, Z., & Cui, S. (2015).
Wireless communications in the era of big data. IEEE communications magazine,
53(10), 190-199.
[3] Zheng, K., Yang, Z., Zhang, K., Chatzimisios, P.,
Yang, K., & Xiang, W. (2016). Big data-driven optimization for mobile
networks toward 5G. IEEE Network, 30(1), 44-51.
[4] He, Y., Yu, F. R., Zhao, N., Yin, H., Yao, H., &
Qiu, R. C. (2016). Big data analytics in mobile cellular networks. IEEE Access,
4, 1985-1996.
[5] Imran, A., Zoha, A., & Abu-Dayya, A. (2014).
Challenges in 5G: how to empower SON with big data for enabling 5G. IEEE
network, 28(6), 27-33.
[6] Boccardi, F., Heath, R. W., Lozano, A., Marzetta, T.
L., & Popovski, P. (2014). Five disruptive technology directions for 5G.
IEEE Communications Magazine, 52(2), 74-80.
[7] Samulevicius, S., Pedersen, T. B., & Sorensen, T.
B. (2015, May). MOST: Mobile broadband network optimization using planned
spatio-temporal events. In Vehicular Technology Conference (VTC Spring), 2015
IEEE 81st (pp. 1-5).
[8] Ramaprasath, A., Srinivasan, A., & Lung, C. H.
(2015, May). Performance optimization of big data in mobile networks. In Electrical
and Computer Engineering (CCECE), 2015 IEEE 28th Canadian Conference on (pp.
1364-1368).
[9] Blondel, V. D., Decuyper, A., & Krings, G. (2015). A survey of results on
mobile phone datasets analysis. EPJ Data Science, 4(1), 10.
[10] Italia, T. (2014). Telecom Italia big data challenge. URL:
https://dandelion.eu/datamine/open-big-data/
[11] Xu, R., & Wunsch, D. (2005). Survey of clustering
algorithms. IEEE Transactions on neural networks, 16(3), 645-678.
[12] Liu, J., Chang, N., Zhang, S., & Lei, Z. (2015).
Recognizing and characterizing dynamics of cellular devices in cellular data
network through massive data analysis. International Journal of
Communication Systems, 28(12), 1884-1897.
[13] Soto, V., & Frías-Martínez, E. (2011, June).
Automated land use identification using cell-phone records. In Proceedings
of the 3rd ACM international workshop on MobiArch (pp. 17-22).
[14] Zhang, G. P. (2003). Time series forecasting using a
hybrid ARIMA and neural network model. Neurocomputing, 50,
159-175.
[15] Sundermeyer, M., Schlüter, R., & Ney, H. (2012).
LSTM neural networks for language modeling. In Thirteenth Annual Conference of
the International Speech Communication Association.
[16] Ho, S. L., Xie, M., & Goh, T. N. (2002). A comparative study of neural
network and Box-Jenkins ARIMA modeling in time series prediction. Computers &
Industrial Engineering, 42(2-4), 371-375.
[17] Daniel, B. K. (2017). Contestable professional
academic identity of those who teach research methodology. International
Journal of Research & Method in Education, 1-14.