
Big Data in Telecom Industry: Effective Predictive Techniques on CDRs


Abstract

In the digital era, mobile network operators (MNOs) face new types of challenges and growing demands. Because MNOs are a major source and hub of big data, traditional methods are no longer effective in the era of the fifth generation (5G), the Internet of Things, and big data: as data volumes keep expanding and networks move from Long-Term Evolution (LTE) to 5G, managing diverse datasets effectively has become a necessary task for operators. Big data analytics is therefore needed to predict future network performance and traffic and to fulfill the requirements of 5G. This paper introduces data science techniques based on machine learning and deep learning algorithms. A recurrent neural network (RNN), an autoregressive integrated moving average (ARIMA) model, and Bayesian-based curve fitting are used to introduce data-driven applications for MNOs. The framework for these models comprises model identification, parameter estimation, prediction, and data-driven applications of the predictions to network performance and business. The models are applied to the call detail record (CDR) datasets of Telecom Italia, and their capability is assessed with a well-known evaluation criterion. The ARIMA model is found to be a more precise predictive model than the RNN.

Keywords: fifth generation (5G), machine learning, big data analytics.

I. INTRODUCTION

Mobile network operators have begun moving from fourth-generation to fifth-generation networks, an upcoming and promising solution for meeting the requirements of wireless broadband. They have also started looking for innovative solutions to face new challenges and provide a satisfying customer experience while managing complex networks, for example through efficient handling of backhaul resources [1]. Telecom organizations and researchers have been studying a variety of techniques for managing big data effectively, discovering unknown knowledge and patterns in the information collected by operators, and helping organizations provide smart services with reduced expenditure and resource use.

The main objective of this paper is to investigate big-data-driven analysis and applications in the telecommunication industry for mobile network operators, covering both 5G and current networks in their operational and business aspects. To this end, different big-data-driven techniques are implemented on data gathered from a telecommunication network, and different prediction models are applied to predict traffic. Finally, the paper discusses how the results and applications of big data analytics differ from those of traditional methods, how they benefit companies' business and operational activities, and in which types of applications they can be used.

II. ANALYTIC TOOLS AND DATA SOURCES FOR TELECOMS

A. Telecom Data Sources

Mobile network operators have become a source and carrier of big data because mobile user penetration has increased significantly [2], and before the transition to big data analytics, organizations relied on traditional techniques. These techniques pay little attention to operational data and do not concentrate strongly on transactional data. Big data analytics is important in several ways compared with traditional methods. For instance, it can help compress transmitted data and identify which data are useful [4]. In many applications, it enables real-time decision-making by monitoring network infrastructure, development, and performance. By analyzing data sources and types, MNOs can support and provide a number of smart services.

Reference [3] classifies telecom data sources as operator and subscriber data; they can also be divided into external and internal data sources [4], or classified by core network, cell, and subscriber levels, with a deep KPI classification for different networks [5]. Regarding analytic tools, the main ones identified in previous studies include machine learning modeling, data mining, and statistical modeling [6]. With recent developments in data analytics, big-data-based networks have become an attractive research area for many researchers around the globe [7], [8]. In the industrial sector, researchers have also recently developed and studied frameworks for managing big data efficiently in mobile networks.

B. Contribution of CDRs or Call Details Records

For mobile operators, CDRs were traditionally important mainly for financial (billing) purposes. In the big data era, however, CDR-driven applications are attracting attention from industrial and scientific researchers, because CDR datasets are rich in information about communication among large numbers of users: how, when, and with whom they communicate.

The analysis of CDR datasets has become a significant and interesting research area [9], because these datasets support many research purposes and have led to improved dataset management techniques, new analytic techniques, and analyses from several perspectives using big data methods. Among telecom operators, Orange, one of the largest, launched the first "D4D Challenge" (Data for Development) in 2013. Through this challenge, it invited candidates from around the globe and gave them access to massive CDR datasets, with the aims of improving customer satisfaction and infrastructure and generating more revenue. The successful outcomes of this scientific work encouraged the organization to launch a second challenge during the NetMob conference in April 2015 [9]. In Europe, Telecom Italia, another well-known mobile operator, faces the same big data challenges and launched the first edition of its Big Data Challenge in 2014 [10].

III. TECHNIQUES AND METHODOLOGY

Different techniques and methods are used to analyze these datasets; those used in this work include data visualization, prediction, and clustering. We followed a framework to obtain the best results from the datasets. Preprocessing is the first step; it is an important step when working with massive data and helps in understanding the hidden patterns in the data. The next step defines the type of analysis and the tools it requires, the type of application it drives, and the kind of information it may need. Finally, the best applications for this analysis are determined from the results.

A. Data Set

The dataset contains millions of records collected between November and December 2013. These datasets were part of Telecom Italia's Big Data Challenge in 2014. The data were quite rich and, besides telecommunications data, included electricity data, weather forecasting data, news, and social networking data. Telecom Italia created the original dataset in collaboration with several institutions:

-          Fondazione Bruno Kessler.

-          EIT ICT Labs.

-          Trento and Trento RISE Institute.

-          Milan Polytechnic University.

-          MIT Media Labs.

The datasets were first released to the challenge participants. Demand for them nevertheless grew after the competition ended, and the initiative became a step toward "Open Big Data": according to [10], the datasets were published freely to encourage their use by society.

This dataset is the result of computations over the call detail records generated by Telecom Italia in the city of Milan. CDRs record user activity for billing and network management, but our research focuses on using the dataset for other applications rather than for these traditional activities.
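As a minimal sketch of the preprocessing step, the Python snippet below sums the dataset's 10-minute activity values into hourly totals per grid square. It assumes each record has already been parsed into a hypothetical `(square_id, timestamp_ms, internet_activity)` tuple with timestamps in milliseconds since the epoch; the function name `to_hourly` is illustrative, and hours are kept in UTC rather than converted to Milan local time.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_hourly(records):
    """Sum 10-minute activity values into hourly totals per grid square.

    `records` is an iterable of hypothetical (square_id, timestamp_ms, activity)
    tuples; timestamps are milliseconds since the epoch and are bucketed in UTC.
    """
    hourly = defaultdict(float)
    for square_id, ts_ms, activity in records:
        if activity is None:  # skip missing measurements instead of counting them as zero
            continue
        ts = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        hour = ts.replace(minute=0, second=0, microsecond=0)
        hourly[(square_id, hour)] += activity
    return dict(hourly)


# Usage: two 10-minute slots in the first hour, one slot in the second hour.
sample = [(1, 0, 2.0), (1, 600_000, 3.0), (1, 3_600_000, 1.5), (2, 0, None)]
for (square, hour), total in sorted(to_hourly(sample).items()):
    print(square, hour.isoformat(), total)
```

Keeping missing values out of the sums, rather than treating them as zero, avoids artificially low hours; how the original work handled gaps is not stated.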

B. Methods

In this section, the adopted methods are explained:

·         Clustering: In data mining, clustering procedures are important methods [11] because of their ability to infer relationships among data objects. Researchers have widely used them to investigate mobile tracing datasets. On datasets acquired from mobile networks, K-means is the most commonly applied method, and in other works such as [12] and [13] it gives satisfactory results.

·         Prediction: Prediction, as part of ML, is important for mobile operators when making decisions about network optimization. The ARIMA model is one of the best-known prediction algorithms, as explained in [14], and is well suited to time series data in both statistical and practical terms. Another model is the recurrent neural network (RNN); its multi-layer variant built from long short-term memory units is called LSTM. An LSTM consists of memory blocks, can be trained with backpropagation, and mitigates the vanishing gradient problem [15]. ARIMA and RNN both perform better than other approaches for time series prediction [16].
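To make the clustering step concrete, the following is a plain Lloyd's K-means sketch in Python, assuming each cell is represented by a feature vector such as its 24-hour traffic profile. The function `kmeans` and its random initialization are illustrative assumptions, not the exact configuration used in this work.

```python
import random


def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means: returns (labels, centroids) for a list of float vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Usage with toy 2-feature vectors forming two obvious groups.
labels, _ = kmeans([[0.0, 0.1], [0.2, 0.0], [10.0, 10.1], [10.2, 9.9]], k=2)
print(labels)
```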

C. Analysis of Data

Our analysis follows a data-intensive approach, applying different machine learning techniques to the CDR datasets because they add value in both business and scientific terms. Three analyses were performed in our work:

First analysis: This analysis identifies the day with the highest activity and the peak hours within a day. Based on total activity over time, the peak hours are 9, 10, and 11 AM, whereas 3 AM is not a peak activity hour.

This result is useful for business and network development because it helps identify which areas need to be developed or require more resources. It also helps determine which country codes or grid squares generate more traffic, so that companies can increase revenue by targeting customers according to their geo-location. Better resource management also reduces costs and expenses.
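The first analysis can be sketched as a simple aggregation. The snippet assumes hypothetical `(square_id, hour_of_day, activity)` records and ranks hours of the day and grid squares by total activity; `rank_hours_and_squares` is an illustrative name, not a function from the original work.

```python
from collections import defaultdict


def rank_hours_and_squares(records, top=3):
    """Rank hours of the day and grid squares by total activity.

    `records` holds hypothetical (square_id, hour_of_day, activity) tuples.
    Returns (peak_hours, quietest_hour, busiest_squares).
    """
    by_hour = defaultdict(float)
    by_square = defaultdict(float)
    for square_id, hour, activity in records:
        by_hour[hour] += activity
        by_square[square_id] += activity
    peak_hours = sorted(by_hour, key=by_hour.get, reverse=True)[:top]
    quietest_hour = min(by_hour, key=by_hour.get)
    busiest_squares = sorted(by_square, key=by_square.get, reverse=True)[:top]
    return peak_hours, quietest_hour, busiest_squares


# Usage with a handful of toy records.
print(rank_hours_and_squares([(1, 9, 5.0), (1, 10, 7.0), (2, 11, 6.0), (2, 3, 0.5)]))
```

The same grouping could be keyed on country code instead of grid square to rank traffic by origin.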

Second analysis: This analysis compares and illustrates the weekly internet usage in November for three cell IDs representing different area categories in Milan: a nightlife area, a universities area, and a downtown area. The results indicate that the downtown area peaks earlier than the nightlife area, that the universities area has fewer phone calls at weekends, and that call volume decreased.

These observations help with optimization and resource allocation by showing which area is fully loaded and at what time, and they can help define temporary solutions for peak hours, such as deploying pico cells.
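A minimal sketch of how the weekly comparison and the "fully loaded" check could be computed for one cell follows. It assumes 168 hourly values starting on Monday at 00:00, and the `capacity` and 80 percent `threshold` parameters are hypothetical placeholders rather than values from the study.

```python
def weekly_summary(series, capacity, threshold=0.8):
    """Weekend/weekday load ratio and overloaded (day, hour) slots for one cell.

    `series` holds 168 hourly loads starting Monday 00:00 (assumed layout);
    day 0 is Monday. `capacity` and `threshold` are hypothetical parameters.
    """
    if len(series) != 168:
        raise ValueError("expected one week of hourly values")
    weekday_mean = sum(series[:120]) / 120  # Monday to Friday
    weekend_mean = sum(series[120:]) / 48  # Saturday and Sunday
    ratio = weekend_mean / weekday_mean if weekday_mean else float("nan")
    # Slots above threshold * capacity are candidates for temporary capacity (e.g. pico cells).
    hot_slots = [(t // 24, t % 24) for t, load in enumerate(series) if load > threshold * capacity]
    return ratio, hot_slots


# Usage: a toy cell that is busier on weekdays, with one Monday 09:00 spike.
week = [1.0] * 120 + [0.5] * 48
week[9] = 10.0
print(weekly_summary(week, capacity=10.0))
```

A ratio below 1 would match the weekend drop reported for the universities area; above 1 would suggest weekend-heavy use.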

Third analysis: Three methods are implemented for modeling and predicting internet usage: the ARIMA model, an LSTM model, and a third model based on one used in a Kaggle competition. The models were validated on weekly data to determine whether modeling a single week is sufficient to obtain similar results and whether the models can be applied to datasets collected over different time intervals.

·         ARIMA

The ARIMA(2, 1, 0) model is applied to the one-week datasets. Three cell IDs representing the main regions are examined first, and the results are shown in the figures below.


Figure 1: For 4456 Cell ID, Traffic Prediction on an Hourly Basis using ARIMA


Figure 2: For 5060 Cell ID, Internet Traffic Hourly Prediction using ARIMA

Next, the model was applied to all 9998 cells, as illustrated in Figure 3.


Figure 3: For all cells, Internet Traffic Hourly Prediction using ARIMA
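The ARIMA(2, 1, 0) structure can be illustrated with a small sketch: difference the series once, regress each difference on the two previous ones, and integrate the forecasts back. This uses conditional least squares without a constant term, an assumption made for brevity; libraries such as statsmodels (`statsmodels.tsa.arima.model.ARIMA` with `order=(2, 1, 0)`) estimate the model by maximum likelihood and will give somewhat different coefficients. The function names are illustrative.

```python
def fit_arima_210(y):
    """Estimate (phi1, phi2) of ARIMA(2, 1, 0) by conditional least squares, no constant."""
    d = [b - a for a, b in zip(y, y[1:])]  # first difference: the "1" in (2, 1, 0)
    # Normal equations for d[t] = phi1 * d[t-1] + phi2 * d[t-2] + error.
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(d)):
        x1, x2 = d[t - 1], d[t - 2]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * d[t]
        r2 += x2 * d[t]
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("series too short or too regular to identify two AR terms")
    phi1 = (r1 * s22 - r2 * s12) / det  # Cramer's rule on the 2x2 system
    phi2 = (s11 * r2 - s12 * r1) / det
    return phi1, phi2


def forecast_arima_210(y, phi, steps):
    """Forecast `steps` future levels by predicting differences and summing them back."""
    phi1, phi2 = phi
    d = [b - a for a, b in zip(y, y[1:])]
    level = y[-1]
    forecasts = []
    for _ in range(steps):
        next_diff = phi1 * d[-1] + phi2 * d[-2]
        d.append(next_diff)
        level += next_diff  # undo the differencing
        forecasts.append(level)
    return forecasts
```

For hourly cell traffic, `y` would be one week (168 values) of a single cell's hourly totals.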

·         LSTM

The model has a visible layer with one input, a hidden layer with four LSTM blocks, and an output layer that produces a single value. Figure 4 shows the internet traffic prediction for cell ID 4456 over one week.


Figure 4: For 4456 cell ID, Internet Traffic Hourly Prediction using LSTM
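The layer layout described above (one input, four LSTM blocks, one output) can be illustrated with a forward-pass sketch in pure Python. The weights are random placeholders and training by backpropagation through time is omitted, so the hypothetical class `TinyLSTM` only shows how a window of hourly values flows through the input, forget, and output gates to a single prediction; it is not the trained model from this work.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Forward pass of a 1-input, 4-unit LSTM layer followed by a 1-unit dense output."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Per gate ("i"nput, "f"orget, "o"utput, candidate "g") and per unit:
        # [input weight, hidden-1 .. hidden-n recurrent weights, bias]. Random placeholders.
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 2)] for _ in range(hidden)]
            for gate in "ifog"
        }
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b_out = 0.0

    def _pre(self, gate, x, h_prev, j):
        """Pre-activation of unit j of a gate for input x and previous hidden state."""
        row = self.W[gate][j]
        return row[0] * x + sum(row[1 + k] * h_prev[k] for k in range(self.hidden)) + row[-1]

    def predict(self, window):
        """Run the window through the LSTM and return one predicted value."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in window:
            new_h, new_c = [], []
            for j in range(self.hidden):
                i = sigmoid(self._pre("i", x, h, j))
                f = sigmoid(self._pre("f", x, h, j))
                o = sigmoid(self._pre("o", x, h, j))
                g = math.tanh(self._pre("g", x, h, j))
                cj = f * c[j] + i * g  # memory cell: keep part of the old state, add new input
                new_c.append(cj)
                new_h.append(o * math.tanh(cj))
            h, c = new_h, new_c
        return sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out


print(TinyLSTM().predict([0.1, 0.4, 0.35, 0.8]))
```

In practice, the inputs would be scaled hourly traffic values and the weights would come from training rather than a random seed.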

·         Third Prediction Model

This model was used in a Kaggle competition, where it was applied to data covering several periods, unlike our data. It assumes that the data repeat periodically every twenty-four hours. Meanwhile, the internet traffic exhibits sinusoidal behavior, as shown in Figure 5.


Figure 5: Downtown Area Results of Internet Traffic
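Since the third model relies on the 24-hour periodicity and sinusoidal shape of the traffic, a rough stand-in is to fit one daily harmonic by ordinary least squares. This sketch is an assumption: it is neither the Kaggle model itself nor the Bayesian curve fitting mentioned in the abstract, and `fit_daily_sinusoid` and `predict_sinusoid` are illustrative names.

```python
import math


def _solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [val] for row, val in zip(m, v)]  # augmented matrix
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))) / a[i][i]
    return x


def fit_daily_sinusoid(y, period=24.0):
    """Least-squares fit of y[t] ~ a + b*sin(2*pi*t/period) + c*cos(2*pi*t/period)."""
    rows = [[1.0, math.sin(2 * math.pi * t / period), math.cos(2 * math.pi * t / period)]
            for t in range(len(y))]
    # Normal equations (A^T A) p = A^T y for the three parameters.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * val for r, val in zip(rows, y)) for i in range(3)]
    return _solve3(ata, aty)


def predict_sinusoid(params, t, period=24.0):
    """Evaluate the fitted daily curve at hour index t."""
    a, b, c = params
    return a + b * math.sin(2 * math.pi * t / period) + c * math.cos(2 * math.pi * t / period)
```

The fitted curve extrapolates the same daily shape indefinitely, which suits areas with stable routines; an area with irregular mobility patterns would show larger residuals.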

This model was then applied to the three area categories identified in our analysis. Prediction results for the nightlife and downtown areas are shown in Figure 6, and for the universities area in Figure 7.


Figure 6: Nightlife and Downtown Areas and Internet Traffic Data


Figure 7: Universities Area and Internet Traffic Data

The three models were applied to predict internet traffic from hourly and weekly data. The results show that the ARIMA prediction model is precise for the selected cells when the data are split into a 70 percent training set and a 30 percent test set, whereas a split of 21 percent test and 69 percent training data was not sufficient for the cell/data IDs. For the third model, the results indicate that it is accurate and suitable for all the selected datasets except the universities area. This area still shows some issues, which may be related to the mobility patterns of its community. Applying the model to datasets from different periods led to the same conclusion as previous works, so the model was judged suitable for datasets covering different periods.
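The 70/30 evaluation can be sketched as a chronological split followed by an error metric. The text does not name its evaluation criterion, so root mean squared error (RMSE) is assumed here, and a seasonal-naive forecast (the value 24 hours earlier) is added as a hypothetical baseline against which ARIMA or LSTM errors could be compared.

```python
import math


def chrono_split(series, train_frac=0.7):
    """Split a time series in time order (no shuffling): first 70% train, rest test."""
    cut = int(round(len(series) * train_frac))
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def seasonal_naive_rmse(series, train_frac=0.7, season=24):
    """RMSE of predicting each test hour with the value one season (24 h) earlier."""
    train, test = chrono_split(series, train_frac)
    if len(train) < season:
        raise ValueError("training part is shorter than one season")
    predictions = [series[t - season] for t in range(len(train), len(series))]
    return rmse(test, predictions)
```

Splitting in time order, rather than randomly, keeps future hours out of the training data, which matters for any traffic forecast.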

The results indicate that intelligent data analysis and predictive models play a significant role in traffic prediction for mobile operators and will be useful for traffic routing. The same approach can also provide yearly predictions to support network optimization, resource allocation, self-organizing networks, and investment planning.

IV. DISCUSSION

For MNOs, this research addresses efficient big data management for data-driven applications in the telecommunication sector. Understanding the available data, knowing which analytic tools are suitable and should be implemented, and deciding which type of data should be collected are essential for any service provider that wants to obtain the best results from its data.

Big data techniques were selected and applied in this work, and machine learning techniques contribute significantly to both the industrial and academic sectors. This practical work has shown how the business and operational aspects of the telecommunication industry can benefit from applying big data techniques effectively instead of traditional techniques. Models such as LSTM and ARIMA were applied to predict traffic, and the results proved useful for operators' short-term and strategic planning. The CDR database was selected for the practical part because of its significance for MNOs, and our results indicate that CDR analysis is significant, now and beyond, in areas such as investment planning based on network optimization, fault detection, traffic prediction, network optimization, and resource allocation.

ACKNOWLEDGMENT

This research is based on a master's thesis. I would like to thank my parents, my academic supervisors, and my lecturers and professors for their support in conducting this research.

REFERENCES

[1]      Zeng, D., Gu, L., & Guo, S. (2015). Cost minimization for big data processing in geo-distributed data centers. In Cloud networking for big data (pp. 59-78). Springer, Cham.

[2]      Bi, S., Zhang, R., Ding, Z., & Cui, S. (2015). Wireless communications in the era of big data. IEEE Communications Magazine, 53(10), 190-199.

[3]      Zheng, K., Yang, Z., Zhang, K., Chatzimisios, P., Yang, K., & Xiang, W. (2016). Big data-driven optimization for mobile networks toward 5G. IEEE Network, 30(1), 44-51.

[4]      He, Y., Yu, F. R., Zhao, N., Yin, H., Yao, H., & Qiu, R. C. (2016). Big data analytics in mobile cellular networks. IEEE Access, 4, 1985-1996.

[5]      Imran, A., Zoha, A., & Abu-Dayya, A. (2014). Challenges in 5G: how to empower SON with big data for enabling 5G. IEEE Network, 28(6), 27-33.

[6]      Boccardi, F., Heath, R. W., Lozano, A., Marzetta, T. L., & Popovski, P. (2014). Five disruptive technology directions for 5G. IEEE Communications Magazine, 52(2), 74-80.

[7]      Samulevicius, S., Pedersen, T. B., & Sorensen, T. B. (2015, May). MOST: Mobile broadband network optimization using planned spatio-temporal events. In Vehicular Technology Conference (VTC Spring), 2015 IEEE 81st (pp. 1-5).

[8]      Ramaprasath, A., Srinivasan, A., & Lung, C. H. (2015, May). Performance optimization of big data in mobile networks. In Electrical and Computer Engineering (CCECE), 2015 IEEE 28th Canadian Conference on (pp. 1364-1368).

[9]      Blondel, V. D., Decuyper, A., & Krings, G. (2015). A survey of results on mobile phone datasets analysis. EPJ Data Science, 4(1), 10.

[10]   Telecom Italia. (2014). Telecom Italia big data challenge. URL: https://dandelion.eu/datamine/open-big-data/

[11]   Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 645-678.

[12]   Liu, J., Chang, N., Zhang, S., & Lei, Z. (2015). Recognizing and characterizing dynamics of cellular devices in cellular data network through massive data analysis. International Journal of Communication Systems, 28(12), 1884-1897.

[13]   Soto, V., & Frías-Martínez, E. (2011, June). Automated land use identification using cell-phone records. In Proceedings of the 3rd ACM international workshop on MobiArch (pp. 17-22).

[14]   Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159-175.

[15]   Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association.

[16]   Ho, S. L., Xie, M., & Goh, T. N. (2002). A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Computers & Industrial Engineering, 42(2-4), 371-375.

[17]   Daniel, B. K. (2017). Contestable professional academic identity of those who teach research methodology. International Journal of Research & Method in Education, 1-14.
