Introduction of Big Data Processing

Category: Computer Sciences Paper Type: Report Writing Reference: IEEE Words: 6700

Big data is emerging as a new discipline built around large volumes of data. Such data falls into three types: unstructured, semi-structured, and structured. One of the best-known approaches for handling big data is the MapReduce model [1]. MapReduce is a software-based processing technique that uses clusters of computers for distributed storage and distributed processing. For distributed programs, its well-known open-source counterpart is Apache Hadoop, which is written in Java. Big data processing is also associated with the Structured Query Language (SQL) used for databases [2].

SQL is concerned with relational database management. A relational database consists of many tables, each a collection of rows of attributes. Processing rows and tables becomes slow when the operation involves irregular data sets, and the traditional SQL model becomes less suitable as the variety of data sets grows. The question is how this issue can be resolved. One appropriate answer is the graph database. According to DB-Engines.com, graph databases are the fastest-growing category in data management [3].

The use of graphs in science, government, and industry has increased. Graph representations are a natural fit for real-world data because graphs are flexible, intuitive, and well supported as a data model. A social network is the best example of a graph representation [4]. The users of the network are the nodes, the details of each person (name, age, and so on) are the properties, and the lines connecting users indicate their relationships with other people [1].

Graphs used for modeling have a wide range of applications, such as the analysis of biological structures, social networks, protein structures, workflows, the web, chemical compounds, and XML documents. As a result, graph processing platforms are becoming increasingly diverse [3]. In a nutshell, a graph database uses graph structures with nodes, edges, and properties to store data and supports semantic queries over them. The graph presents data accurately and shows how the pieces of data relate to one another. The prime concern of such a database is to connect pieces of information and to represent those connections as a graph [1].

In a conventional database, querying relationships can take considerable time, because relationships are expressed through foreign keys and resolved by queries that join tables. Joining tables is an expensive operation that touches a large number of rows and objects [3]. Indirect queries require multiple tables to be joined together before the graph can be obtained from the database. A graph database, by contrast, works by storing the relationships between data items [5]. All related nodes are physically linked to each other, so connected data can be accessed immediately. Compared with a relational database, a graph database simply reads the data and its relationships from the stored values.

In a graph database, answering queries efficiently is an essential concern. The graph database stores both objects and relationships and allows different kinds of objects and relationships to be defined [1]. Like other NoSQL databases, a graph database is mostly schema-less. In terms of performance and flexibility, graph databases are similar to document databases and key-value stores, whereas in a relational database all data is organized in tabular form [6].

The prime objective of the graph database model is to capture the relationships between nodes, or data points. Instead of looking up the values of a data point across the tables of an SQL database, a graph database organizes and then analyzes all the data points together with their relationships [1]. This additional structure makes analysis more effective [7]. A graph database has an advantage over other databases in that it stores complex, large data as a graph of nodes, edges, and properties, which can also boost performance [1].

The design of a graph database matters for understanding the relationships between pieces of data, for performance, and for accommodating growth in those relationships. A graph database is flexible: its model can be changed at any point to suit the needs and requirements of the organization [1]. Support for agile development is therefore an important feature of a graph database. Graph databases are highly recommended in business because the database can change as requirements change [3].

Using the right platform reduces the challenges of handling graph data in business, and an important part of adopting one is designing for the existing platform and tuning its processes. An article published on InfoQ.com noted that processing extremely large graphs has always been a challenge, but that recent improvements in big data technologies have changed the situation. The article discusses these technologies and compares them in detail [1].

Another article, published on Dummies.com, describes how databases and algorithms capture the relationships between web pages. The method used by Google Search differs from that of other search engines [1]: it provides a scale for measuring the importance of web pages. Other graph database technologies have also been discussed, including Neo4j. Neo4j takes a different approach, but it still has scalability issues and is not well suited to sharding [3]. Apache Giraph is a graph processing system that stores its data in HDFS. With Facebook's seal of approval, Giraph is emerging as a preferred technique on Hadoop, but it still has limitations: the engine loads the whole graph into cluster memory and is optimized for batch-oriented queries. GraphX provides a way to build and process graph data on the Spark framework [1].

GraphX includes a collection of graph algorithms that simplify analytics. Another scalable graph database is Titan, which is optimized for storing and querying graph data. As a transactional database, it supports a large number of concurrent users and can execute complex graph traversals in real time [1]. Apache Accumulo is a scalable store for distributed data that is based on Google's Bigtable design. With Apache Accumulo, users can store and manage large data sets across clusters of computers [1]. Accumulo uses Apache Hadoop for data storage and Apache ZooKeeper for consensus [3]. A widely known data storage service is Azure Cosmos DB, a multi-model, globally distributed database service for Microsoft's mission-critical applications. The Azure Cosmos DB Gremlin API is used for storing and operating on graph data; it supports a graph data model and provides an API for traversing the graph [1].

Graph Databases of Big Data Processing

A graph database is a database that describes data using a graph model; most commonly, the labeled property graph model is used. This model differs from other graph models in some specific properties but shares their basic pattern of nodes and edges. A labeled property graph is a directed graph in which both nodes and edges can carry a set of key-value pairs, called properties or attributes. In each pair, the key is a text string, while the value may be of an arbitrary data type. Key-value pairs of this kind are widely used throughout computing.

The main purpose of properties is to hold information about nodes and edges. The main role of labels is to group nodes of the same class and to specify the roles they play in the data set; a label works like a tag. One way to understand the labeled property graph model is by analogy with grammar: nodes are nouns, classified by their labels; edges are verbs; and properties are adjectives. Properties can be present on both nodes and edges, whereas labels, of which a node may have zero or many, are present only on nodes.
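
The labeled property graph described above can be sketched as a small in-memory data structure. This is only an illustration: the class names (Node, Edge, PropertyGraph), the KNOWS and STUDIES_AT relationship types, and the sample values are assumptions made for the example and do not come from any particular graph database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    node_id: int
    labels: Set[str] = field(default_factory=set)             # zero or more labels
    properties: Dict[str, Any] = field(default_factory=dict)  # key-value pairs


@dataclass
class Edge:
    source: int    # id of the node the edge starts from
    target: int    # id of the node the edge points to
    rel_type: str  # the "verb" of the relationship (hypothetical names)
    properties: Dict[str, Any] = field(default_factory=dict)  # edges carry properties too


class PropertyGraph:
    """Minimal directed labeled property graph (illustrative only)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.edges: List[Edge] = []

    def add_node(self, node_id, labels=(), **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))

    def add_edge(self, source, target, rel_type, **properties):
        self.edges.append(Edge(source, target, rel_type, dict(properties)))

    def nodes_with_label(self, label):
        # Labels act as tags that group nodes of the same class.
        return [n for n in self.nodes.values() if label in n.labels]


# Sample data is made up for the example.
graph = PropertyGraph()
graph.add_node(1, labels=["Person"], name="Nick")
graph.add_node(2, labels=["Person", "Student"], name="Matthew")
graph.add_node(3, labels=["University"], name="Example University")
graph.add_edge(1, 2, "KNOWS", since=2015)
graph.add_edge(2, 3, "STUDIES_AT")

students = [n.properties["name"] for n in graph.nodes_with_label("Student")]
print(students)
```

Note that labels appear only on Node, while both Node and Edge have a properties dictionary, which mirrors the rule stated above.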

Labels and properties affect how data is modeled in the database. In a plain graph model, each item of data is either a node or an edge, which keeps modeling simple because there are only two forms. In the labeled property graph model, it is harder to decide whether an item should be a node, an edge, or a property. Take the year 1956 as an example: it can be stored as a property on a scientist's node, or it can be a node of its own. A separate 1956 node makes it easier to find scientists who match on that year, because they all share the same node.

A model is neither right nor wrong; it is simply a representation of the data. Different models suit different use cases, and deciding on the data model is the most crucial step in using a graph database. These decisions determine how the data will be used. Labels are used for categorization and simplify the data, for example by grouping all the students of a college. Nodes are the basic unit of the graph and denote entities such as things, places, and people.

The other important unit of the graph is the edge, which represents a relationship between entities, such as the relationship between students and universities. Properties hold detailed information about entities and their relationships, and they are also used to keep information about an entity that is not modeled as a node or a label.

Graph vs Relational databases of Big Data Processing

Relational databases are highly effective at codifying tabular structures and paper forms. They require a pre-defined schema for the data, which does not readily accommodate unexpected relationships. Relational databases are less effective at modeling relationships because relating one piece of information to another requires foreign keys. Referencing keys work well for simple relationships but cause problems when the relationships are multi-faceted, because the tables must be joined on their foreign keys whenever they are combined. Joining tables is both complex and time-consuming.

People

ID    Name
1     Nick
2     Matthew
3     Adam
4     Jaimie
5     Nolan

Connections

Person ID    Connection ID
1            2
1            3
2            3
2            5
3            1
3            2
4            3
4            5
5            4














The tables above show the relationships among five people: one table lists the people and the other lists their connections. In the People table, every person is assigned a numeric ID: Nick is 1, Matthew is 2, Adam is 3, Jaimie is 4, and Nolan is 5. The Connections table lists pairs of a Person ID and a Connection ID.

In the People table, the primary key is the ID column. The Connections table uses a composite primary key made up of the Person ID and Connection ID columns. With a composite primary key, it is the combination of values in both columns that must be unique. For example, the row with Person ID 1 and Connection ID 2 has a different composite value from the row with Person ID 1 and Connection ID 3. Moreover, if directed relationships are required, the order within the composite value matters: the row with Person ID 1 and Connection ID 3 has a different composite value from the row with Person ID 3 and Connection ID 1. This is how unrequited (one-way) relationships are modeled.

If instead the relationships are undirected, the row with Person ID 1 and Connection ID 3 can be assumed to represent the same relationship as the row with Person ID 3 and Connection ID 1. Simple queries, such as finding Nick's connections, are easy in the relational model. We look up Nick's ID in the People table, find the rows in the Connections table whose Person ID matches it, and then go back to the People table to find the name associated with each Connection ID. This sounds complicated, but most people do it subconsciously when they work with spreadsheets. To answer "Who is connected to Nick?" we follow a very similar process with the direction reversed. "Who is Nick connected to?" and "Who is connected to Nick?" are two different questions and can have different results. Problems begin when the questions go deeper, for example "Who are the connections of Nick's connections?"
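
The two lookups described above can be sketched with Python's built-in sqlite3 module, using the People and Connections data from the tables. The table and column names (people, connections, person_id, connection_id) and the helper function names are assumptions chosen for this sketch, and the relationships are treated as directed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE connections (
    person_id     INTEGER NOT NULL REFERENCES people(id),
    connection_id INTEGER NOT NULL REFERENCES people(id),
    PRIMARY KEY (person_id, connection_id)  -- composite primary key
);
""")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Nick"), (2, "Matthew"), (3, "Adam"), (4, "Jaimie"), (5, "Nolan")],
)
conn.executemany(
    "INSERT INTO connections VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (2, 5), (3, 1), (3, 2), (4, 3), (4, 5), (5, 4)],
)


def connections_of(name):
    """Who is `name` connected to? Follows Person ID -> Connection ID."""
    rows = conn.execute(
        """
        SELECT friend.name
        FROM people AS p
        JOIN connections AS c ON c.person_id = p.id
        JOIN people AS friend ON friend.id = c.connection_id
        WHERE p.name = ?
        ORDER BY friend.id
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


def connected_to(name):
    """Who is connected to `name`? Follows Connection ID -> Person ID."""
    rows = conn.execute(
        """
        SELECT other.name
        FROM people AS p
        JOIN connections AS c ON c.connection_id = p.id
        JOIN people AS other ON other.id = c.person_id
        WHERE p.name = ?
        ORDER BY other.id
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(connections_of("Nick"))  # ['Matthew', 'Adam']
print(connected_to("Nick"))    # ['Adam']
```

Each question already needs two joins, and every additional level of "connections of connections" would add further joins on the connections table.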

Answering that question takes more work: first Nick's connections are found, and then the connections of each of those connections are found. This is more complex but still possible with the relational model shown above. The next question, "Who are the connections of the connections of Nick's connections?", is far more computationally expensive in the relational model and starts to look impractical. Such questions are hard to follow, yet they occur in everyday life. The best example is LinkedIn, which distinguishes first-, second-, and third-degree connections.

First-degree connections are the people a user is connected to directly, second-degree connections are the connections of the user's first-degree connections, and third-degree connections are the connections of the user's second-degree connections. Returning to the earlier example, Nick's third-degree connections on LinkedIn correspond to the query "Who are the connections of the connections of Nick's connections?" LinkedIn's system of connections is therefore poorly suited to the relational model but works well with the graph model. In contrast to a relational database, a graph database treats relationships as first-class citizens, because they are stored explicitly.
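
To illustrate why stored relationships make such questions cheap, the sketch below keeps the same People and Connections data as an adjacency list and finds first-, second-, and third-degree connections with a breadth-first traversal. The function name connections_by_degree is an assumption for this example, and the code is plain Python rather than the query language of any graph database.

```python
from collections import deque

# Directed connections from the Connections table, keyed by person name.
adjacency = {
    "Nick": ["Matthew", "Adam"],
    "Matthew": ["Adam", "Nolan"],
    "Adam": ["Nick", "Matthew"],
    "Jaimie": ["Adam", "Nolan"],
    "Nolan": ["Jaimie"],
}


def connections_by_degree(adjacency, start, max_degree):
    """Group reachable people by their shortest distance (degree) from `start`."""
    distance = {start: 0}
    levels = {degree: [] for degree in range(1, max_degree + 1)}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbor in adjacency.get(person, []):
            if neighbor not in distance:  # first visit gives the shortest distance
                distance[neighbor] = distance[person] + 1
                levels[distance[neighbor]].append(neighbor)
                queue.append(neighbor)
    return levels


degrees = connections_by_degree(adjacency, "Nick", 3)
print(degrees)  # {1: ['Matthew', 'Adam'], 2: ['Nolan'], 3: ['Jaimie']}
```

Going one degree deeper only means following stored edges one more step; no additional table joins are involved.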

Graph processing techniques of Big Data Processing

Neo4j

The Neo4j graph model is designed to answer questions through Cypher queries. It addresses technical and business problems by organizing and structuring data as a graph. The Neo4j data model is meant to match the model sketched on a whiteboard [8], which keeps models simple and visual; business users can draw their business models in familiar ERD terms. Nodes are grouped into categories by assigning labels, so that all nodes of the same category carry the same label. Database queries work efficiently over the whole graph and are easy to write and to change [8]. Nodes are labeled with generic nouns, and relationships connect pairs of nodes. Every relationship has a source node and a target node [8] and points in a single direction, but Neo4j can traverse relationships efficiently in either direction, so a query need not specify one. Neo4j can be used to model complicated dynamics such as the flow of resources, network failures, and the influence of groups, and it supports both transactional and analytical workloads [8]. As a native graph platform, it models real-world systems, and its optimized approach streamlines workflows. The graph algorithms in Neo4j are part of its analytics platform. Its traversal and pathfinding algorithms include parallel breadth-first search (BFS), parallel depth-first search (DFS), single-source shortest path, all-pairs shortest path, and minimum weight spanning tree (MWST) [8]. Its centrality algorithms estimate the importance of nodes and include PageRank, Degree Centrality, Closeness Centrality, and Betweenness Centrality [8].
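
As an illustration of one of the pathfinding algorithms named above, the following is a minimal single-source shortest path sketch using Dijkstra's algorithm in plain Python. It is not Neo4j's implementation; the function name and the small weighted graph are assumptions made up for the example, and edge weights are assumed to be non-negative.

```python
import heapq


def single_source_shortest_paths(graph, source):
    """Return the shortest distance from `source` to every reachable node.

    `graph` maps a node to a list of (neighbor, weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical weighted, directed graph.
weighted_graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}

distances = single_source_shortest_paths(weighted_graph, "A")
print(distances)
```

In this example the cheapest route from A to D goes through C and B (1 + 2 + 1), not through the direct C to D edge.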

Big data refers to data sets, structured and unstructured, that are too large for traditional computing systems to process. A growing number of organizations produce huge data sets whose size is often measured in terabytes [1]. For instance, Walmart in the United States produces millions of transactions an hour, and its database holds 2.5 PB. Big data is described by the 3V model (volume, velocity, and variety), and further features of the data are veracity, accuracy, and reliability. The big data life cycle consists of the generation, acquisition, storage, and production of data. The first phase of the life cycle is generation, in which big data originates from specific sources [1].

The generated data includes telescope data and data from healthcare, computational biology, transport, agriculture, and astronomy. Data acquisition comprises the collection, transmission, and processing of data [6]. Because the raw data is collected from different sources, it requires data integration [6]. The storage phase covers the management and storage of the data in different systems, including Microsoft Cosmos, Facebook, TFS, and GFS. NoSQL databases, on the other hand, offer three types of storage model: document-oriented, key-value, and column-oriented storage as in Bigtable [1].

Data production is the last stage of the big data life cycle. It resembles traditional analysis and is used to analyze and extract value from the stored data. When the data is massive, production relies on parallel computing techniques, which may be stream-based, BSP-based, or MapReduce-based [3]. The BSP parallel model is often preferred as a computation model because of its performance. MapReduce is an approach to processing problems in big data analytics [1].

Apache Giraph of Big Data Processing

Apache Giraph is an iterative graph processing framework built on top of Apache Hadoop. Its input is a graph made up of vertices and directed edges [9]. Every vertex stores a value, and edges can carry values as well. The computation starts from the initial values assigned to the vertices and proceeds as a sequence of iterations in which every active vertex takes part [9]. For example, to find the minimum value across all vertices, each vertex adopts the smallest value it has seen and then sends it along its outgoing edges. During setup, the system loads the graph from disk, assigns the vertices to workers, and then validates the health of the workers [9]. ZooKeeper maintains the computation state, the master is responsible for coordination, and the workers are responsible for the vertices. ZooKeeper also keeps track of statistics, aggregated values, and checkpoints [9]. In Apache Giraph, a computation is composed of supersteps, in which a user-defined function is conceptually invoked in parallel on the vertices. This function specifies the behavior of each vertex in every superstep: it reads the messages the vertex has received and sends messages along the outgoing edges. Apache Giraph is an efficient, scalable, and fault-tolerant implementation that runs on clusters of computers. Hash partitioning is the default partitioning mechanism supported by the system [9].
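
The superstep idea can be sketched in a few lines of single-process Python. This is not Giraph code and does not use the Giraph API; it only mimics the vertex-centric pattern for the minimum-value example, with one loop iteration standing in for one superstep. The function name and the sample ring graph are assumptions for illustration.

```python
def propagate_minimum(initial_values, out_edges):
    """Vertex-centric minimum propagation, simulated superstep by superstep.

    `initial_values` maps vertex -> value; `out_edges` maps vertex -> list of
    target vertices. Returns the final value of every vertex.
    """
    values = dict(initial_values)

    # Superstep 0: every vertex sends its own value along its outgoing edges.
    inbox = {v: [] for v in values}
    for vertex, value in values.items():
        for target in out_edges.get(vertex, []):
            inbox[target].append(value)

    # Later supersteps: only vertices that received messages do any work.
    while any(inbox.values()):
        next_inbox = {v: [] for v in values}
        for vertex, messages in inbox.items():
            if not messages:
                continue  # vertex stays inactive in this superstep
            smallest = min(messages)
            if smallest < values[vertex]:
                values[vertex] = smallest  # adopt the smaller value
                for target in out_edges.get(vertex, []):
                    next_inbox[target].append(smallest)  # and pass it on
        inbox = next_inbox  # barrier: messages become visible in the next superstep
    return values


# Hypothetical ring graph a -> b -> c -> d -> a.
result = propagate_minimum(
    {"a": 3, "b": 6, "c": 2, "d": 1},
    {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]},
)
print(result)
```

The computation stops when no vertex changes its value, because then no messages are sent and every inbox is empty.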

In the BSP model, each superstep consists of three steps performed in order: concurrent computation, in which the workers run simultaneously; communication, in which data is exchanged through messages; and barrier synchronization, which ends the superstep. The same concept is used by the graph processing system GPS, Apache Hama, Apache Giraph, and the synchronous mode of Signal/Collect [1]. In the BSP model, the synchronization barrier at the end of every superstep can become a bottleneck; the vertex-centric asynchronous model avoids it. Examples of asynchronous systems include GraphLab and Signal/Collect [3].

Graph partitioning has a wide range of applications, for example in telephone network design. The problem is to divide the vertices into parts while minimizing the number of edges that connect different parts. In distributed computing, well-partitioned pieces of the graph are largely self-contained, which reduces communication. Partitioning is not a trivial process, and two points in particular must be considered. The partitions should be of equal size to balance the load on the workers, and the number of edges that cross between partitions should be minimized.
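
The two partitioning goals can be made concrete with a short sketch that measures partition sizes and the edge cut. Hash partitioning is approximated here by taking integer vertex IDs modulo the number of partitions; the function names and the sample graph (two triangles joined by one edge) are assumptions for illustration and are not taken from any particular system.

```python
from collections import Counter


def hash_partition(vertices, num_partitions):
    """Assign each integer vertex id to a partition by id modulo partition count."""
    return {v: v % num_partitions for v in vertices}


def partition_sizes(assignment):
    """Number of vertices per partition (load balance)."""
    return dict(Counter(assignment.values()))


def edge_cut(edges, assignment):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for s, t in edges if assignment[s] != assignment[t])


# Hypothetical graph: triangle 0-1-2 and triangle 3-4-5, joined by edge (2, 3).
vertices = range(6)
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]

hashed = hash_partition(vertices, 2)
by_community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

print(partition_sizes(hashed), edge_cut(edges, hashed))
print(partition_sizes(by_community), edge_cut(edges, by_community))
```

Both assignments are perfectly balanced, but the hash-based one cuts five edges while the structure-aware one cuts only one, which is why balance alone is not a sufficient criterion.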

 




                                              

Figure 1: BSP model for the data processing

GraphX of Big Data Processing

GraphX is a newer component of Spark for graphs and graph-parallel computation. At a high level, it extends the Spark RDD with a directed multigraph that has properties attached to each vertex and edge [10]. GraphX exposes a range of fundamental operators, including subgraph, aggregateMessages, and joinVertices, as well as an optimized variant of the Pregel API. It also includes a collection of graph algorithms and builders that simplify graph analytics [10].

Pregel is an efficient computational model that processes the vertices of very large graphs over many iterations. It follows the BSP processing model, updating vertex values in each iteration [2]. Pattern matching is used to find the parts of a data graph that match a query graph. There are four types of pattern matching: subgraph isomorphism, graph simulation, dual simulation, and strong simulation. Subgraph isomorphism requires a bijective mapping between the query graph and a subgraph of the data graph [4].

Exact algorithms for subgraph isomorphism are not practical for large graphs. Graph simulation offers a faster alternative to subgraph isomorphism: it checks only the child relationships of vertices and can be computed by a quadratic-time algorithm [2]. Graph simulation is suitable for medium-sized graphs but is not efficient enough for massive graphs [1]. Dual simulation also takes the parent relationships of vertices into account, so each match is checked against its parents as well as its children. Strong simulation additionally imposes a locality condition: the matches must lie within connected parts of the data graph, and each such part yields a matching subgraph of the parent graph [1].
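
A minimal sketch of plain graph simulation, assuming the usual definition: a data vertex matches a query vertex if it has the same label and, for every child of the query vertex, at least one of its own children matches that child. The function name, the input format, and the sample graphs are assumptions for this example; the loop is a simple fixed-point refinement, not an optimized algorithm.

```python
def graph_simulation(query_edges, query_labels, data_edges, data_labels):
    """Return, for each query vertex, the set of data vertices that simulate it."""
    # Children of every data vertex.
    children = {}
    for source, target in data_edges:
        children.setdefault(source, set()).add(target)

    # Start with all data vertices that carry the same label as the query vertex.
    sim = {
        u: {v for v, label in data_labels.items() if label == query_label}
        for u, query_label in query_labels.items()
    }

    # Repeatedly drop candidates that lack a matching child, until nothing changes.
    changed = True
    while changed:
        changed = False
        for u, u_child in query_edges:
            keep = {v for v in sim[u] if children.get(v, set()) & sim[u_child]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    # If any query vertex has no match, the pattern does not match at all.
    if any(not matches for matches in sim.values()):
        return {u: set() for u in sim}
    return sim


# Hypothetical query: a vertex labeled A with a child labeled B.
query_labels = {"q1": "A", "q2": "B"}
query_edges = [("q1", "q2")]

# Hypothetical data graph.
data_labels = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B"}
data_edges = [(1, 2), (3, 4), (5, 1)]

matches = graph_simulation(query_edges, query_labels, data_edges, data_labels)
print(matches)
```

Vertex 5 is dropped because its only child is labeled A. Vertex 6 is kept even though no A-labeled vertex points to it, because plain simulation checks children only; a parent check of the kind used by dual simulation would remove it.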

There are different techniques and approaches for big data processing. In the present work, the big data framework covers five categories of technique, including mathematical approaches, cloud-based big data processing techniques, and graph processing techniques. Graph processing follows the BSP parallel computing model, which is similar to cloud computing [1]. Compared with batch processing, several big data applications run more efficiently with graph processing.

Titan of Big Data Processing

Titan is another distributed graph database, one that supports graphs with millions of elements. It can even sustain more than 1 billion transactions. Such software is deployed today with hundreds of applications running at the same time [11]. For smaller graphs, only a single machine is needed to store the data and to query it through the nodes. The system is elegant and fast at transferring data [11].

Titan scales to clusters, but many big data graph applications can also be confined to a single machine. The graph is retrieved from disk, with the incident edges of a vertex stored together on the same disk page. A classic binary search is used to look up data within this sequence. Sequential access makes better use of the CPU cache than random memory access [11]. Benchmark results are important for demonstrating the performance of the model.
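
The idea of keeping a vertex's incident edges together in sorted order and locating one of them by binary search can be sketched as follows. This is not Titan's storage code; the adjacency data and the has_edge helper are assumptions for illustration, with each neighbor list kept sorted so that Python's bisect module can search it.

```python
from bisect import bisect_left

# Hypothetical adjacency data: each vertex's neighbor ids are stored
# contiguously and in sorted order.
sorted_adjacency = {
    1: [2, 3],
    2: [3, 5],
    3: [1, 2],
    4: [3, 5],
    5: [4],
}


def has_edge(sorted_neighbors, target):
    """Binary search for `target` in a sorted neighbor list."""
    index = bisect_left(sorted_neighbors, target)
    return index < len(sorted_neighbors) and sorted_neighbors[index] == target


print(has_edge(sorted_adjacency[2], 5))  # True
print(has_edge(sorted_adjacency[2], 4))  # False
```

Because a neighbor list is one contiguous sorted sequence, a lookup needs only a logarithmic number of comparisons within that sequence.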

The benchmark data set consists of relatively small graphs and uses the default database settings. Two types of storage medium are considered, SSD and HDD [11]. Resource constraints are tighter on a single machine. Relative performance is measured on three types of read-based query [11]. Titan is efficient for single-user, single-threaded queries against the database. Its graph compression technique keeps the adjacency data compact and co-located on disk, so Titan's footprint on disk remains low [11].

Another programming model for graph processing is Apache Hama, a BSP parallel computing system that runs on top of Hadoop. The Java-based Hama was introduced for massive scientific computations, network computations, graph functions, and matrix algorithms [1]. Its computation is distributed over the vertices and nodes of the graph. The Hama architecture has three main components: ZooKeeper, groom servers, and the BSPMaster. The BSPMaster maintains the status of the groom servers, job progress, and supersteps. The groom servers execute the tasks and synchronize at the barrier [1].

The computation models for graph processing are algorithmic frameworks designed for scalable processing of graphs. The distributed models for big data computing include MPI-like, MapReduce, vertex-centric BSP, and vertex-centric asynchronous models. MPI-like models are based on the Message Passing Interface, which provides a platform for distributed graph processing; examples are CGMgraph and Parallel BGL. These models work for large-scale distributed systems [1]. Google introduced the MapReduce model, which offers fault tolerance and can express graph algorithms. Its drawback is that it is not well suited to iterative algorithms. The vertex-centric BSP model, also known as the bulk synchronous parallel model, works in a series of supersteps [3].

Accumulo of Big Data Processing

With the revolution in technology, data storage has changed. Neo4j is one of the oldest graph databases, and it still suffers from scalability issues; it does not provide the scalability expected of a mature graph database [5]. A scalable architecture needs to handle millions of edges with an optimized store. Graph-specific databases are NoSQL databases and offer an effective solution to the problem. Apache Accumulo, which is based on Google's Bigtable design, became available in 2011 [5].

Apache Accumulo is an example of a generic database that can store graphs flexibly. Developed with technical support from the NSA, it accommodates different types of data and data models and can represent vertices, edges, and weights effectively [5]. The performance of an Accumulo-based implementation depends on external systems, critical factors such as relative input and output, and memory requirements. The Bigtable design is implemented by several databases, including Hypertable, Apache HBase, and Apache Accumulo [5]. Graph algorithms can run on top of Apache Accumulo. Iterative processes can be implemented in a main-memory system and executed by the developer under certain conditions [10]. Apache Accumulo replicates data across the nodes of the cluster. Reads and writes pass through the iterator stack, and the developer can insert a general-purpose iterator for distributed execution within Accumulo [5].

The distributed algorithm is designed so that each vertex of the data graph acts as a processing unit. In a vertex-centric system, the only value a vertex initially knows is its own label, so each vertex communicates with its neighbors to learn their status and labels before the evaluation proceeds (Nisar, Fard, & Miller, 2018). Each vertex evaluates the matching conditions locally. A Boolean match flag indicates whether the vertex is a potential match (Nisar, Fard, & Miller, 2018). Initially the flag is false. In the second superstep the matches are validated, and if nothing changes no further communication is required. The algorithm for dual simulation is very similar to that for graph simulation: in its initial step it checks the relationships in one direction, and it then extends the check to the whole relationship (Nisar, Fard, & Miller, 2018). Strong simulation has two phases: first, dual simulation identifies the match set R; second, the output of strong simulation is computed from that set. Distributed graph simulation identifies the matching vertices. A drawback of the approach is that it requires two supersteps, but it is efficient and computes the simulation for all vertices in the data set (Nisar, Fard, & Miller, 2018).

In this section, the graph implementation is considered on two different computing infrastructures in order to compare their pros and cons. The Graph Processing System (GPS) is designed around the vertex-centric model and the bulk synchronous parallel (BSP) model. GPS is also considered as an open-source implementation of Pregel (Nisar, Fard, & Miller, 2018). The GPS system takes algorithms written in Java and runs their global logic through the Master.Compute() method. GPS is a good example of big data graph processing; it has two kinds of components, a master node and worker nodes (Nisar, Fard, & Miller, 2018). During its life cycle, each worker in a GPS system reads its partition of the distributed graph.

In summary, the system reads the input graph files, the supersteps begin, and the computation terminates once the vertices are inactive and no messages are in transit (Nisar, Fard, & Miller, 2018). The Akka toolkit supports fault-tolerant, event-driven, concurrent, and distributed applications on the JVM. The GPS system follows the Pregel model and puts substantial effort into delivering messages. Messages are serialized when sent and deserialized when received. The implementation has to handle serialization and deserialization carefully to keep the messages readable (Nisar, Fard, & Miller, 2018).
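The superstep life cycle summarized above can be sketched as a small single-process loop. This is an illustration under assumptions, not GPS or Akka code: the function `run_bsp`, the choice of JSON for serialization, and the "propagate the maximum value" task are all hypothetical stand-ins.

```python
import json

def run_bsp(initial_values, neighbors, max_supersteps=50):
    """Propagate the largest vertex value through the graph, BSP style.

    Messages are serialized with JSON when sent and deserialized on receipt.
    The loop stops as soon as no messages are in transit.
    """
    values = dict(initial_values)

    def send_from(v):
        # Serialize one message per neighbor of v.
        payload = json.dumps({"value": values[v]})
        return [(n, payload) for n in neighbors.get(v, [])]

    # Superstep 0: every vertex announces its value.
    in_transit = [msg for v in values for msg in send_from(v)]
    superstep = 0

    while in_transit and superstep < max_supersteps:
        superstep += 1
        # Barrier: deliver and deserialize everything sent in the last superstep.
        inbox = {v: [] for v in values}
        for dst, payload in in_transit:
            inbox[dst].append(json.loads(payload)["value"])
        in_transit = []
        # Compute phase: only vertices whose value changes send new messages.
        for v, received in inbox.items():
            if received and max(received) > values[v]:
                values[v] = max(received)
                in_transit.extend(send_from(v))

    return values, superstep

if __name__ == "__main__":
    start = {"a": 3, "b": 6, "c": 2, "d": 1}
    links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    final_values, steps = run_bsp(start, links)
    print(final_values, steps)
```

A vertex that receives nothing new stays silent, so the termination condition (an empty `in_transit` list) corresponds to all vertices being inactive with no messages left to deliver.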

Azure Cosmos DB of Big Data Processing

Azure Cosmos DB supports Gremlin, a graph traversal language used for queries and for operations on graph entities. Azure Cosmos DB provides enterprises with the features of a graph database. These include global distribution of the data and storage that scales independently [12]. An advantage of Azure Cosmos DB is single-digit millisecond latency together with high availability of data. Azure Cosmos DB can be defined as a globally distributed, multi-model database service for documents, wide columns, key-value pairs, and graphs [10]. Comprehensive service level agreements cover storage at different locations and geographic regions. Azure Cosmos DB supports responsive applications at global scale. The data can be distributed over any number of regions, and distribution can be configured with a single click. Using Azure Cosmos DB in multi-homing conditions requires no special configuration settings [10]. The read and write regions can be managed from a single place. The scalable services of Azure Cosmos DB include applications, tools, drivers, and libraries. Throughput can be scaled at per-second granularity, and storage grows to any size transparently and automatically. Dynamic data sets are available to the entire application, and the application is distributed at a global level. The database indexes all data automatically, which supports rapid iteration, and it serves fast queries from those indexes [10].

Graph processing in Hadoop is an emerging and exciting NoSQL technology for the storage and processing of data. The use of graph processing is increasing as the connections between users grow without bound. A graph processing engine works on a distributed graph database and is applied at well-known social media sites such as LinkedIn, Twitter, Facebook, and Pinterest (Dummies.com, 2018). Hadoop is becoming popular for the analysis of databases, and Facebook in particular is a prominent user of Hadoop. The system has some limitations in graph analysis and does not work well across the memory of the nodes in a cluster. The processing engine uses batch-oriented queries (Dummies.com, 2018).

Big Data Challenges and Big Data Processing

Graph processing of big data faces challenges, particularly in data transportation and management systems. Emerging generators of data sets and databases work to improve the transportation system [3]. GPS devices contribute to an exponential increase in data. Location data is required by the transportation system, and numerous transportation services depend on it, including map matching, analysis of traffic flow, and visualization of transportation data. A geographical social network requires proper handling of big data, of TMS records and their optimization, and of the massive records produced by GPS [3].

Conclusion on Big Data Processing

Technology is changing rapidly, and new technologies keep arriving. However, people respond to new technology in different ways. People who welcome change support new technology and adopt it, whereas others prefer to stay with their old technologies and oppose the new ones. Both perspectives have their own pros and cons. New technologies enable the user to achieve a higher level of efficiency and productivity, whereas staying with an old technology is beneficial if the latest one turns out to be a short-lived fad.

Graph databases are one of the latest technologies and have been part of this discussion since their inception. It is very important for any organization to ensure the security, integrity, and reliability of the data in its database. Changing the data model from relational to graph is therefore a critical decision, and most organizations resist the change. However, many small and large organizations are increasingly adopting graph databases in order to take advantage of the latest technology. Moreover, organizations dealing with heavily connected data have realized that, as the data grows, the relational model will make it harder to manage that data effectively.

Accordingly, a rigid structure suits organizations that use tabular data, and the relational model serves them well. We are going through a revolutionary period in which graph databases are on the rise. However, graph databases are not recommended for every application, and other options are available that help organizations and governments meet their data requirements. If progress continues at the current pace, the relational model might not remain the default database model. Instead, organizations would choose the data model that suits them best.


References of Big Data Processing

[1] A. A. Chandio, N. Tziritas and C.-Z. Xu, "Big-Data Processing Techniques and Their Challenges in Transport Domain," Big-Data Processing Techniques, vol. 15, no. 40, pp. 02-22, 2015.

[2] Core.ac.uk, "Efficient Analysis of Large-Scale Social Networks Using Big-Data Platforms," 05 2014. [Online]. Available: https://core.ac.uk/download/pdf/52928914.pdf.

[3] M. U. Nisar, A. Fard and J. A. Miller, "Techniques for Graph Analytics on Big Data," Techniques for Graph Analytics on Big Data, vol. 01, no. 01, pp. 01-10, 2018.

[4] A. Mohan and R. G, "A Review on Large Scale Graph Processing Using Big Data Based Parallel Programming Models," I.J. Intelligent Systems and Applications, vol. 01, no. 01, pp. 49-57, 2017.

[5] Infoq.com, "Graph Processing Using Big Data Technologies," 17 03 2014. [Online]. Available: https://www.infoq.com/news/2014/03/graph-bigdata-tapad.

[6] Developer.ibm.com, "Processing large-scale graph data: A guide to current technology," 09 05 2013. [Online]. Available: https://developer.ibm.com/articles/os-giraph/.

[7] Scads.de, "Graph-Based Data Integration and Analysis for Big Data," 2018. [Online]. Available: https://www.scads.de/images/scads_ringvorlesung/rv-graphs-rahm.pdf.

[8] Neo4j.com, "Graph Algorithms in Neo4j: 15 Different Graph Algorithms," 23 04 2018. [Online]. Available: https://neo4j.com/blog/graph-algorithms-neo4j-15-different-graph-algorithms-and-what-they-do/.

[9] Developer.ibm.com, "Processing large-scale graph data: A guide to current technology," 09 05 2013. [Online]. Available: https://developer.ibm.com/articles/os-giraph/.

[10] J. E. Gonzalez, R. S. Xin, A. Dave and D. Crankshaw, "GraphX: Graph Processing in a Distributed Dataflow Framework," Dataflow Framework, vol. 02, no. 02, pp. 01-10, 2017.

[11] Datastax.com, "Boutique Graph Data with Titan," 2018. [Online]. Available: https://www.datastax.com/dev/blog/boutique-graph-data-with-titan.

[12] Docs.microsoft.com, "Welcome to Azure Cosmos DB," 08 04 2018. [Online]. Available: https://docs.microsoft.com/en-us/azure/cosmos-db/introduction.

[13] Dummies.com, "3 Hadoop Cluster Configurations," 2018. [Online]. Available: https://www.dummies.com/programming/big-data/hadoop/graph-processing-in-hadoop/.

 
