Big data is an emerging field concerned with datasets too large or varied for conventional processing. Data falls into three broad types: unstructured, semi-structured, and structured. One of the best-known models for handling big data is the MapReduce model [1].
The MapReduce model is essentially a processing technique implemented in software on clusters of computers, combining distributed storage with distributed processing. Its best-known implementation for distributed programs is Apache Hadoop, written in Java.
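To make the MapReduce style concrete before turning to SQL, here is the classic Hadoop word-count job, a minimal sketch against the org.apache.hadoop.mapreduce API; the driver wiring (job configuration, input and output paths) is omitted.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input line.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) context.write(new Text(token), ONE);
            }
        }
    }

    // Reduce phase: sum the counts delivered for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }
}
```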
Big data processing has traditionally been associated with the structured query language, SQL, used by relational databases [2]. SQL is concerned with relational database management: a relational database is a collection of tables made up of attributed rows. Processing rows and tables becomes slow when irregular datasets are involved, and the most important factor is that the traditional SQL model loses validity as the variety of datasets grows. The question is how this issue can be resolved, and an appropriate answer is the graph database. According to DB-Engines.com, graph databases are the fastest-growing category of database management systems [3].
The use of graphs in science, governance, and industry has increased. Graph representations work well for real-world data because graphs are flexible, intuitive, and naturally supportive of connected data. A social network is the best example of graph representation [4]: users are the nodes, each person's properties (name, age, and so on) are attached to the node, and the lines connecting users indicate their relationships with other people [1].
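To illustrate the node/property/edge vocabulary above, here is a minimal in-memory sketch; all class and field names are illustrative and do not belong to any particular database's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// People become nodes carrying key-value properties; friendships
// become labeled edges connecting them.
public class SocialGraphSketch {

    static class Node {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        Node(long id) { this.id = id; }
    }

    static class Edge {
        final Node source, target;
        final String label;
        Edge(Node source, String label, Node target) {
            this.source = source; this.label = label; this.target = target;
        }
    }

    public static void main(String[] args) {
        Node alice = new Node(1);
        alice.properties.put("name", "Alice");
        alice.properties.put("age", 34);

        Node bob = new Node(2);
        bob.properties.put("name", "Bob");

        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(alice, "FRIEND_OF", bob));

        // The relationship is stored directly, not reconstructed via a join.
        for (Edge e : edges) {
            System.out.println(e.source.properties.get("name")
                    + " -" + e.label + "-> "
                    + e.target.properties.get("name"));
        }
    }
}
```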
Graph modeling has a wide range of applications, including the analysis of biological and protein structures, social networks, workflows, web graphs, chemical compounds, and XML documents. Graph processing platforms are accordingly becoming increasingly diverse [3].
In a nutshell, a graph database uses graph structures to store data, represent properties, and run semantic queries over nodes and edges. The graph presents the data accurately, conveys information, and makes the relationships across the whole dataset explicit. The prime concern of such a database is to connect the pieces of information and then represent those connections graphically [1].
In a conventional database, querying for relationships can take considerable time, because every relationship must be reconstructed through queries and foreign keys that join data across tables. Joining tables is an expensive process whose cost grows with the number of rows and objects involved [3], and answering indirect queries can require joining many tables together. The working principle of a graph database is instead based on storing the relationships between the data [5]: related nodes are physically linked when the data is written, so the relationships are immediately accessible. Compared with a relational database, a graph database simply reads the data and its relationships directly from the stored values.
In a graph database, satisfying relationship queries is an essential part of analysis. The database stores both objects and relations, and it allows different kinds of objects and different kinds of relations to be defined [1]. Like other NoSQL databases, graph databases are largely schema-less. In performance and flexibility they are closer to document databases and key-value stores than to relational systems, where all stored data is arranged in tabular form [6].
The prime objective of a graph database model is to capture the relationships between nodes, i.e. data points. Instead of searching for values across tables as in SQL databases, a graph database organizes the data points and then analyzes them directly [1]; the data points are stored together with their relationships. This additional structure streamlines analysis and increases the effectiveness of the process [7]. A graph database therefore offers advantages over other databases: it stores large, complex data in the form of a graph of nodes, edges, and properties, with a boost in performance as an additional benefit [1].
Good graph database design matters for understanding the relationships between pieces of data, for performance, and for accommodating growth in those relationships. A graph database is flexible: the model can be changed at any point according to the needs of the organization [1], and the structure of the model follows the organization's requirements. Beyond these advantages, graph databases support agile development, which is why they are recommended in business settings: as requirements change, the database can change with them [3].
Using multiple platforms in a business is beneficial and reduces the challenges of handling data; a key consideration is the design of the existing platform and the tuning of its processes. An article published on InfoQ.com notes that graph processing at very large scale has always been a challenge, but that recent improvements in big data technologies have changed this; the article discusses the change in technology and compares the alternatives in detail [5].
Another piece, published on Dummies.com, describes how databases, algorithms, and processes define the relationships between web pages. The method used by Google Search differs from other search engines [13]: it is essentially a scale that measures the importance of web pages. The same discussion covers other graph database technologies, including Neo4j, one of the earliest; it still faces scalability issues, and the technique is not well suited to sharding [3].
Apache Giraph is a graph processing engine that stores its information in HDFS. With Facebook's seal of approval, Giraph has emerged as the preferred graph technique in the Hadoop ecosystem, but it still faces limitations: the engine must load the graph into cluster memory, and the process is optimized for batch-oriented queries. GraphX, in turn, provides graph generation and processing on top of the Spark framework [1].
GraphX includes a collection of graph algorithms that simplify analytic tasks. Another scalable option is Titan, a transactional database that stores and optimizes data, answers database queries, supports many concurrent users, and executes complex graph traversals in real time [1]. Apache Accumulo, modeled on Google's Bigtable, is preferable for scalable distributed data: with Accumulo, users can manage and store large datasets across clusters of computers [1]. Accumulo uses Apache Hadoop for data storage and Apache ZooKeeper for consensus [3].
A widely known data storage service is Azure Cosmos DB, Microsoft's globally distributed, multi-model database service for mission-critical applications. The Azure Cosmos DB Gremlin API serves the storage and querying of graph data: the system supports the property graph model and provides an API for traversing the graph's data [1].
Graph Databases of Big Data Processing
A graph database describes data in terms of a graph model, and the model most commonly used is the labeled property graph. It varies from other graph models in some specific properties while sharing the familiar pattern of nodes and edges. In a labeled property graph the graph is directed, and both nodes and edges preserve sets of key-value pairs, termed attributes: the key is a string of text, while the value may be of an arbitrary type. Such key-value pairs are used in many computing projects. The basic purpose of properties is to hold information about the nodes and edges themselves. The main role of labels is to collect nodes of the same class into a group and to specify roles within the dataset; a label works like a tag. A helpful way to read the model is this: nodes are the nouns, classified by their labels, while edges act as the verbs and properties as the adjectives. Properties (attributes) can appear on both nodes and edges, whereas labels, zero or many of them, appear only on nodes.
Labels and attributes shape how data is modeled in the database. Each item of data in the standard graph model becomes a node or an edge according to what it represents, and having only these two forms keeps the modeling of data simple. Even so, it can be difficult to decide what should be a node, an edge, or a property. Consider the year 1956: it could be listed as a property of a scientist node, but if 1956 was a remarkable year for many researchers, modeling it as its own node lets every matching scientist node share it, which benefits both sides of each connection.
A model is not judged simply right or wrong, because either way it represents the data; different models suit different use cases, and deciding how to model the data is the most crucial part of using a graph database. For categorization, labels simplify the data, for example a label covering all the students of a college. Nodes play the role of the basic unit when drawing a graph and denote the entities of the data: things, places, and people. The other important unit is the edge, which represents a relationship between entities, such as a student's enrollment at a university. Properties carry the detailed information about entities and their relationships, and they also preserve information about an entity that is not otherwise modeled or labeled.
Graph vs Relational Databases of Big Data Processing
Relational databases were designed to codify paper forms and tabular structures, and they do that with great effectiveness. They require a pre-defined schema, however, which does not accommodate unexpected relationships. For modeling relationships the relational database is less effective, because relating one piece of information to another requires foreign keys. Referencing keys work well for simple relationships, but they become problematic when the relationship is multi-faceted: combining the data means joining tables on those foreign keys, and joining tables is a complex and time-consuming task.
People

| ID | Name    |
|----|---------|
| 1  | Nick    |
| 2  | Matthew |
| 3  | Adam    |
| 4  | Jaimie  |
| 5  | Nolan   |

Connections

| Person ID | Connection ID |
|-----------|---------------|
| 1         | 2             |
| 1         | 3             |
| 2         | 3             |
| 2         | 5             |
| 3         | 1             |
| 3         | 2             |
| 4         | 3             |
| 4         | 5             |
| 5         | 4             |
The tables above model the relationships among a set of people: one table holds the people and the other holds the relationships. Every person is assigned a numeric ID, while the Connections table pairs a person's ID with a connection's ID. As the People table shows, Nick's ID is 1, Matthew's is 2, Adam's is 3, Jaimie's is 4, and Nolan's is 5.
In the People table, the ID column is the primary key. In the Connections table, a composite primary key spans the Person ID and Connection ID columns: each row must hold a unique combination of the two values. For example, Person ID 1 with Connection ID 2 is a different composite value from Person ID 1 with Connection ID 3. If directed relationships are required, the relational database can encode direction within the composite as well: Person ID 1 with Connection ID 3 differs from Person ID 3 with Connection ID 1. This illustrates how unreciprocated (directed) relationships can be modeled.
In this example, however, the relationships are undirected, so the same connection can be assumed to exist under both composites: Person ID 1 with Connection ID 3 and Person ID 3 with Connection ID 1. Queries such as finding Nick's connections are easy with this relational modeling: take Nick's ID, look in the Connections table for rows with that Person ID, and collect the linked IDs; then go back to the People table to find which name belongs to each connection ID. This is somewhat involved, but it is the kind of lookup most people perform almost subconsciously when working with spreadsheets. Answering "Who is connected to Nick?" uses a very similar process with the direction reversed; it is a different question from "Who is Nick connected to?", and the two can have different results. The real problems start with questions that require deeper traversal, such as "Who are the connections of Nick's connections?".
Answering it requires detailed work: first Nick's connections are found, and then from each of Nick's connections the further connections are found. Though it seems complex, this is still possible with the relational model shown above. Moving one step further, to "Who are the connections of the connections of Nick's connections?", the relational model becomes far more computationally expensive, to the point of seeming practically impossible. Such questions may look contrived, yet the scenario occurs in real life; the best example is LinkedIn, where users have first-, second-, and third-degree connections.
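To make the cost concrete, the sketch below answers "Who are the connections of Nick's connections?" with two self-joins over the tables above. It is a hedged JDBC sketch: the connection URL is a placeholder, the tables are assumed to exist with the column names shown (spaces dropped), and filtering Nick himself out of the result is omitted for brevity. Each further hop in the question adds another self-join, which is exactly the growth the text describes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TwoHopQuery {
    public static void main(String[] args) throws SQLException {
        // Two self-joins: Nick's connections (c1), then their connections (c2).
        String sql =
            "SELECT DISTINCT p.Name " +
            "FROM Connections c1 " +
            "JOIN Connections c2 ON c2.PersonID = c1.ConnectionID " +
            "JOIN People p ON p.ID = c2.ConnectionID " +
            "WHERE c1.PersonID = 1";  // 1 = Nick

        // Placeholder URL; any JDBC database holding the two tables would do.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString("Name"));
            }
        }
    }
}
```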
The first-degree category covers people connected directly to the user; the second-degree category covers the connections of the user's first-degree connections; and the third degree covers the connections of the user's second-degree connections. Returning to the earlier example, Nick's third-degree network on LinkedIn corresponds exactly to the query "Who are the connections of the connections of Nick's connections?". LinkedIn's connection system is thus a poor fit for the relational model, but it works well with the graph model: in contrast to a relational database, a graph database treats relationships as first-class citizens, because they are stored explicitly.
Graph Processing Techniques of Big Data Processing
Neo4j
The Neo4j graph model is designed to answer questions posed as Cypher queries. It addresses technical issues and business problems by organizing the data and structuring it as a graph, and the Neo4j data model keeps the stored data matched to the whiteboard sketch of the domain [8]. The model lends itself to simple, visual modeling: business users can draw their business models in familiar ERD terms. Nodes are categorized into groups and labels are assigned to the data; to construct the graph, all nodes of the same category receive the same label. Database queries then work efficiently across the whole graph, and they are easier to write and to change [8].
All nodes added to the graph are labeled with generic nouns. Relationships provide the connection between two nodes, each with a source node and a target node [8], and each relationship points in a single direction. Neo4j nevertheless delivers efficient traversal performance without requiring queries to specify a particular direction. Neo4j models can predict complicated dynamics such as resource flows, network failures, and the influence of groups, and the system serves both transactional and analytical workloads [8].
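A minimal sketch of the Cypher-based interaction described above, using the official Neo4j Java driver (org.neo4j.driver); the URI, credentials, and data are placeholders.

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

public class Neo4jExample {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver(
                "bolt://localhost:7687", AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // Create two labelled nodes and a directed relationship.
            session.run(
                "CREATE (a:Person {name:'Nick'})-[:KNOWS]->(b:Person {name:'Adam'})");

            // Traverse the stored relationship instead of joining tables.
            Result result = session.run(
                "MATCH (a:Person {name:'Nick'})-[:KNOWS]->(b) RETURN b.name AS name");
            while (result.hasNext()) {
                System.out.println(result.next().get("name").asString());
            }
        }
    }
}
```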
As a native graph platform, Neo4j models real-world systems and develops solutions on them; the power of this optimized approach lies in streamlined workflows. The algorithms used in Neo4j form an analytics platform. Its traversal and pathfinding algorithms support deep analyses [8] and include parallel breadth-first search (BFS), parallel depth-first search (DFS), single-source shortest path, all-pairs shortest path, and minimum-weight spanning tree (MWST). The centrality algorithms estimate the importance of nodes and include PageRank, degree centrality, closeness centrality, and betweenness centrality [8].
Big data, at bottom, is any dataset, structured or unstructured, that exceeds what a traditional computing system can process. A growing number of organizations produce huge datasets, with sizes running into terabytes [1]. For instance, Walmart in the United States handles millions of transactions per hour and maintains a database of about 2.5 PB. Big data processing is commonly characterized by the 3V model, and desirable features of such databases are veracity, accuracy, and reliability. The big data lifecycle consists of generation, acquisition, storage, and production. The first phase of the lifecycle, generation, is tied to the specific sources of the data [1].
The data generated in this phase includes telescope data, healthcare records, computational biology, transport data, agriculture, and astronomy. The acquisition phase covers the collection, transmission, and processing of data [6]; the raw data collected from different sources must be integrated [6]. The storage phase of big data processing covers the management and storage of the data in different databases. Several systems exist for storage, including Microsoft Cosmos, Facebook's storage systems, TFS, and GFS, while NoSQL databases offer three storage models: document-oriented, key-value, and column-oriented storage as in Bigtable [1].
Big data production is the last stage of the big data lifecycle. Similar to traditional analysis, it is where potentially useful information is analyzed and extracted from storage. Different methods drive this production, particularly when the data is massive, including parallel computing techniques that are stream-based, BSP-based, and MapReduce-based [3]. The BSP parallel model is well suited to solving such computational problems with good performance, while MapReduce processes problems through big data analytics [1].
Apache Giraph of Big Data Processing
Apache Giraph is a processing framework for iterative graph computation, built on top of Apache Hadoop. Its input is a graph of vertices and directed edges [9]; every vertex stores a value, and the edges carry values as well. The computation starts from initial values assigned to predetermined vertices and proceeds as a sequence of operations in which each vertex is active [9].
A typical Giraph computation finds the minimum value across all vertices: each vertex adopts the smallest value it has seen and propagates it along its outgoing edges. When a job executes, the setup loads the graph from disk, assigns the vertices to workers, and validates the health of workers and the process [9]. ZooKeeper holds the computation state, the master provides coordination, and the workers carry the vertices; ZooKeeper also tracks statistics, aggregated values, and checkpoints [9].
In Giraph, the computation is composed of supersteps, and the user-defined function captures what runs in parallel: it specifies the behavior of each vertex in every superstep, reading incoming messages and sending new ones out along the outgoing edges. Giraph is an efficient, fault-tolerant, scalable implementation that runs across clusters of computers. Hash partitioning is the default partitioning mechanism, and custom partitioning is also supported [9].
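The minimum-value propagation described above can be sketched against Giraph's vertex-centric API roughly as follows; exact class names and type parameters vary by Giraph version, so treat this as a hedged illustration rather than canonical Giraph code.

```java
import java.io.IOException;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

public class MinValueComputation extends
        BasicComputation<LongWritable, LongWritable, NullWritable, LongWritable> {

    @Override
    public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
                        Iterable<LongWritable> messages) throws IOException {
        long min = vertex.getValue().get();

        // Adopt the smallest value received from any neighbour.
        for (LongWritable msg : messages) {
            min = Math.min(min, msg.get());
        }

        // In the first superstep, or whenever the value improved,
        // forward the new minimum along all outgoing edges.
        if (getSuperstep() == 0 || min < vertex.getValue().get()) {
            vertex.setValue(new LongWritable(min));
            sendMessageToAllEdges(vertex, vertex.getValue());
        }
        vertex.voteToHalt();  // sleep until woken by a new message
    }
}
```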
A BSP computation consists of three steps executed in order: concurrent computation that runs simultaneously on all workers, communication in which the workers exchange messages over the dataset, and barrier synchronization; together these form one superstep of the model. The same concept is used by graph processing systems such as GPS, Apache Hama, and Apache Giraph, with signal/collect as a synchronous variant [1]. In the BSP model the synchronization stage delimits each superstep; vertex-centric asynchronous models drop the barrier to avoid it becoming a bottleneck, with GraphLab and signal/collect as representative examples [3]. Graph partitioning has a wide range of applications, such as telephone network design. The difficulty lies in the edges connecting vertices across parts, which partitioning should minimize. In the context of graph partitioning and distributed computing, well-partitioned pieces of a graph are largely self-contained and reduce communication. The process is not trivial, and two points matter in particular: the graph should be split into parts of equal size to balance the load on the workers, and the number of edges cut between partitions should be minimized.
Figure 1: BSP model for data processing
GraphX of Big Data Processing
GraphX is a newer component for parallel graph computation, delivered as part of Spark. At a high level, GraphX extends the Spark RDD abstraction with a directed multigraph whose properties [10] are attached to the vertices and edges. GraphX exposes a wide range of fundamental operators, including subgraph, aggregateMessages, and joinVertices, along with an optimized variant of the Pregel API; it also ships a collection of graph algorithms, analytic tools, and graph builders [10].
The Pregel programming model is an efficient computational model that processes vertices over trillions of data iterations; it follows the BSP processing model for its update and iteration cycle [2].
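GraphX exposes Pregel through its Scala API; as a language-neutral illustration, the following plain-Java sketch shows the superstep loop the Pregel/BSP model implies. This is not the GraphX API, and all names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PregelLoopSketch {

    interface VertexProgram {
        // Returns the messages (target vertex -> payload) this vertex
        // sends for the next superstep; may mutate its own state.
        Map<Long, Long> compute(long vertexId, long[] state, List<Long> inbox);
    }

    static void run(Set<Long> vertices, Map<Long, long[]> state, VertexProgram program) {
        Map<Long, List<Long>> inboxes = new HashMap<>();
        // Superstep 0: every vertex is active with an empty inbox.
        for (long v : vertices) inboxes.put(v, new ArrayList<>());

        while (!inboxes.isEmpty()) {
            Map<Long, List<Long>> next = new HashMap<>();
            for (Map.Entry<Long, List<Long>> e : inboxes.entrySet()) {
                Map<Long, Long> outgoing =
                        program.compute(e.getKey(), state.get(e.getKey()), e.getValue());
                for (Map.Entry<Long, Long> m : outgoing.entrySet()) {
                    next.computeIfAbsent(m.getKey(), k -> new ArrayList<>())
                        .add(m.getValue());
                }
            }
            // Barrier: only vertices that received a message stay active;
            // the loop ends when no messages are in flight.
            inboxes = next;
        }
    }
}
```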
Pattern matching is needed to measure correspondence between data points in the database. There are four main pattern matching notions: subgraph isomorphism, graph simulation, strong simulation, and dual simulation. Subgraph isomorphism demands a bijective mapping between the query graph and a subgraph of the data graph [4].
Exact algorithms for subgraph isomorphism are not practically applicable to large graphs. Graph simulation offers a quick alternative: it only checks the child relationships of vertices and runs as a quadratic algorithm [2]. Graph simulation is preferred for medium-sized graphs but is not sufficient for massive ones [1]. Dual simulation additionally considers the parent relationships of the vertices in the dataset, working back to each vertex's original parents. Strong simulation first imposes locality conditions; the connected parts of the result then yield a faithful subgraph with respect to the query graph, and matching all the connected parts against each other establishes strong simulation [1].
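Here is an in-memory sketch of the graph simulation refinement just described: start from label-matching candidate sets and repeatedly discard data vertices that lack a required child, until a fixpoint is reached. The representation (adjacency maps keyed by integer IDs) is an assumption for illustration, not any system's API.

```java
import java.util.Map;
import java.util.Set;

public class GraphSimulationSketch {

    // query:      query-vertex -> its children in the query graph
    // data:       data-vertex  -> its children in the data graph
    // candidates: query-vertex -> mutable set of label-matching data vertices
    static Map<Integer, Set<Integer>> simulate(
            Map<Integer, Set<Integer>> query,
            Map<Integer, Set<Integer>> data,
            Map<Integer, Set<Integer>> candidates) {

        boolean changed = true;
        while (changed) {  // iterate to a fixpoint
            changed = false;
            for (Map.Entry<Integer, Set<Integer>> qe : query.entrySet()) {
                int u = qe.getKey();
                for (int uChild : qe.getValue()) {
                    Set<Integer> allowed = candidates.get(uChild);
                    // Drop v from sim(u) if no child of v can simulate uChild.
                    changed |= candidates.get(u).removeIf(v ->
                            data.getOrDefault(v, Set.of()).stream()
                                .noneMatch(allowed::contains));
                }
            }
        }
        return candidates;  // surviving sets form the simulation relation
    }
}
```

Dual simulation would run the same refinement a second time over parent relationships, which is why the text calls the two algorithms broadly identical.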
There are different techniques and approaches for big data processing. In the present work, the big data framework includes five categories of techniques, among them mathematical approaches, cloud-based big data processing techniques, and graph processing techniques. Graph processing follows the BSP parallel computing model, which is similar in spirit to cloud computing [1]. Compared with batch processing, several big data applications run more efficiently under graph processing.
Titan of Big Data Processing
Titan is another distributed graph database; it supports graphs with many millions of elements and sustains more than a billion transactions. Modern software is built with hundreds of concurrent applications at a time [11], yet Titan needs only a single machine to store the data and serve queries over the nodes, and the system moves data through the database quickly [11].
Applications can scale out to clusters, while smaller big data graph applications can be confined to a single machine. When the graph is retrieved from disk, a vertex's incident edges are laid out on the same disk page, so access is largely sequential; the classic binary search method locates data within the sequence, and sequential processing raises CPU cache hits compared with random memory access [11]. Benchmark results are important for demonstrating how the performers in the model compare.
The benchmark dataset is a relatively small graph and runs against the databases' default settings, over two different storage media, SSD and HDD [11]. Resource constraints loom larger on a single machine, and relative performance is measured by executing three different types of read-based queries [11]. The Titan model is efficient for single-user, single-threaded queries against the database. Its graph compression technique keeps the logical adjacency compact and co-locates related data on disk, so Titan's footprint on disk remains low [11].
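A minimal sketch against the Titan 1.0 / TinkerPop 3 Java API follows; the configuration file path is a placeholder that would select the storage backend (for example Cassandra or HBase), and the data is illustrative.

```java
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class TitanExample {
    public static void main(String[] args) {
        TitanGraph graph = TitanFactory.open("conf/titan-cassandra.properties");

        // Insert two vertices and a relationship in one transaction.
        Vertex nick = graph.addVertex("name", "Nick");
        Vertex adam = graph.addVertex("name", "Adam");
        nick.addEdge("knows", adam);
        graph.tx().commit();

        // Traverse the stored relationship with Gremlin.
        GraphTraversalSource g = graph.traversal();
        g.V().has("name", "Nick").out("knows").values("name")
         .forEachRemaining(System.out::println);

        graph.close();
    }
}
```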
Another programming model for graph processing comes from Apache: Hama, a BSP parallel computing system that works on top of Hadoop. This Java-based system was introduced for massive scientific computation, network computation, graph functions, and matrix algorithms [1]. Hama distributes work over vertices, nodes, and properties. Its architecture has three main components: ZooKeeper, groom servers, and the BSPMaster. The BSPMaster maintains the groom servers, job progress, and supersteps, while the groom servers execute the tasks and synchronize at the barrier [1].
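A hedged sketch of a Hama BSP task (org.apache.hama.bsp) showing the compute/communicate/synchronize rhythm: each peer messages every other peer, hits the barrier, then drains its inbox. Type parameters and exact signatures may vary by Hama version.

```java
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.sync.SyncException;

public class HelloBSP extends
        BSP<NullWritable, NullWritable, Text, Text, Text> {

    @Override
    public void bsp(BSPPeer<NullWritable, NullWritable, Text, Text, Text> peer)
            throws IOException, SyncException, InterruptedException {
        // Superstep part 1: communicate with all other peers.
        for (String other : peer.getAllPeerNames()) {
            peer.send(other, new Text("hello from " + peer.getPeerName()));
        }
        peer.sync();  // barrier synchronization ends the superstep

        // Superstep part 2: consume the messages delivered at the barrier.
        Text msg;
        while ((msg = peer.getCurrentMessage()) != null) {
            peer.write(new Text(peer.getPeerName()), msg);
        }
    }
}
```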
Computation models for graph processing are based on algorithms and work efficiently for scalable processing of graphs. The distributed models for big data computing include MPI-like, MapReduce, vertex-centric BSP, and vertex-centric asynchronous models. MPI-like models are message-passing interfaces that provide a platform for distributed graph processing; CGMgraph and Parallel BGL are representative systems, and the model suits large-scale distributed systems [1]. Google introduced the MapReduce model for fault tolerance and for graph algorithms, but its drawback is that it is not suitable for iterative algorithms. The vertex-centric BSP model, often called the bulk synchronous parallel model, works through a series of supersteps [3].
Accumulo of Big Data Processing
With the evolution of technology, data storage processes have changed. Neo4j is one of the oldest graph systems and still suffers from scalability issues; such systems fall short of what mature graph databases at scale require [5].
A scalable architecture must handle millions of edges over optimized databases. Graph-specific databases are NoSQL databases and provide an effective solution to the problem. Apache Accumulo builds on the Bigtable technology published by Google [5].
Apache Accumulo is an example of a generic database that can store a graph while retaining flexibility. Developed with the technical backing of the NSA, it accommodates different database designs, and a proper data model over it can represent weights, vertices, and edges effectively [5]. Implementing a graph database on Accumulo depends on the external system, critical factors such as relative input and output rates, and the memory requirements. The Bigtable design underlies multiple databases, including Hypertable, Apache HBase, and Apache Accumulo [5].
The Accumulo database performs its work through algorithms applied via its iterator mechanism. Iterative logic can execute against the main-memory system under conditions the developer controls [10]; as data moves between the nodes of the cluster, reads and writes pass through the iterator stack, and the developer inserts generic iterators for distributed Accumulo execution [5].
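A minimal write sketch against the classic Accumulo 1.x client API (org.apache.accumulo.core.client), storing one graph edge as a key-value pair; the instance name, ZooKeeper hosts, credentials, and the row/column layout are all illustrative assumptions, not a prescribed schema.

```java
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class AccumuloEdgeWriter {
    public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
                .getConnector("user", new PasswordToken("secret"));

        BatchWriter writer = conn.createBatchWriter("graph", new BatchWriterConfig());
        // One edge of the graph: row = source vertex,
        // column family = "edge", qualifier = label + target vertex.
        Mutation m = new Mutation("vertex:nick");
        m.put("edge", "knows:adam", new Value("1".getBytes()));
        writer.addMutation(m);
        writer.close();  // flushes the mutation to the tablet servers
    }
}
```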
The approach rests on a distributed algorithm design in which each vertex holds a set of data points. In the vertex-centric formulation, the only value a vertex knows initially is its label; each vertex communicates with the other data points to exchange status and labels, and evaluation proceeds from there [3]. Each vertex evaluates the matching conditions against the datasets in the database. A Boolean match flag indicates whether the vertex is a potential match in the graph-theoretic sense [3]; in the initial stage the flag is typically false. A second superstep then validates the matching, after which no further communication is required. The algorithm for dual simulation is broadly identical to graph simulation: in an initial step it checks the relationships between the databases and then extends the check to the whole relationship [3]. Strong simulation has two phases: dual simulation identifies the matches in the candidate set, and the output is then computed by the strong simulation conditions over the dataset. The distributed graph simulation identifies the proper vertices; the drawback of the approach is that it requires two supersteps per round, but it remains efficient and computes the simulation for all the vertices in the dataset [3].
To assess the implementation, the graph is run under two different computing infrastructures so that their pros and cons can be compared. The graph processing system is designed around the vertex-centric BSP model, and a Pregel implementation, GPS, is also considered as the open-source point of comparison [3]. GPS accepts algorithms and converts them into its Master.Compute() method. GPS is a good example of big data graph processing: it runs on two kinds of components, a master node and worker nodes [3], and each worker reads its partition of the graph at the start of the job's lifecycle. In summary, the system takes the input graph files, the supersteps begin, and finally the computation is terminated once the vertices stop transmitting messages [3]. The Akka toolkit supplies fault-tolerant, event-driven, concurrent, and distributed functionality on the JVM. The GPS system follows the Pregel model, and delivering messages takes substantial effort: messages are serialized on send and deserialized on receipt, so the implementation must handle serialization and deserialization carefully to keep the process tractable [3].
Azure Cosmos DB of Big Data Processing
Azure Cosmos DB supports the Gremlin traversal language, which performs queries and operations over graph entities. Cosmos DB gives enterprises new graph database features: globally distributed data, with storage scaled independently [12]. Its advantages include single-digit-millisecond latencies with high availability, along with support for Gremlin. Azure Cosmos DB can be defined as a globally distributed, multi-model database service covering documents, wide columns, and key values [12].
Comprehensive service-level agreements cover storage across different locations and geographic regions. Azure Cosmos DB keeps applications responsive at global scale: the data is distributed across any number of regions, and with a single click of a button all of it can be served. Using Azure Cosmos DB in multi-homed configurations requires no special settings [12].
Read and write regions can be handled through a single logical endpoint. The massively scalable services of Azure Cosmos DB span applications, tools, drivers, and libraries. Throughput is scalable at per-second granularity and can be changed transparently and automatically for any size of workload; the dynamic datasets remain available to the entire app, with the application distributed at the global level. Rapid iteration works because the database automatically indexes the whole of the data, serving blazing-fast queries without requiring index management [12].
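A hedged sketch of talking to the Cosmos DB Gremlin API with the Apache TinkerPop Java driver (org.apache.tinkerpop.gremlin.driver); the host, database and graph names, and the key are placeholders, and the Cosmos-specific TLS and serializer settings are omitted for brevity.

```java
import java.util.List;
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;

public class CosmosGremlinExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.build()
                .addContactPoint("myaccount.gremlin.cosmos.azure.com")  // placeholder
                .port(443)
                .credentials("/dbs/mydb/colls/mygraph", "<primary-key>") // placeholders
                .create();
        Client client = cluster.connect();

        // Add a vertex, then read it back with a Gremlin traversal.
        client.submit(
            "g.addV('person').property('id','nick').property('name','Nick')")
            .all().join();
        List<Result> results =
            client.submit("g.V().hasLabel('person').values('name')").all().join();
        for (Result r : results) {
            System.out.println(r.getString());
        }

        client.close();
        cluster.close();
    }
}
```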
Hadoop is an established NoSQL technology cutting across the processing and storage of data. The use of graph processing is growing as the connections between users multiply without bound. Graph processing engines work on distributed graph databases and have applications in well-known social media sites such as LinkedIn, Twitter, Facebook, and Pinterest [13]. Hadoop analysis is becoming popular for analyzing databases, with Facebook a prominent Hadoop user. The system nevertheless has limitations for graph analysis, since it cannot keep the graph in cluster memory across the nodes, and its processing engine relies on batch-oriented queries [13].
Big Data Challenges and Big Data Processing
The graph processing side of big data faces challenges, particularly in transportation and data management systems. The emerging generators of datasets and databases are being put to work to improve transportation systems [3]. GPS devices drive an exponential increase in data, and location data is required throughout the transportation system: its services include map matching, traffic flow analysis, and the visualization of transportation data. Geo-social networks, TMS records and their optimization, and the massive records produced by GPS systems all require proper handling of big data [3].
Conclusion on Big Data Processing
Technology changes rapidly, and new technologies arrive with the passage of time. People respond to new technology in different ways: change-loving people support and adopt it, while a few prefer to remain with their old technologies and oppose the new ones. Both perspectives have their own pros and cons. New technologies let users achieve higher levels of efficiency and productivity, whereas staying with old technology also pays off if the latest technology turns out to be a short-lived fad.
Graph databases are one of the latest technologies to live in this debate since their inception. It is very important for any organization to ensure the security, integrity, and reliability of the data present in its database. Changing the database model from relational to graph is therefore a critical decision for an organization, and most organizations resist the change. Even so, many small and large organizations are increasingly adopting graph databases to gain the advantage of the latest technology, and organizations dealing with heavily connected data have realized that as the data grows, the relational model will only multiply the difficulties of managing it effectively.
That said, a rigid structure suits organizations whose data is genuinely tabular; in that case the relational model serves them well. In the modern era we are going through a revolutionary period in which graph databases are on the rise. Graph data is not recommended for every application, and other options remain available that let organizations meet their data requirements. If progress continues at the current pace, however, the relational model may not remain the default database; instead, organizations will opt for whichever data model suits them best.
References of Big Data Processing
[1] A. A. Chandio, N. Tziritas and C.-Z. Xu, "Big-Data Processing Techniques and Their Challenges in Transport Domain," Big-Data Processing Techniques, vol. 15, no. 40, pp. 02-22, 2015.

[2] Core.ac.uk, "Efficient Analysis of Large-Scale Social Networks Using Big-Data Platforms," 05 2014. [Online]. Available: https://core.ac.uk/download/pdf/52928914.pdf.

[3] M. U. Nisar, A. Fard and J. A. Miller, "Techniques for Graph Analytics on Big Data," Techniques for Graph Analytics on Big Data, vol. 01, no. 01, pp. 01-10, 2018.

[4] A. Mohan and R. G, "A Review on Large Scale Graph Processing Using Big Data Based Parallel Programming Models," I.J. Intelligent Systems and Applications, vol. 01, no. 01, pp. 49-57, 2017.

[5] Infoq.com, "Graph Processing Using Big Data Technologies," 17 03 2014. [Online]. Available: https://www.infoq.com/news/2014/03/graph-bigdata-tapad.

[6] Developer.ibm.com, "Processing large-scale graph data: A guide to current technology," 09 05 2013. [Online]. Available: https://developer.ibm.com/articles/os-giraph/.

[7] Scads.de, "Graph-Based Data Integration and Analysis for Big Data," 2018. [Online]. Available: https://www.scads.de/images/scads_ringvorlesung/rv-graphs-rahm.pdf.

[8] Neo4j.com, "Graph Algorithms in Neo4j: 15 Different Graph Algorithms," 23 04 2018. [Online]. Available: https://neo4j.com/blog/graph-algorithms-neo4j-15-different-graph-algorithms-and-what-they-do/.

[9] Developer.ibm.com, "Processing large-scale graph data: A guide to current technology," 09 05 2013. [Online]. Available: https://developer.ibm.com/articles/os-giraph/.

[10] J. E. Gonzalez, R. S. Xin, A. Dave and D. Crankshaw, "GraphX: Graph Processing in a Distributed Dataflow Framework," Dataflow Framework, vol. 02, no. 02, pp. 01-10, 2017.

[11] Datastax.com, "Boutique Graph Data with Titan," 2018. [Online]. Available: https://www.datastax.com/dev/blog/boutique-graph-data-with-titan.

[12] Docs.microsoft.com, "Welcome to Azure Cosmos DB," 08 04 2018. [Online]. Available: https://docs.microsoft.com/en-us/azure/cosmos-db/introduction.

[13] Dummies.com, "3 Hadoop Cluster Configurations," 2018. [Online]. Available: https://www.dummies.com/programming/big-data/hadoop/graph-processing-in-hadoop/.