Distributed databases

Name
Emirates Aviation University
Advanced Database System
4th November 2020

Introduction

A database is a collection of information stored on a computer in an organised way that makes it easy to access and to perform operations on. Organisations mostly use databases to store data about how they interact with their customers, along with the files relating to those interactions. [1]

There are different ways of representing data in a database, but the two main categories are relational and non-relational databases. Relational databases store data in tables made up of rows and columns, while non-relational databases do not use the row-column format. Instead, they use whichever data structure is most appropriate for the data to be represented.

Databases can also be classified by their architecture. Here we have distributed databases and centralised databases [2]. A centralised database is located, stored and maintained at a single location. A distributed database, which is the main focus of this report, consists of several databases that are connected together and spread across several locations.

With this background, we will dive deeper into distributed databases. We will look at how they are implemented, what necessitates their creation, their advantages and disadvantages, and how the disadvantages can be mitigated. We will also look at the types of distributed databases, give some examples, examine how each type is implemented, and compare them to see why one would be chosen over another.
Table of contents

Introduction
Table of contents
Distributed databases
Types of distributed databases
Homogeneous Database
Heterogeneous Database
Distributed data storage
Distributed Database Management System

Distributed databases

As mentioned in the introduction, a distributed database is a set of databases distributed over several locations and interconnected over a computer network. It is managed by a distributed database management system (DDBMS) [1]. A distributed database has the following features:

- The databases in the collection are logically interrelated. They are often presented as a single logical database.
- The data is stored across multiple physical sites. The data at each site is managed by a DBMS independently of the other sites.
- The processors at the sites are connected via a network. They do not form a multiprocessor configuration.

Types of distributed databases

There are two types of distributed databases [3]:

1. Homogeneous Database
2. Heterogeneous Database

Homogeneous Database

In a homogeneous database, all sites implement and store the data in an identical manner. The database management system, operating system and data structures are the same at every site, which makes the database easier to manage.

Heterogeneous Database

In a heterogeneous database, the sites use different data structures and software, which may lead to issues with query processing. A site might also be unable to locate the other sites, and the sites may run different operating systems. Translations are therefore needed for the sites to communicate. [2]

Distributed data storage

Implementing a distributed database requires distributed data storage. There are two ways in which data can be stored across multiple sites:

1. Replication
2. Fragmentation

Replication

With replication, the data in its entirety is stored at two or more sites. When the whole database is stored at every site, it is said to be fully redundant.
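
As a rough illustration of replication, the minimal Python sketch below keeps a complete copy of the data at every site and applies each write to all of them. The class names (`Site`, `ReplicatedStore`), the site names and the sample record are hypothetical and chosen only for this example. The sketch also ignores network failures and concurrent writes, which a real distributed DBMS would have to handle.

```python
class Site:
    """One hypothetical site holding its own local copy of the data."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # local storage: key -> record


class ReplicatedStore:
    """Toy coordinator that keeps every site's copy identical."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, key, value):
        # Replication: the same write is applied at every site.
        for site in self.sites:
            site.rows[key] = value

    def read(self, key, site_index=0):
        # Any site can answer a read because each holds the whole data set.
        return self.sites[site_index].rows[key]

    def is_fully_redundant(self):
        # True when every site stores exactly the same data.
        first = self.sites[0].rows
        return all(site.rows == first for site in self.sites)


# Usage: three hypothetical sites sharing one replicated record.
sites = [Site("site_a"), Site("site_b"), Site("site_c")]
store = ReplicatedStore(sites)
store.write("customer:1", {"name": "example"})
record_from_third_site = store.read("customer:1", site_index=2)
```

Writing to every site keeps reads simple, since any site can serve them, but each update costs one write per site. That cost is the main trade-off of storing the whole database everywhere.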