Data replication strategies in DDBMS

Fully, partially, and unreplicated

Data replication in Distributed Database Management Systems (DDBMS) is the process of creating and maintaining multiple copies of data across different nodes or sites in a distributed database. There are three common data replication strategies:

1. Fully Replicated:
In a fully replicated strategy, every data item is replicated on every node or site of the distributed system, so each node holds a complete and identical copy of the entire database. The advantage is high data availability and fault tolerance: any node can serve queries or transactions locally, even if other nodes fail. The drawback is that it requires the most storage space of the three strategies and imposes significant overhead on update operations, because every copy of a data item must be updated to keep the replicas consistent.

2. Partially Replicated:
In a partially replicated strategy, only a subset of the data items is replicated, and each replicated item may be stored on some or all of the nodes. The items to replicate are selected by criteria such as how frequently they are read or whether they need to be available locally at specific nodes; frequently updated items are usually poorer candidates, because every additional copy adds update work. Because only part of the data is copied, storage requirements are lower than in a fully replicated strategy, and update overhead is also lower, since only the copies of the replicated items need to be kept in sync. However, availability is uneven: a non-replicated item exists at a single node and becomes unreachable if that node fails, and the copies of a replicated item can be temporarily inconsistent if updates are not applied to all of them at once. The system must also keep track of which nodes hold which items.

3. Unreplicated:
In an unreplicated strategy, no data items are replicated, and each data item exists exactly once in the entire distributed system. This saves storage space and minimizes update overhead, as only a single copy of each data item is maintained. However, each node becomes a single point of failure for the data it stores. If the node holding a data item fails, that item is inaccessible until the node is repaired or brought back online, and it may be lost permanently if the node cannot be recovered.
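
The three strategies can be compared with a small Python sketch. This is a toy model, not a real DDBMS API: the Cluster class, the node names A, B and C, and the item names orders, customers and logs are all hypothetical. It also assumes a simple "read one copy, write all copies" rule, which is only one of several possible replica-control schemes.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: `placement` maps each data item to the nodes holding a copy."""
    placement: dict[str, set[str]]
    failed: set[str] = field(default_factory=set)

    def live_copies(self, item: str) -> set[str]:
        # Copies that sit on nodes which have not failed.
        return self.placement[item] - self.failed

    def can_read(self, item: str) -> bool:
        # Assumed rule: a read succeeds if at least one copy is reachable.
        return bool(self.live_copies(item))

    def write_targets(self, item: str) -> set[str]:
        # Assumed rule: a write must reach every copy of the item.
        return self.placement[item]


nodes = {"A", "B", "C"}
items = ["orders", "customers", "logs"]

# Fully replicated: every item on every node.
fully = Cluster({item: set(nodes) for item in items})
# Partially replicated: some items have several copies, "logs" has one.
partial = Cluster({"orders": {"A", "B"}, "customers": {"A", "B", "C"}, "logs": {"C"}})
# Unreplicated: exactly one copy of each item.
unreplicated = Cluster({"orders": {"A"}, "customers": {"B"}, "logs": {"C"}})

# Simulate the failure of node C and see which items stay readable.
for name, cluster in [("fully", fully), ("partial", partial), ("unreplicated", unreplicated)]:
    cluster.failed = {"C"}
    readable = [item for item in items if cluster.can_read(item)]
    print(name, "readable:", readable, "| copies written for 'orders':",
          len(cluster.write_targets("orders")))
```

With node C down, the fully replicated layout can still read every item, while the partial and unreplicated layouts both lose access to logs, whose only copy lives on C. In exchange, a write to orders touches three copies in the fully replicated layout, two in the partial one, and one in the unreplicated one.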

Overall, the choice of data replication strategy depends on the specific requirements of the distributed system, including data availability, fault tolerance, storage space, and update performance. Different combinations of these strategies can be used in a distributed system to achieve a balance between these factors.
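
As a rough illustration of the storage and update trade-off, the sketch below counts storage units and copies written for three placements. The function name replication_cost, the item sizes, and the placements are assumptions made up for this example; a real system would also weigh network cost, read locality, and the consistency protocol in use.

```python
def replication_cost(placement: dict[str, set[str]],
                     item_size: dict[str, int]) -> dict[str, int]:
    """Return total storage used and copies touched if each item is written once."""
    # Each copy of an item consumes that item's size in storage.
    storage = sum(item_size[item] * len(sites) for item, sites in placement.items())
    # Writing an item once means updating every one of its copies.
    write_fanout = sum(len(sites) for sites in placement.values())
    return {"storage": storage, "write_fanout": write_fanout}


# Hypothetical item sizes in arbitrary units.
sizes = {"orders": 10, "customers": 5, "logs": 20}
all_nodes = {"A", "B", "C"}

placements = {
    "fully": {item: set(all_nodes) for item in sizes},
    "partial": {"orders": {"A", "B"}, "customers": {"A", "B", "C"}, "logs": {"C"}},
    "unreplicated": {"orders": {"A"}, "customers": {"B"}, "logs": {"C"}},
}

costs = {name: replication_cost(p, sizes) for name, p in placements.items()}
for name, cost in costs.items():
    print(name, cost)
```

Under these assumed numbers, storage falls from 105 units (fully replicated) to 55 (partial) to 35 (unreplicated), and the number of copies written falls from 9 to 6 to 3, which mirrors the availability-versus-cost balance described above.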