With the purpose of building a framework that will allow us to clearly identify the tradeoffs involved when replicating databases on wide area networks, we developed a more modular version of the replication algorithm (Maintaining Database Consistency in P2P Networks). We are investigating a new metric that will allow us to quantify the opportunity of establishing new replicas into a replicated system. We are also studying the possibility of enhancing the current replication schemes in order to increase their fault tolerance and scalability properties, in the context of dynamic networks.
In the current implementation we provide only unreliable, best effort semantics, similarly with UDP. The overlay networks configures itself automatically, and dynamically grows or shrinks as nodes decide to join or leave the network, and supports partitions, merges, crashes and recoveries, and any such cascading events. Applications that use the overlay network use a simple API consisting in four calls (that provide connect, disconnect, send and receive), very similar to UDP socket functions.
From Total Order to Database Replication |
ps,
ps.gz,
pdf.
To appear in the Proceedings of the 22nd IEEE International Conference on Distributed
Computing Systems (ICDCS), Vienna Austria, July 2002
Yair Amir, and Ciprian Tutu. This paper presents in detail an efficient and provably correct algorithm for database replication over partitionable networks. Our algorithm avoids the need for end-to-end acknowledgments for each action while supporting network partitions and merges and allowing dynamic instantiation of new replicas. One round of end-to-end acknowledgments is required only upon a membership change event such as a network partition. New actions may be introduced to the system at any point, not only while in a primary component. We show how performance can be further improved for applications that allow relaxation of consistency requirements. We provide experimental results that demonstrate the superiority of this approach.
|
Maintaining Database Consistency in Peer to Peer Networks |
ps,
ps.gz,
pdf.
Technical Report CNDS-2002-2, February 2002.
Baruch Awerbuch, and Ciprian Tutu. We present an algorithm for persistent consistent distributed commit (distributed database commit) in a dynamic, asynchronous, peer to peer network. The algorithm has constant overhead in time and space and almost constant communication complexity, allowing it to scale to very large size networks. Previous solutions required linear overhead in communication and space, making them unscalable. We introduce a modular solution based on several well defined blocks with clear formal specifications. These blocks can be implemented in a variety of ways and we give examples of possible implementations. Most of the existing solutions require acknowledgments from every participant for each action. Our algorithm is highly efficient by aggregating these acknowledgments. Also, in contrast with existing solutions, our algorithm does not require any membership knowledge. Components are detected based on local information and the information is disseminated on an overlay spanning tree. The algorithm may prove to be more suited for practical implementation than the existing ones, because of its simplicity. |
Practical Wide Area Database Replication |
ps, ps.gz,
pdf.
Technical Report CNDS-2002-1, February 2002.
Yair Amir, Claudiu Danilov, Michal Miskin-Amir, Jonathan Stanton and Ciprian Tutu. This paper explores the architecture, implementation and performance of a wide and local area database replication system. The architecture provides synchronous, peer replication, supporting diverse application semantics, based on a group communication paradigm. Network partitions and merges, computer crashes and recoveries, and message omissions are all handled. Using a generic replication engine and the Spread group communication toolkit, we provide replication services for the PostgreSQL database system. We define three different environments to be used as test-beds: a local area cluster, a wide area network that spans the U.S.A, and an emulated wide area test bed. We conduct an extensive set of experiments on these environments, varying the number of replicas and clients, the mix of updates and queries, and the network latency. Our results show that sophisticated algorithms and careful distributed systems design can make symmetric, synchronous, peer database replication a reality for both local and wide area networks. |
We plan on conducting additional wide-area experiments on a more diverse, and larger set of hosts at the Emulab facility hosted by the University of Utah.