Quarterly Technical Report, April 2002

Progress:

This quarter we focused on developing new more advanced protocols to provide replication and dynamic network routing. We also enhanced the existing software systems by increasing their modularity and adding new capabilities. The new overlay network implementation provides a fully application independant platform upon which routing and reliability protocols can be developed.

Papers:

From Total Order to Database Replication
ps, ps.gz, pdf. To appear in the Proceedings of the 22nd IEEE International Conference on Distributed Computing Systems (ICDCS), Vienna Austria, July 2002

Yair Amir, and Ciprian Tutu.

This paper presents in detail an efficient and provably correct algorithm for database replication over partitionable networks. Our algorithm avoids the need for end-to-end acknowledgments for each action while supporting network partitions and merges and allowing dynamic instantiation of new replicas. One round of end-to-end acknowledgments is required only upon a membership change event such as a network partition. New actions may be introduced to the system at any point, not only while in a primary component. We show how performance can be further improved for applications that allow relaxation of consistency requirements. We provide experimental results that demonstrate the superiority of this approach.

Maintaining Database Consistency in Peer to Peer Networks
ps, ps.gz, pdf. Technical Report CNDS-2002-2, February 2002.

Baruch Awerbuch, and Ciprian Tutu.

We present an algorithm for persistent consistent distributed commit (distributed database commit) in a dynamic, asynchronous, peer to peer network. The algorithm has constant overhead in time and space and almost constant communication complexity, allowing it to scale to very large size networks. Previous solutions required linear overhead in communication and space, making them unscalable. We introduce a modular solution based on several well defined blocks with clear formal specifications. These blocks can be implemented in a variety of ways and we give examples of possible implementations. Most of the existing solutions require acknowledgments from every participant for each action. Our algorithm is highly efficient by aggregating these acknowledgments. Also, in contrast with existing solutions, our algorithm does not require any membership knowledge. Components are detected based on local information and the information is disseminated on an overlay spanning tree. The algorithm may prove to be more suited for practical implementation than the existing ones, because of its simplicity.

Practical Wide Area Database Replication
ps, ps.gz, pdf. Technical Report CNDS-2002-1, February 2002.

Yair Amir, Claudiu Danilov, Michal Miskin-Amir, Jonathan Stanton and Ciprian Tutu.

This paper explores the architecture, implementation and performance of a wide and local area database replication system. The architecture provides synchronous, peer replication, supporting diverse application semantics, based on a group communication paradigm. Network partitions and merges, computer crashes and recoveries, and message omissions are all handled. Using a generic replication engine and the Spread group communication toolkit, we provide replication services for the PostgreSQL database system. We define three different environments to be used as test-beds: a local area cluster, a wide area network that spans the U.S.A, and an emulated wide area test bed. We conduct an extensive set of experiments on these environments, varying the number of replicas and clients, the mix of updates and queries, and the network latency. Our results show that sophisticated algorithms and careful distributed systems design can make symmetric, synchronous, peer database replication a reality for both local and wide area networks.

Plans for Next Quarter: