Maintaining the availability of critical servers and routers is an important concern for many organizations. At the lowest level, IP addresses represent the global namespace by which services are accessible on the Internet.
We introduce Wackamole, a completely distributed software solution based on a provably correct algorithm that negotiates the assignment of IP addresses among the currently available servers upon detection of faults. This reallocation ensures that at any given time every public IP address of the server cluster is covered exactly once, as long as at least one physical server survives the network fault. The same technique is extended to support highly available routers.
The paper presents the design considerations, algorithm specification and correctness proof, discusses the practical usage for server clusters and for routers, and evaluates the performance of the system.
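A minimal Python sketch of the invariant that reallocation maintains: given a membership view that all surviving servers agree on, each server runs the same deterministic function, so every public address ends up with exactly one owner. This is not Wackamole's negotiation algorithm; the agreed view is simply assumed (in practice it must come from a membership service), and `assign_vips`, the least-loaded rule, and the addresses are hypothetical.

```python
from typing import Dict, List, Set


def assign_vips(vips: List[str], members: Set[str],
                current: Dict[str, str]) -> Dict[str, str]:
    """Map every public (virtual) IP to exactly one surviving server.

    Hypothetical sketch: assumes all members already agree on `members`
    and `current`, so each computes the identical result independently.
    """
    if not members:
        raise ValueError("no surviving server can cover the addresses")
    alive = sorted(members)
    # Keep assignments whose owner survived, to avoid needless address moves.
    result = {v: s for v, s in current.items() if v in vips and s in members}
    load = {s: 0 for s in alive}
    for s in result.values():
        load[s] += 1
    # Give each orphaned address to the least-loaded server (ties by name).
    for v in sorted(vips):
        if v not in result:
            target = min(alive, key=lambda s: (load[s], s))
            result[v] = target
            load[target] += 1
    return result


vips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
before = assign_vips(vips, {"a", "b"}, {})
after = assign_vips(vips, {"b", "c"}, before)  # server "a" failed, "c" joined
```

Because the result is a dictionary keyed by address, each address has exactly one owner by construction; the hard part the paper addresses is reaching agreement on the inputs despite faults.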
Reliable point-to-point communication is usually achieved in overlay
networks by applying TCP/IP on the end nodes of a connection.
This paper presents a hop-by-hop reliability approach that
considerably reduces the latency and jitter of reliable connections.
Our approach is feasible and beneficial in overlay networks that
do not have the scalability and interoperability requirements of
the global Internet.
The effects of the hop-by-hop reliability approach are quantified
in simulation as well as in practice using newly developed
overlay network software that is fair to external traffic
on the Internet. The experimental results show that
the overhead of overlay network processing at the
application level is minor compared with
the considerable gain of the approach.
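A back-of-the-envelope model of why hop-by-hop recovery lowers latency, under idealized assumptions (losses are detected instantly, retransmissions are never lost, processing time is ignored): with end-to-end recovery a loss anywhere costs a round trip over the whole path, while with hop-by-hop recovery it costs a round trip over the lossy link only. The link latencies and function names are hypothetical, and real TCP recovery (timeouts, duplicate acknowledgments) adds delay not modeled here.

```python
from typing import List


def extra_delay_end_to_end(links_ms: List[float], lost_on: int) -> float:
    # Idealized: the destination detects the gap and asks the source,
    # so recovery costs one round trip over the whole path, wherever the
    # packet was dropped (lost_on does not matter in this model).
    return 2 * sum(links_ms)


def extra_delay_hop_by_hop(links_ms: List[float], lost_on: int) -> float:
    # Idealized: the node just after the lossy link asks its upstream
    # neighbor, so recovery costs only that link's round trip.
    return 2 * links_ms[lost_on]


path = [10.0, 30.0, 5.0, 15.0]  # hypothetical one-way link latencies (ms)
for k in range(len(path)):
    print(k, extra_delay_end_to_end(path, k), extra_delay_hop_by_hop(path, k))
```

In this model a recovered packet arrives at most twice the slowest link's latency late instead of twice the whole path's latency late, which narrows the gap between lost and unlost packets and hints at the jitter reduction reported above.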
A fundamental challenge in database replication is maintaining
a low cost of updates while assuring global system consistency.
The problem is magnified for wide-area replication due to the high latency
and the increased likelihood of network partitions. As a consequence,
most database replication research moved away from strictly consistent
models to update models with weaker semantics, relying on application
knowledge to resolve conflicts.
This paper explores a synchronous replication architecture for local
and wide-area networks that provides strong consistency and performs
considerably better than previous consistent approaches. As a proof of concept,
we implemented transparent replication for the Postgres database system.
Our results show that sophisticated algorithms and careful distributed systems
design can make symmetric, synchronous, peer database replication a reality over
both local and wide-area networks.
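The consistency argument behind synchronous peer replication can be illustrated with a deliberately simplified mechanism, not the architecture described in the paper: if every replica applies the same deterministic updates in the same global order, all replicas reach the same state. The sketch assumes a hypothetical central `Sequencer` as a stand-in for a totally ordered delivery service, and it ignores failures, partitions, and concurrency control, which are exactly the hard parts the abstract refers to.

```python
from typing import Dict, Tuple

Update = Tuple[str, int]  # hypothetical update format: write value to key


class Sequencer:
    """Hypothetical stand-in for a totally ordered broadcast service."""

    def __init__(self) -> None:
        self.counter = 0

    def order(self, update: Update) -> Tuple[int, Update]:
        seq = self.counter
        self.counter += 1
        return seq, update


class Replica:
    """Applies updates strictly in the global order, buffering early ones."""

    def __init__(self) -> None:
        self.db: Dict[str, int] = {}
        self.next_seq = 0
        self.pending: Dict[int, Update] = {}

    def receive(self, seq: int, update: Update) -> None:
        # Messages may arrive out of order; hold them until every earlier
        # sequence number has been applied.
        self.pending[seq] = update
        while self.next_seq in self.pending:
            key, value = self.pending.pop(self.next_seq)
            self.db[key] = value
            self.next_seq += 1


seqr = Sequencer()
ordered = [seqr.order(u) for u in [("x", 1), ("y", 2), ("x", 3)]]
r1, r2 = Replica(), Replica()
for seq, u in ordered:
    r1.receive(seq, u)
for seq, u in reversed(ordered):  # r2 receives the messages in reverse
    r2.receive(seq, u)
print(r1.db, r2.db)
```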
An ad hoc wireless network is an autonomous self-organizing system of
mobile nodes connected by wireless links where nodes not in direct
range can communicate via intermediate nodes. A common technique used
in routing protocols for ad hoc wireless networks is to establish the
routing paths on-demand, as opposed to continually maintaining a
complete routing table. A significant concern in routing is the
ability to function in the presence of Byzantine failures, which
include nodes that drop, modify, or misroute packets in an attempt to
disrupt the routing service.
We propose an on-demand routing protocol for ad hoc wireless networks
that provides resilience to Byzantine failures caused by individual or
colluding nodes. Our adaptive probing technique detects a malicious
link after $\log n$ faults have occurred, where $n$ is the length of
the path. These links are then avoided by multiplicatively increasing
their weights and by using an on-demand route discovery protocol that
finds a least-weight path to the destination.
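One way to realize the behavior described above, sketched under stated assumptions: a binary search over the path halves the suspect interval on each fault, so a faulty link on a path of n links is isolated after about log2(n) faults; that link's weight is then doubled, and Dijkstra's algorithm selects the least-weight route. The `ack_ok` oracle (whether the probed node returned a valid acknowledgment), the doubling factor, and the example topologies are hypothetical; authenticating acknowledgments and choosing loss thresholds are omitted.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]


def localize_faulty_link(path: List[str],
                         ack_ok: Callable[[int], bool]) -> Tuple[Link, int]:
    """Binary-search the path; each refinement waits for one fault episode."""
    lo, hi = 0, len(path) - 1  # the source (lo) is fine; the destination failed
    faults = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        faults += 1
        if ack_ok(mid):
            lo = mid  # acknowledgment received: the fault lies after mid
        else:
            hi = mid  # no valid acknowledgment: the fault is at or before mid
    return (path[lo], path[hi]), faults


def penalize(weights: Dict[Link, float], link: Link, factor: float = 2.0) -> None:
    # Multiplicative increase, applied to both directions of the link.
    a, b = link
    for key in ((a, b), (b, a)):
        weights[key] = weights.get(key, 1.0) * factor


def least_weight_path(weights: Dict[Link, float], src: str, dst: str) -> List[str]:
    # Dijkstra's algorithm over a directed graph given as {(u, v): weight}.
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return []
    route = [dst]
    while route[-1] != src:
        route.append(prev[route[-1]])
    return route[::-1]


# Path with 7 links; the link C-D drops packets, so only S..C acknowledge.
route = ["S", "A", "B", "C", "D", "E", "F", "T"]
link, faults = localize_faulty_link(route, lambda i: i <= 3)

weights: Dict[Link, float] = {}
for u, v, w in [("S", "A", 1.0), ("A", "T", 1.0), ("S", "B", 1.2), ("B", "T", 1.2)]:
    weights[(u, v)] = w
    weights[(v, u)] = w
first = least_weight_path(weights, "S", "T")
penalize(weights, ("A", "T"))  # suppose A-T was found faulty
second = least_weight_path(weights, "S", "T")
```

Here three faults suffice to isolate C-D on a 7-link path, and after doubling the weight of A-T the route switches from S-A-T (cost 3.0) to S-B-T (cost 2.4).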
We present a flow control protocol
that is analytically grounded, yet also achieves real-world goals
such as simplicity, fairness, and minimal resource usage. The protocol
is based on the Cost-Benefit algorithmic framework for
resource management: decisions rest on the "opportunity" costs of
network resources, comparing the cost of each individual resource to
the benefit it provides. As opposed to existing window-based flow
control schemes, we avoid end-to-end feedback by basing decisions on
the state of the links between participating nodes. This produces
control traffic proportional only to the number of overlay network
links and independent of the number of groups.
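A minimal sketch of a cost-benefit send decision that uses only local link state, with no end-to-end feedback. The exponential cost function (zero when idle, rising steeply toward capacity) is typical of the Cost-Benefit literature, but the base `MU`, the per-link `load` and `capacity` tables, and the function names are assumptions, not the protocol's actual parameters.

```python
from typing import Dict, Iterable

MU = 16.0  # hypothetical base of the exponential cost function


def link_cost(load: float, capacity: float, mu: float = MU) -> float:
    """Opportunity cost of one link: 0 when idle, mu - 1 at full capacity."""
    return mu ** (load / capacity) - 1.0


def may_send(benefit: float, links: Iterable[str],
             load: Dict[str, float], capacity: Dict[str, float]) -> bool:
    # Send only if the benefit covers the summed opportunity cost of every
    # overlay link the message would traverse.
    total = sum(link_cost(load[l], capacity[l]) for l in links)
    return benefit >= total


capacity = {"AB": 100.0, "BC": 100.0}  # hypothetical link capacities
load = {"AB": 25.0, "BC": 50.0}        # current usage on each link
print(may_send(5.0, ["AB", "BC"], load, capacity))  # costs 1 + 3 = 4
print(may_send(3.0, ["AB", "BC"], load, capacity))
```

Because each node needs only the state of its own links, the control traffic needed to keep these costs current scales with the number of overlay links rather than with the number of groups.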
This paper presents the design of the transport protocols of the Spread wide-area group communication system. We focus on two aspects of the system. First, we discuss the value of using overlay networks for application-level group communication services. Second, we describe the requirements and design of effective low-latency link protocols used to construct wide-area group communication.
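A minimal sketch of a low-latency link-level receiver in the spirit described above: packets are forwarded as soon as they arrive, even out of order, and only the missing sequence numbers are requested from the upstream neighbor with negative acknowledgments (NACKs). This is an illustration under assumptions, not Spread's actual link protocol; `LinkReceiver` and its fields are hypothetical, and retransmission timers and buffer management are omitted.

```python
from typing import List, Set


class LinkReceiver:
    """Hypothetical per-link receiver: forward at once, NACK only the gaps."""

    def __init__(self) -> None:
        self.highest = -1             # highest sequence number seen so far
        self.missing: Set[int] = set()
        self.delivered: List[int] = []

    def on_packet(self, seq: int) -> List[int]:
        """Handle one packet; return the sequence numbers to NACK now."""
        if seq <= self.highest and seq not in self.missing:
            return []                 # duplicate of something already forwarded
        self.missing.discard(seq)
        nacks: List[int] = []
        if seq > self.highest:
            # Everything skipped between the old highest and seq is a gap.
            nacks = list(range(self.highest + 1, seq))
            self.missing.update(nacks)
            self.highest = seq
        self.delivered.append(seq)    # forward without waiting for gaps to fill
        return nacks


rx = LinkReceiver()
print(rx.on_packet(0), rx.on_packet(3), rx.on_packet(1), rx.on_packet(1))
```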
This project develops the theory and algorithms required to overcome extremely strong network attacks while providing theoretically provable performance bounds. We are building a system that incorporates these algorithms and that exhibits good performance in practice.