Publications
2024
Tolerating Compound Threats in Critical Infrastructure Control Systems
Sahiti Bommareddy,
Maher Khan,
Huzaifah Nadeem,
Benjamin Gilby,
Imes Chu,
John W. van de Lindt,
Omar Nafal,
Mathaios Panteli,
Linon Wells II,
Yair Amir,
Amy Babay
Inproceedings of the 43rd International Symposium on Reliable Distributed Systems, Charlotte,USA, September 2024 (SRDS 2024).
We won best paper award.
Abstract
Compound threats, in which cyberattacks are tar- geted in the aftermath of a natural hazard, pose an important emerging threat for critical infrastructure. In this paper, we analyze the system design implications of compound threats for power grid SCADA systems for the first time. We introduce a novel compound threat model and develop a tool for analyzing resilience under this threat model. Using our tool, we compare the resilience of existing fault- and intrusion-tolerant SCADA system architectures in case studies based on two power utilities: Hawaiian Electric (HECO) and Florida Power & Light (FPL).
We show that no existing system architecture adequately addresses compound threats, but that it is possible to improve resilience to such threats by explicitly considering natural haz- ards in the system design and by employing a new out-of- band reconfiguration mechanism for intrusion-tolerant systems. However, an important outcome of our work is that compound threats remain a challenging problem, with no complete solution.
ByzSec — A Multi-layered Byzantine Resilient Architecture for Bulk Power System Protective Relays
Christopher Bonebrake,
Jonathan Sebastian-Cardenas,
Carl H. Miller,
Sahiti Bommareddy,
Yair Amir,
Kade Cornelison,
Cliff Eyre,
Paul Skare,
Sri Nikhil Gupta Gourisetti,
Aditya Ashok,
Bev Johnson
IEEE Power & Energy Society General Meeting , Seattle,USA, July 2024 (IEEE PESGM 2024).
Abstract
Reliability, selectivity, and sensitivity are the fundamental attributes of any protection system, acting as the main drivers in the selection of schemes, and equipment. In high-voltage systems, microprocessor-based relays represent the industry’s preferred solution, providing engineers with a vast array of benefits. However, they remain vulnerable to cybersecurity events that may compromise their functionality. To help mitigate against potential cybersecurity risks, this paper presents a fault-tolerant, Byzantine Resilient (BR) architecture that significantly increases the cybersecurity attributes of a protection system while minimizing the amount of performance impacts and integration overheads introduced. The solution relies on an array of independent relays that utilize robust consensus methods (based on Spire [1], [2]) to ensure correct system behavior is achieved even when a relay has been compromised. Furthermore, the solution has been complemented with a custom-built Situational Awareness engine that can be used to detect and identify potential threats. The implemented solution has been developed in consultation with three hardware vendors and has been tested to comply with the performance requirements of a 345kV differential protection scheme (87T). The results indicate that the proposed architecture is a comprehensive solution that: supports the strict correctness and performance requirements of the bulk power grid while providing a cost-effective alternative that offers a seamless, long-term solution.
2022
Real-Time Byzantine Resilient Power Grid Infrastructure: Evaluation and Trade-offs
Sahiti Bommareddy,
Maher Khan,
David J Sebastian Cardenas,
Carl Miller,
Christopher Bonebrake,
Yair Amir,
Amy Babay
Accepted at International Workshop on Explainability of Real-time Systems and their Analysis at the IEEE Real-Time Systems Symposium (RTSS 2022).
Abstract
Increasing threats to power grid infrastructure are driving the need to build Byzantine-resilient systems that can continue to operate correctly despite failures and attacks. How- ever, the real-time requirements of power grid infrastructure call for a more rigorous evaluation of Byzantine resilient systems than the traditional evaluations performed in the context of standard IT applications. We discuss these requirements, and the potential of commercial-off-the-shelf and open source solutions to support real-time resilient systems.
Real-Time Byzantine Resilience for Power Grid Substations
Sahiti Bommareddy,
Daniel Qian,
Christopher Bonebrake,
Paul Skare,
Yair Amir,
In Proceedings of the 41st International Symposium on Reliable Distributed Systems, Vienna, Austria, September 2022, pp. 135-144 SRDS.
Abstract
In the world of increasing cyber threats, a com- promised protective relay can put power grid resilience at risk by irreparably damaging costly power assets or by causing significant disruptions. We present the first architecture and protocols for the substation that ensure correct protective relay operation in the face of successful relay intrusions and network attacks while meeting the required latency constraint of a quarter power cycle (4.167ms).
Our architecture supports other rigid requirements, including continuous availability over a long system lifetime and seamless substation integration. We evaluate our implementation in a range of fault-free and faulty operation conditions, and provide deployment tradeoffs.
Data-Centric Analysis of Compound Threats to Critical Infrastructure Control Systems
Sahiti Bommareddy,
Benjamin Gilby,
Maher khan,
Imes Chiu,
Mathaios Panteli,
John W. van de Lindt,
Yair Amir,
Amy Babay
In Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W) DSN-W.
Abstract
Compound threats involving cyberattacks that are targeted in the aftermath of a natural disaster pose an important emerging threat for critical infrastructure. We introduce a novel compound threat model and data-centric framework for evaluating the resilience of power grid SCADA systems to such threats. We present a case study of a compound threat involving a hurricane and follow-on cyberattack on Oahu Hawaii and analyze the ability of existing SCADA architectures to withstand this threat model. We show that no existing architecture fully addresses this threat model, and demonstrate the importance of considering compound threats in planning system deployments.
2021
RADICS: Runtime Assurance of Distributed Intelligent Control Systems
Brian Wheatman,
Jerry Chen,
Tamim Sookoor,
Yair Amir
In Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W) 2021 DSN2021.
Abstract
We describe RADICS: Runtime Assurance of Dis- tributed Intelligent Control Systems, which combines a Simplex- based, black-box monitor with a white-box monitor to ensure correct behavior and good performance of AI systems. The black- box monitor allows the system to detect when the AI controller is on a failing trajectory and use a provably safe, but less performant algorithm, to right the system. The white-box monitor predicts when the AI controller will be put on such a trajectory before it happens and helps maximize the performance of the overall system. We describe the overall approach in detail and implement a simple version of it on a case study into controlling the lights in a small traffic grid.
2019
Deploying Intrusion-Tolerant SCADA for the Power Grid
Amy Babay,
John Schultz,
Thomas Tantillo,
Samuel Beckley,
Eamon Jordan,
Kevin Ruddell,
Kevin Jordan,
Yair Amir
In Proceedings of the IEEE/IFIP International Conference on Dependable
Systems and Networks (DSN19), Portland OR, June 2019, pp. 328-335. Obsoletes
Technical Report CNDS-2019-1.
Abstract
While there has been considerable research on making power grid Supervisory
Control and Data Acquisition (SCADA) systems resilient to attacks, the problem
of transitioning these technologies into deployed SCADA systems remains largely
unaddressed.
We describe our experience and lessons learned in deploying an
intrusion-tolerant SCADA system in two realistic environments: a red-team
experiment in 2017 and a test deployment in a power plant in 2018. These
experiences resulted in technical lessons related to developing an
intrusion-tolerant system with a real deployable application, preparing a
system for deployment in a hostile environment, and supporting protocol
assumptions in that hostile environment. We also discuss some meta-lessons
regarding the cultural and interpersonal aspects of transitioning
academic research into practice in the power industry.
2018
Toward an Intrusion-Tolerant Power Grid: Challenges and Opportunities
Amy Babay,
John Schultz,
Thomas Tantillo,
Yair Amir
In Proceedings of the IEEE International Conference on Distributed Computing Systems
(ICDCS), Vision track, Vienna Austria, July 2018, pp. 1321-1326. Invited paper.
Abstract
While cyberattacks pose a relatively new challenge for power grid control systems, commercial cloud systems have needed to address similar threats for many years. However, technology and approaches developed for cloud systems do not necessarily transfer directly to the power grid, due to important differences between the two domains. We discuss our experience adapting intrusion-tolerant cloud technologies to the power domain and describe the challenges we have encountered and potential directions for overcoming those obstacles.
Network-Attack-Resilient Intrusion-Tolerant SCADA for the Power Grid
Amy Babay,
Thomas Tantillo,
Trevor Aron,
Marco Platania,
Yair Amir
In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN18), Luxembourg, June 2018, pp. 255-266. Obsoletes Technical Report CNDS-2017-2.
Abstract
As key components of the power grid infrastructure, Supervisory Control and Data Acquisition (SCADA) systems are likely to be targeted by nation-state-level attackers willing to invest considerable resources to disrupt the power grid. We present Spire, the first intrusion-tolerant SCADA system that is resilient to both system-level compromises and sophisticated network-level attacks and compromises. We develop a novel architecture that distributes the SCADA system management across three or more active sites to ensure continuous availability in the presence of simultaneous intrusions and network attacks. A wide-area deployment of Spire, using two control centers and two data centers spanning 250 miles, delivered nearly 99.999% of all SCADA updates initiated over a 30-hour period within 100ms. This demonstrates that Spire can meet the latency requirements of SCADA for the power grid.
2017
Structured Overlay Networks for a New Generation of Internet Services
Amy Babay,
Claudiu Danilov,
John Lane,
Michal Miskin-Amir,
Daniel Obenshain,
John Schultz,
Jonathan Stanton,
Thomas Tantillo,
Yair Amir
In Proceedings of the IEEE International Conference on Distributed Computing Systems
(ICDCS), Vision track, Atlanta GA, June 2017, pp. 1771-1779. Invited paper.
Abstract
The dramatic success and scaling of the Internet was made possible by the core
principle of keeping it simple in the middle and smart at the edge (or the
end-to-end principle). However, new applications bring new demands, and for
many emerging applications, the Internet paradigm presents limitations.
For applications in this new generation of Internet services, structured
overlay networks offer a powerful framework for deploying specialized protocols
that can provide new capabilities beyond what the Internet natively supports by
leveraging global state and in-network processing. The structured overlay
concept includes three principles: A resilient network architecture, a flexible
overlay node software architecture that exploits global state and unlimited
programmability, and flow-based processing.
We demonstrate the effectiveness of structured overlay networks in supporting
today's demanding applications and propose forward-looking ideas for leveraging
the framework to develop protocols that push the boundaries of what is possible
in terms of performance and resilience.
Timely, Reliable, and Cost-effective Internet Transport Service using Dissemination Graphs
Amy Babay,
Emily Wagner,
Michael Dinitz, and
Yair Amir
In Proceedings of the IEEE International Conference on Distributed Computing Systems
(ICDCS), Atlanta GA, June 2017, pp. 1-12. Obsoletes Technical Report CNDS-2017-1.
Abstract
Emerging applications such as remote manipulation and remote robotic
surgery require communication that is both timely and reliable, but the
Internet natively supports only communication that is either completely
reliable with no timeliness guarantees (e.g. TCP) or timely with
best-effort reliability (e.g. UDP). We present an overlay transport service that can
provide highly reliable communication while meeting stringent timeliness
guarantees (e.g. 130ms round-trip latency across the US) over the Internet. To
enable routing schemes that can support the necessary timeliness and
reliability, we introduce dissemination graphs, providing a unified framework
for specifying routing schemes ranging from a single path, to multiple disjoint
paths, to arbitrary graphs. We conduct an extensive analysis of real- world
network data, finding that a routing approach using two disjoint paths performs
well in most cases, and that cases where two disjoint paths do not perform well
typically involve problems around a source or destination. Based on this
analysis, we develop a timely dissemination-graph-based routing method that can
add targeted redundancy in problematic areas of the network. This approach can
cover over 99% of the performance gap between a traditional single-path
approach and an optimal (but prohibitively expensive) scheme, while two dynamic
disjoint paths cover about 70% of this gap, and two static disjoint paths cover
about 45%. This performance improvement is obtained at a cost increase of about
2% over two disjoint paths.
2016
Practical Intrusion-Tolerant Networks
Daniel Obenshain,
Thomas Tantillo,
Amy Babay,
John Schultz,
Andrew Newell,
Md. Endadul Hoque,
Yair Amir,
Cristina Nita-Rotaru
In Proceedings of the IEEE International Conference on Distributed
Computing Systems (ICDCS), Nara, Japan, June 2016, pp. 45-56.
Obsoletes Technical Report CNDS-2016-2.
Abstract
As the Internet becomes an important part of the infrastructure our
society depends on, it is crucial to construct networks that are able to work
even when part of the network is compromised. This paper presents the first
practical intrusion-tolerant network service, targeting high-value applications
such as monitoring and control of global clouds and management of critical
infrastructure for the power grid. We use an overlay approach to leverage the
existing IP infrastructure while providing the required resiliency and
timeliness. Our solution overcomes malicious attacks and compromises in both
the underlying network infrastructure and in the overlay itself. We deploy and
evaluate the intrusion-tolerant overlay implementation on a global cloud
spanning East Asia, North America, and Europe, and make it publicly available.
Fast Total Ordering for Modern Data Centers
Amy Babay and
Yair Amir
In Proceedings of the IEEE International Conference on Distributed
Computing Systems (ICDCS), Nara, Japan, June 2016, pp. 669-679. An
extended version is available as Technical Report CNDS-2016-1.
Obsoletes Technical Report CNDS-2014-2.
Abstract
The performance profile of local area networks has changed over the last
decade, but many practical group communication and ordered messaging tools rely
on core ideas invented over a decade ago. We present the Accelerated Ring
protocol, a novel ordering protocol that improves on the performance of
standard token-based protocols by allowing processes to pass the token before
they have finished multicasting. This performance improvement is obtained while
maintaining the correctness and other beneficial properties of token-based
protocols.
On 1-gigabit networks, a single-threaded daemon-based implementation of the
protocol reaches network saturation, and can reduce latency by 45% compared to
a standard token-based protocol while simultaneously increasing throughput by
30%. On 10-gigabit networks, the implementation reaches throughputs of 6 Gbps,
and can reduce latency by 30-35% while simultaneously increasing throughput by
25-40%. A production implementation of the Accelerated Ring protocol has been
adopted as the default ordering protocol for data center environments in
Spread, a widely-used open-source group communication system.
On Choosing Server- or Client-Side Solutions for BFT
Marco Platania,
Daniel Obenshain,
Thomas Tantillo,
Yair Amir,
Neeraj Suri
ACM Computing Surveys (CSUR), 48(4), Article 61, May 2016.
Abstract
Byzantine Fault Tolerant (BFT) protocols have the ability to work
correctly even when up to a threshold f of system servers
are compromised. This makes them appealing for the construction of
critical systems connected to the Internet, which are constantly a
target for cyber attacks.
BFT protocols differ based on the kind of application, deployment
settings, performance, access control mechanisms, number of servers
in the system, and protocol implementation. The large number of
protocols present in the literature and their differences make it
difficult for a system builder to choose the solution that best
satisfies the requirements of the system that he wants to build. In
particular, the main difference among BFT protocols lies in their
system models: server-side versus client-side. In the server-side
model each client relies on the system to consistently order and
replicate updates, while in the client-side model each client
actively participates in the protocol.
In this article, we classify BFT protocols as server-side or
client-side. We analyze the trade-offs between the two models,
describe systems that use these models and the trade-offs they
choose, highlight the research gaps, and provide guidelines to
system builders in order to choose the solution that best satisfies
their needs.
2015
Increasing Network Resiliency by Optimally Assigning Diverse Variants to Routing Nodes
Andrew Newell,
Daniel Obenshain,
Thomas Tantillo,
Cristina Nita-Rotaru,
Yair Amir
The IEEE Transactions on Dependable and Secure Computing (TDSC), 12(6), pages 602-614, November 2015.
Abstract
Networks with homogeneous routing nodes are constantly at risk as
any vulnerability found against a node could be used to compromise
all nodes. Introducing diversity among nodes can be used to address
this problem. With few variants, the choice of assignment of
variants to nodes is critical to the overall network resiliency.
We present the Diversity Assignment Problem (DAP), the assignment
of variants to nodes in a network, and we show how to compute the
optimal solution in medium-size networks. We also present a greedy
approximation to DAP that scales well to large networks. Our
solution shows that a high level of overall network resiliency can
be obtained even from variants that are weak on their own.
We provide a variation of our problem that matches the specific
communication requirements of applications run over the network
(e.g., Paxos and BFT). Also, we analyze the loss in resiliency when
optimally assigning variants based on inaccurate information about
compromises.
Toward Survivable Intrusion-Tolerant Open-Source SCADA
Thomas Tantillo
IEEE International Conference on Dependable Systems and Networks (DSN15) Student Forum, Rio de Janeiro, June 2015.
Abstract
As vital components of critical infrastructure, SCADA systems must
continue to operate correctly and at their expected level of performance at all
times. However, current SCADA systems are vulnerable to intrusions, and even a
single compromise can cause catastrophic consequences. We present the
architecture of and initial steps toward the first intrusion-tolerant
open-source SCADA system that is survivable over the required long system
lifetimes. We perform a case study of a hypothetical deployment on the Eastern
Seaboard of the United States to validate that our SCADA system will meet SCADA
application latency requirements.
Timely, Reliable, and Cost-effective Transport Service using Dissemination Graphs
Amy Babay
IEEE International Conference on Dependable Systems and Networks (DSN15) Student Forum, Rio de Janeiro, June 2015.
Abstract
We present preliminary work that demonstrates the feasibility of
deploying an Internet transport service that can support applications with
stringent timeliness and reliability requirements (e.g. 130ms round-trip
latency across the US with 99.999% reliability). We describe an approach to
building such a transport service based on overlay networks and dissemination
graphs. In this approach, each packet is sent over a subgraph of the overlay
topology (a dissemination graph) that is chosen based on reliability, latency,
and cost requirements.
2014
Towards a Practical Survivable Intrusion Tolerant Replication System
Marco Platania,
Daniel Obenshain,
Thomas Tantillo,
Ricky Sharma,
Yair Amir
In the Proceedings of the IEEE International Symposium on
Reliable Distributed Systems (SRDS14), Nara, Japan, October
2014, pp. 242-252. An extended version is available as Technical Report CNDS-2014-1.
Abstract
The increasing number of cyber attacks against critical infrastructures, which
typically require large state and long system lifetimes, necessitates the
design of systems that are able to work correctly even if part of them is
compromised.
We present the first practical survivable intrusion tolerant replication
system, which defends across space and time using compiler-based diversity and
proactive recovery, respectively. Our system supports large-state applications,
and utilizes the
Prime BFT
protocol (providing performance guarantees under attack) with a
compiler-based diversification engine. We devise a novel theoretical model that
computes how resilient the system is over its lifetime based on the
rejuvenation rate and the number of replicas.
This model shows that we can achieve a confidence in the system of 95% over 30
years even when we transfer a state of 1 terabyte after each rejuvenation.
Survivable SCADA via Intrusion-Tolerant Replication
Jonathan Kirsch,
Stuart Goose,
Yair Amir,
Dong Wei,
Paul Skare
IEEE Transactions on Smart Grid, 5(1), pages 60-70, January 2014.
Abstract
Providers of critical infrastructure services strive to maintain the high
availability of their SCADA systems. This paper reports on our experience
designing, architecting, and evaluating the first survivable SCADA system
- one that is able to ensure correct behavior with minimal performance
degradation even during cyber attacks that compromise part of the system. We
describe the challenges we faced when integrating modern intrusion-tolerant
protocols with a conventional SCADA architecture and present the techniques
we developed to overcome these challenges. The results illustrate that our
survivable SCADA system not only functions correctly in the face of a cyber
attack, but that it also processes in excess of 20,000 messages per second
with a latency of less than 30 ms, making it suitable for even large-scale
deployments managing thousands of remote terminal units.
2013
Increasing Network Resiliency by Optimally Assigning Diverse Variants to Routing Nodes
Andrew Newell,
Daniel Obenshain,
Thomas Tantillo,
Cristina Nita-Rotaru,
Yair Amir
Technical Report CNDS-2013-1.
Abstract
Networks with homogeneous routing nodes are constantly at risk as any vulnerability found
against a node could be used to compromise all nodes. Introducing diversity
among nodes can be used to address this problem. With few variants, the choice of assignment
of variants to nodes is critical to the overall network resiliency.
We present the Diversity Assignment Problem (DAP), the assignment of
variants to nodes in a network, and we show how to compute the optimal solution in medium-size
networks. We also present a greedy approximation to DAP that scales well to large networks.
Our solution shows that a high level of overall network resiliency can be obtained even
from variants that are weak on their own.
We provide two variations of our problem to meet real-world system needs.
First, for networks with knowledge of higher-level protocols we offer a technique to create
assignments that maximize the needs of a specific application (e.g., Paxos and BFT). Second,
for networks with knowledge of the value of traffic between each communicating pair of nodes,
we offer a weighted version that can increase resiliency between important communicating pairs
while sacrificing resiliency for the less important pairs.
Our assignments are based on assumed compromise probabilities and independence of compromises
between different diverse variants. We provide analysis when these assumed probabilities or
independence are inaccurate.
Increasing Network Resiliency by Optimally Assigning Diverse Variants to Routing Nodes
Andrew Newell,
Daniel Obenshain,
Thomas Tantillo,
Cristina Nita-Rotaru,
Yair Amir
In the Proceedings of the IEEE International Conference on Dependable Systems and Networks (DSN13), Budapest, June 2013.
Obsoletes Technical Report TR-13-002.
Abstract
Networks with homogeneous routing nodes are
constantly at risk as any vulnerability found against a node could
be used to compromise all nodes. Introducing diversity among
nodes can be used to address this problem. With few variants,
the choice of assignment of variants to nodes is critical to the
overall network resiliency.
We present the Diversity Assignment Problem (DAP), the
assignment of variants to nodes in a network, and we show how
to compute the optimal solution in medium-size networks. We
also present a greedy approximation to DAP that scales well to
large networks. Our solution shows that a high level of overall
network resiliency can be obtained even from variants that are
weak on their own.
For real-world systems that grow incrementally over time, we
provide an online version of our solution. Lastly, we provide a
variation of our solution that is tunable for specific applications
(e.g., BFT).
2012
Intrusion-Tolerant Cloud Monitoring and Control
Daniel Obenshain,
Tom Tantillo,
Andrew Newell,
Cristina Nita-Rotaru,
Yair Amir
In Proceedings of the 2012 Workshop on Large-Scale Distributed Systems and Middleware (LADIS 2012), Madeira, Portugal, July 2012.
Invited paper.
Abstract
We propose an intrusion-tolerant
overlay messaging service for cloud monitoring and control.
Our service, Controlled Authenticated Overlay Flooding,
combines authentication mechanisms and
flooding with an
overlay architecture to provide optimal resiliency in the presence
of compromised nodes. While Internet flooding is not
feasible because of the number of nodes and links involved,
our approach is practical as it imposes a maximal overlay
topology and confines the communication to that topology's
overlay links, limiting the cost of flooding.
2011
Toward Survivable SCADA
Jonathan Kirsch,
Stuart Goose,
Yair Amir,
Paul Skare
In Proceedings of Annual Cyber Security and Information Intelligence Research Workshop (CSIIRW11),
Oak Ridge TN, October 2011.
Abstract
This paper reports on our experience designing and implementing the first
survivable SCADA system - one capable of providing correct behavior with minimal
performance degradation even during a cyber attack that compromises part of the system.
We describe the challenges we faced while attempting to integrate modern
intrusion-tolerant protocols with the SCADA architecture and present the techniques
we developed to overcome these challenges.
Prime: Byzantine Replication Under Attack
Yair Amir,
Brian Coan,
Jonathan Kirsch,
John Lane
IEEE Transactions on Dependable and Secure Computing (TDSC), 8(4), pages 564-577, July 2011.
Abstract
Existing Byzantine-resilient replication protocols satisfy two standard correctness criteria,
safety and liveness, even in the presence of Byzantine faults. The runtime performance of these
protocols is most commonly assessed in the absence of processor faults and is usually good
in that case. However, in some protocols faulty processors can significantly degrade performance,
limiting the practical utility of these protocols in adversarial environments. This paper
demonstrates the extent of performance degradation possible in some existing protocols that
do satisfy liveness and that do perform well absent Byzantine faults. We propose a new
performance-oriented correctness criterion that requires a consistent level of performance,
even when the system exhibits Byzantine faults. We present a new Byzantine fault-tolerant
replication protocol that meets the new correctness criterion and evaluate its performance in
fault-free executions and when under attack.
2010
The SMesh Wireless Mesh Network
Yair Amir,
Claudiu Danilov,
Raluca Musaloiu-Elefteri,
Nilo Rivera
The ACM Transactions on Computer Systems (ACM TOCS), 28(3), pages 6:1-6:49,
September 2010.
Obsoletes Technical Report CNDS-2009-3.
Abstract
Wireless mesh networks extend the connectivity range of mobile devices
by using multiple access points, some of them connected to the
Internet, to create a mesh topology and forward packets over multiple
wireless hops. However, the quality of service provided by the mesh is
impaired by the delays and disconnections caused by handoffs, as
clients move within the area covered by multiple access points. We
present the architecture and protocols of SMesh, the first transparent
wireless mesh system that offers seamless, fast handoff, supporting
real-time applications such as interactive VoIP. The handoff and
routing logic is done solely by the access points, and therefore
connectivity is attainable by any 802.11 device. In SMesh, the entire
mesh network is seen by the mobile clients as a single, omnipresent
access point, giving the mobile clients the illusion that they are
stationary. We use multicast for access points coordination and,
during handoff transitions, we use more than one access point to
handle the moving client. SMesh provides a hybrid routing protocol
that optimizes routes over wireless and wired links in a multi-homed
environment. Experimental results on a fully deployed mesh network
demonstrate the effectiveness of the SMesh architecture and its
intra-domain and inter-domain handoff protocols.
A Robust Push-to-Talk Service for Wireless Mesh Networks
Yair Amir,
Raluca Musaloiu-Elefteri,
Nilo Rivera
In Proceedings of the 7th IEEE Conference on
Sensor, Mesh and Ad Hoc Communications and Networks (IEEE SECON 2010), pages 270-278, Boston MA, June 2010.
Abstract
Push-to-Talk (PTT) is a useful capability for rapidly deployable wireless
mesh networks used by first responders. PTT allows several users to speak
with each other while using a single, half-duplex, communication channel,
such that only one user speaks at a time while all other users listen.
This paper presents the architecture and protocol of a robust distributed
PTT service for wireless mesh networks. The architecture supports any
802.11 client with SIP-based (Session Initiation Protocol) VoIP software
and enables the participation of regular phones. Collectively, the mesh
nodes provide the illusion of a single third party call controller,
enabling clients to participate via any reachable mesh node. Each PTT
group instantiates its own logical floor control manager that is highly
available and resilient to mesh connectivity changes such as node crashes
and recoveries and network partitions and merges. Experimental results
on a fully deployed mesh network consisting of 14 mesh nodes and tens of
emulated clients demonstrate the scalability and robustness of the system.
STEWARD: Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks
Yair Amir,
Claudiu Danilov,
Danny Dolev,
Jonathan Kirsch,
John Lane,
Cristina Nita-Rotaru,
Josh Olsen,
David Zage
IEEE Transactions on Dependable and Secure Computing (TDSC) 7(1), pages 80-93, Januray 2010.
Abstract
This paper presents the first hierarchical Byzantine
fault-tolerant replication architecture suitable to systems that
span multiple wide area sites. The architecture confines the
effects of any malicious replica to its local site, reduces message
complexity of wide area communication, and allows read-only
queries to be performed locally within a site for the price
of additional standard hardware. We present proofs that our
algorithm provides safety and liveness properties. A prototype
implementation is evaluated over several network topologies
and is compared with a flat Byzantine fault-tolerant approach.
The experimental results show considerable improvement over
flat Byzantine replication algorithms, bringing the performance
of Byzantine replication closer to existing benign fault-tolerant
replication techniques over wide area networks.
2009
An Attack-Resilient Architecture for Large-Scale Intrusion-Tolerant Replication
Yair Amir,
Brian Coan,
Jonathan Kirsch,
John Lane
Technical Report CNDS-2009-5.
Abstract
This paper presents the first architecture for large-scale, wide-area
intrusion-tolerant state machine replication that is specifically
designed to perform well even when some of the servers are Byzantine.
The architecture is hierarchical and runs attack-resilient state
machine replication protocols within and among the wide-area sites.
Given the constraints of the wide-area environment, we explore the
challenges and tradeoffs of building inter-site communication
protocols that use wide-area bandwidth efficiently yet can resist
attempts to degrade performance. The paper provides evidence that the
optional use of simple dependable components, whose compromise or
malfunction cannot cause inconsistency in the replicated service, can
significantly improve performance when the system is under attack.
Prime: Byzantine Replication Under Attack
Yair Amir,
Brian Coan,
Jonathan Kirsch,
John Lane
Technical Report CNDS-2009-4.
Abstract
Existing Byzantine-resilient replication protocols satisfy two
standard correctness criteria, safety and liveness, in the presence of
Byzantine faults. In practice, however, faulty processors can, in
some protocols, significantly degrade performance by causing the
system to make progress at an extremely slow rate. While ``correct''
in the traditional sense, systems vulnerable to such performance
degradation are of limited practical use in adversarial environments.
This paper argues that techniques for mitigating such performance
attacks are needed to bridge this ``practicality gap'' for
intrusion-tolerant replication systems. We propose a new
performance-oriented correctness criterion, and we show how failure to
meet this criterion can lead to performance degradation. We present a
new Byzantine replication protocol that achieves the criterion and
evaluate its performance in fault-free configurations and when under
attack.
Authenticated Adversarial Routing
Yair Amir,
Paul Bunn,
Rafail Ostrovsky
In Proceedings of the IACR Theory of Cryptograhy Conference (TCC), pages 163-182, San Francisco CA, March 2009.
Abstract
The aim of this paper is to demonstrate the feasibility of authenticated throughput-efficient
routing in an unreliable and dynamically changing synchronous network in which the majority
of malicious insiders try to destroy and alter messages or disrupt communication in any way.
More specifically, in this paper we seek to answer the following questoin: Given a network
in which the majority of nodes are controlled by a node-controlling adversary and whose
topology is changing every round, is it possible to develop a protocol with polynomially-bounded
memory per processor that guarantees throughput-efficient and correct end-to-end communication?
We answer the question affirmatively for extremely general corruption patterns: We only request
that the topology of the network and the corruption pattern of the adversary leaves at least
one path each round connecting the sender and receiver through honest nodes (though this
path may change at every round).
Intrusion-Tolerant Group Management for Mobile Ad-Hoc Networks
Jonathan Kirsch,
Brian Coan
Technical Report CNDS-2009-2.
Abstract
This paper presents PICO, a generic infrastructure for secure group
communication in mobile ad-hoc networks (MANETs). PICO provides an
intrusion-tolerant group management service, allowing clients to join
or leave a logical group and enabling group members to communicate
securely using a dynamically generated group encryption key. Since
MANETs are characterized by relatively high message loss and frequent
network partitions, PICO is built around a new Byzantine
fault-tolerant agreement protocol designed to cope with these
conditions. The agreement protocol leverages weak (commutative)
semantics to allow multiple partitions to continue operating in
parallel without sacrificing correctness, and it uses threshold
cryptography to provide efficient reconciliation and coordination
without the need for reliable communication links.
A Robust Push-to-Talk Service for Wireless Mesh Networks
Yair Amir,
Raluca Musaloiu-Elefteri,
Nilo Rivera
Technical Report CNDS-2009-1.
Abstract
Push-to-Talk (PTT) is a useful capability for rapidly deployable
wireless mesh networks used by first responders. PTT allows several
users to speak with each other while using a single, half-duplex,
communication channel, such that only one user speaks at a time while
all other users listen. Furthermore, enabling regular PSTN phone users
(e.g., cell phones) to seamlessly participate in the wireless mesh PTT
session is key to supporting the heterogeneous environment commonly
found in such settings.
This paper presents the architecture and protocol of a distributed PTT
service for wireless mesh networks. The architecture supports any 802.11
client with SIP-based VoIP software and enables the participation of
regular phones. Collectively, the mesh nodes provide the illusion of a
single third party call controller, enabling clients to participate via
any reachable mesh node. Each PTT group instantiates its own logical
floor control manager that is highly available and resilient to mesh
connectivity changes such as node crashes and recoveries and network
partitions and merges.
Experimental results on a fully deployed mesh network consisting of 14
mesh nodes and tens of emulated clients demonstrate the scalability and
robustness of the system.
2008
Paxos For System Builders: An Overview
Yair Amir,
Jonathan Kirsch
In Proceedings of the 2008 Workshop on Large-Scale Distributed Systems and Middleware (LADIS 2008), Yorktown, NY, September 2008.
Invited paper.
Abstract
This paper presents an overview of Paxos for System Builders, a
complete specification of the Paxos replication protocol such that
system builders can understand it and implement it. We evaluate the
performance of a prototype implementation and detail the safety and
liveness properties guaranteed by our specification of Paxos.
On Redundant Multipath Operating System Support for Wireless Mesh Networks
Yair Amir,
Claudiu Danilov,
Michael Kaplan,
Raluca Musaloiu-Elefteri,
Nilo Rivera
In the 3rd IEEE Workshop on Wireless Mesh Networks (WiMesh 2008), San Francisco, CA, USA, June 2008, pp. 1-6.
Obsoletes Technical Report CNDS-2008-3, CNDS-2007-2.
Abstract
Low-cost wireless routers are changing the way people connect to
the Internet. They are also very cheap, albeit quite limited, Linux
boxes. These attributes make them ideal candidates for wireless mesh
routers.
This paper presents a minimally invasive mechanism for redundant multipath
routing in kernel-space to achieve high reliability with high throughput
in a mesh network. This service is essential for achieving fast,
lossless handoff as mobile devices roam throughout the wireless mesh
coverage area. However, redundant multipath is not natively supported by
current operating systems, limiting the routing mechanisms that can be
used in these networks to user-level implementations, which can greatly
degrade performance.
We show an architecture that integrates this mechanism in a wireless
mesh system, resulting in a high-throughput 802.11 mesh network with
fast handoff over low-cost routers.
Byzantine Replication Under Attack
Yair Amir,
Brian Coan,
Jonathan Kirsch,
John Lane
In Proceedings of the IEEE International Conference on Dependable Systems and Networks (DSN 2008), Anchorage, AK, USA, June 2008, pp. 197-206. Obsoletes Technical Report
CNDS-2008-1.
Abstract
Existing Byzantine-resilient replication protocols satisfy two
standard correctness criteria, safety and liveness, in the presence of
Byzantine faults. In practice, however, faulty processors can, in
some protocols, significantly degrade performance by causing the
system to make progress at an extremely slow rate. While ``correct''
in the traditional sense, systems vulnerable to such performance
degradation are of limited practical use in adversarial environments.
This paper argues that techniques for mitigating such performance
attacks are needed to bridge this ``practicality gap'' for
intrusion-tolerant replication systems. We propose a new
performance-oriented correctness criterion, and we show how failure to
meet this criterion can lead to performance degradation. We present a
new Byzantine replication protocol that achieves the criterion and
evaluate its performance in fault-free configurations and when under
attack.
Paxos For System Builders
Jonathan Kirsch,
Yair Amir
Technical Report CNDS-2008-2.
Abstract
This paper presents a complete specification of the Paxos replication
protocol such that system builders can understand it and implement it.
We evaluate the performance of a prototype implementation and detail
the safety and liveness properties guaranteed by our specification of
Paxos.
2007
Customizable Fault Tolerance For Wide-Area Replication
Yair Amir,
Brian Coan,
Jonathan Kirsch,
John Lane
In the 26th IEEE International Symposium on Reliable Distributed Systems
(SRDS 2007), Beijing, China, 2007, pp. 66-80.
Abstract
Constructing logical machines out of collections of physical machines
is a well-known technique for improving the robustness and fault
tolerance of distributed systems. We present a new, scalable
replication architecture, built upon logical machines specifically
designed to perform well in wide-area systems spanning multiple sites.
The physical machines in each site implement a logical machine by
running a local state machine replication protocol, and a wide-area
replication protocol runs among the logical machines. Implementing
logical machines via the state machine approach affords free
substitution of the fault tolerance method used in each site and in
the wide-area replication protocol, allowing one to balance
performance and fault tolerance based on perceived risk.
We present a new Byzantine fault-tolerant protocol that establishes a
reliable virtual communication link between logical machines. Our
communication protocol is efficient (a necessity in wide-area
environments), avoiding the need for redundant message sending during
normal-case operation and allowing a logical machine to consume
approximately the same wide-area bandwidth as a single physical
machine. This dramatically improves the wide-area performance of our
system compared to existing logical machine based approaches. We
implemented a prototype system and compare its performance and fault
tolerance to existing solutions.
Customizable Fault Tolerance For Wide-Area Replication
Yair Amir,
Brian Coan,
Jonathan Kirsch,
John Lane
Technical Report CNDS-2007-1. Obsoletes Technical Report
CNDS-2006-3.
Abstract
Constructing logical machines out of collections of physical machines
is a well-known technique for improving the robustness and fault
tolerance of distributed systems. We present a new, scalable
replication architecture, built upon logical machines specifically
designed to perform well in wide-area systems spanning multiple
sites. The physical machines in each site implement a logical machine
by running a local state machine replication protocol, and a wide-area
replication protocol runs among the logical machines. Implementing
logical machines via the state machine approach affords free
substitution of the fault tolerance method used in each site and in
the wide-area replication protocol, allowing one to balance
performance and fault tolerance based on perceived risk. We present a
new Byzantine fault-tolerant protocol that establishes a reliable
virtual communication link between logical machines. Our communication
protocol is efficient (a necessity in wide-area environments),
avoiding the need for redundant message sending during normal-case
operation and allowing a logical machine to consume approximately the
same wide-area bandwidth as a single physical machine. This
dramatically improves the wide-area performance of our system compared
to existing logical machine based approaches. We implemented a
prototype system and compare its performance and fault tolerance to
existing solutions.
An Inter-domain Routing Protocol for Multi-homed Wireless Mesh Networks
Yair Amir,
Claudiu Danilov,
Raluca Musaloiu-Elefteri,
Nilo Rivera
In IEEE International Symposium on a World of Wireless, Mobile, and
Multimedia Networks (WoWMoM 2007), Helsinki, Finland.
Obsoletes Technical Report
CNDS-2006-1.
Abstract
This paper presents a routing protocol for multi-homed wireless mesh
networks that provide uninterrupted connectivity and fast handoff.
Our approach integrates wireless and wired connectivity, using multicast
groups to coordinate decisions and seamlessly transfer connections
between several Internet gateways as mobile clients move between
access points. The protocol optimizes the use of the wireless medium
by short-cutting wireless hops through wired connections, paying a very
low overhead during handoffs. The paper demonstrates that inter-domain
handoffs occur instantaneously, with virtually no loss or delay, for
both TCP and UDP connections.
2006
STEWARD: Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks
Yair Amir,
Claudiu Danilov,
Danny Dolev,
Jonathan Kirsch,
John Lane,
Cristina Nita-Rotaru,
Josh Olsen,
David Zage
Technical Report CNDS-2006-2. Obsoletes Technical Report
CNDS-2005-3.
Abstract
This paper presents the first hierarchical Byzantine
fault-tolerant replication architecture suitable to systems that
span multiple wide area sites. The architecture confines the
effects of any malicious replica to its local site, reduces message
complexity of wide area communication, and allows read-only
queries to be performed locally within a site for the price
of additional standard hardware. We present proofs that our
algorithm provides safety and liveness properties. A prototype
implementation is evaluated over several network topologies
and is compared with a flat Byzantine fault-tolerant approach.
The experimental results show considerable improvement over
flat Byzantine replication algorithms, bringing the performance
of Byzantine replication closer to existing benign fault-tolerant
replication techniques over wide area networks.
Fast Handoff for Seamless Wireless Mesh Networks
Yair Amir,
Claudiu Danilov,
Michael Hilsdale,
Raluca Musaloiu-Elefteri,
Nilo Rivera
In the Proceedings of the ACM International Conference on Mobile Systems, Applications and Services (MobiSys 2006), pages 83-95, Uppsala, Sweden, June 2006. Obsoletes Technical Report
CNDS-2005-2.
Abstract
This paper presents the architecture and protocols of SMesh,
a completely transparent wireless mesh system that offers
seamless, fast handoff, supporting VoIP and other real-time
application traffic for any unmodified 802.11 device.
In SMesh, the entire mesh network is seen by the
mobile clients as a single, omnipresent access point.
Fast handoff is achieved by ensuring that each client is served
by at least one access point at any time. Mobile clients are
handled by a single access point during stable connectivity times.
During handoff transitions, SMesh uses more than one access point
to handle the moving client. Access points continuously monitor
the connectivity quality of any client in their range and
efficiently share this information with other access points in
the vicinity of that client to coordinate which of them should
serve the client.
Experimental results on a fully deployed mesh network consisting
of 14 access points demonstrate the effectiveness of the SMesh
architecture and its handoff protocol.
Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks
Yair Amir,
Claudiu Danilov,
Danny Dolev,
Jonathan Kirsch,
John Lane,
Cristina Nita-Rotaru,
Josh Olsen,
David Zage
In the Proceedings of the IEEE International Conference on
Dependable Systems and Networks (DSN06), pages 105-114, Philadelphia, June 2006.
Abstract
This paper presents the first hierarchical Byzantine fault-tolerant
replication architecture suitable to systems that span multiple wide
area sites. The architecture confines the effects of any malicious
replica to its local site, reduces message complexity of wide area
communication, and allows read-only queries to be performed locally
within a site for the price of additional hardware. A prototype
implementation is evaluated over several network topologies and is
compared with a flat Byzantine tolerant approach.
An Overlay Architecture for High Quality VoIP Streams
Yair Amir,
Claudiu Danilov,
Stuart Goose,
David Hedqvist,
Andreas Terzis
In the IEEE Transactions on Multimedia, 8(6), pages 1250-1262, December 2006.
Abstract
The cost savings and novel features associated with Voice over IP
(VoIP) are driving its adoption by service providers.
Unfortunately, the Internet's best effort service model provides no
quality of service guarantees. Because low latency and jitter is the key
requirement for supporting high quality interactive conversations,
VoIP applications use UDP to transfer data, thereby subjecting
themselves to quality degradations caused by packet loss and network
failures.
In this paper we describe an architecture to improve the performance
of such VoIP applications. Two protocols are used for localized packet
loss recovery and rapid rerouting in the event of network
failures. The protocols are deployed on the nodes of an
application-level overlay network and require no changes to the
underlying infrastructure. Experimental results indicate that the
architecture and protocols can be combined to yield voice quality on
par with the PSTN.
2005
Enhancing Distributed Systems with Mechanisms to Cope with Malicious Clients
Yair Amir,
Claudiu Danilov,
John Lane,
Michal Miskin-Amir,
Cristina Nita-Rotaru
Technical Report CNDS-2005-4.
Abstract
In this paper we identify a major security vulnerability in distributed
systems: compromised clients under adversarial control can use the
system within their authorized access rights and authenticated
channels to deliberately insert incorrect data. A significant problem
is that when
a malicious client insider is discovered, it is hard to quickly assess
the scope of the damage, and identify corrupt and suspected updates.
We propose Accountability Graph, a mechanism
that can assist applications in coping and recovering from such attacks.
The tool provides accountability enforcement and causality tracking of
updates and their dependencies. Upon detection of incorrect data (e.g.
by an external intrusion detection mechanism or human assessment),
the Accountability Graph will quickly classify all updates in the system
as either corrupted, suspected or not affected. The practicality and
usefulness of the approach is demonstrated based on the requirements
of three different applications: an open source software development project,
a military common operation picture application, and a national emergency
response system. The Accountability Graph can also be used for risk
assessment and vulnerability analysis with respect to the above attack.
A Cost-Benefit Flow Control for Reliable Multicast and Unicast in Overlay Networks
Yair Amir,
Baruch Awerbuch,
Claudiu Danilov,
Jonathan Stanton.
In IEEE/ACM Transactions on Networking, 13(5), pages 1094-1106, October 2005.
Abstract
When many parties share network resources on an
overlay network, mechanisms must exist to allocate the resources
and protect the network from overload. Compared to large
physical networks such as the Internet, in overlay networks the
dimensions of the task are smaller, so new and possibly more
effective techniques can be used. In this work we take a fresh
look at the problem of flow control in multi-sender multi-group
reliable multicast and unicast and explore a cost-benefit approach
that works in conjunction with Internet standard protocols such
as TCP.
In contrast to existing window-based flow control schemes we
avoid end-to-end per sender or per group feedback by looking
only at the state of the virtual links between participating nodes.
This produces control traffit proportional only to the number of
overlay network links and independent of the number of groups,
senders or receivers. We show the effectiveness of the resulting
protocol through simulations and validate the simulations with
live Internet experiments. We demonstrate near optimal utilization
of network resources, fair sharing of individual congested
links and quick adaptation to network changes.
An Overlay Architecture for High Quality VoIP Streams
Yair Amir,
Claudiu Danilov,
Stuart Goose,
David Hedqvist,
Andreas Terzis
Technical Report CNDS-2005-1. Obsoletes Technical Report CNDS-2004-2.
Abstract
The cost savings and novel features associated with
Voice over IP (VoIP) are driving its adoption by service providers.
Such a transition however can successfully happen only if the
quality and reliability offered is comparable to the existing
PSTN. Unfortunately, the Internet's best effort service model
provides no inherent quality of service guarantees. Because
low latency and jitter is the key requirement for supporting
high quality interactive conversations, VoIP applications use
UDP to transfer data, thereby subjecting themselves to quality
degradations caused by packet loss and network failures.
In this paper we describe an architecture to improve the
performance of such VoIP applications. Two protocols are used
for localized packet loss recovery and rapid rerouting in the event
of network failures. The protocols are deployed on the routers of
an application-level overlay network and require no changes to
the underlying infrastructure. Experimental results indicate that
the architecture and protocols can be combined to yield voice
quality on par with the PSTN.
Secure Spread: An Integrated Architecture for Secure Group Communication
Yair Amir,
Cristina Nita-Rotaru,
Jonathan Stanton,
Gene Tsudik
In IEEE Transactions on Dependable and Secure Computing (TDSC), vol. 2, no. 3, pages 248-261, September 2005.
Abstract
Group communication systems are high-availability
distributed systems providing reliable and ordered message
delivery as well as a membership service to group-oriented
applications. Many such systems are built using a distributed
client-server architecture where a relatively small set of
servers -sharing information about the groups in the system - provide
service to numerous clients.
In this work, we show how group communication systems can
be enhanced with security services without sacrificing robustness
and performance. More specifically, we propose several inte-
grated security architectures for distributed client-server group
communication systems. In an integrated architecture, security
services are implemented in servers, in contrast to a layered
architecture where the same services are implemented in clients.
We discuss performance and accompanying trust issues of each
proposed architecture and present experimental results that
demonstrate the superior scalability of an integrated architecture.
1-800-OVERLAYS: Using Overlay Networks to Improve VoIP Quality
Yair Amir,
Claudiu Danilov,
Stuart Goose,
David Hedqvist,
Andreas Terzis
In the Proceedings of the 15th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 05) pp. 51-56, Skamania, Washington, June 13th-14th, 2005.
Abstract
The cost savings and novel features associated with Voice over IP
(VoIP) are driving its adoption by service providers. Such a transition
however can successfully happen only if the quality and reliability
offered is comparable to the existing PSTN. Unfortunately,
the Internet's best effort service model provides no inherent quality
of service guarantees. Because low latency and jitter is the
key requirement for supporting high quality interactive conversations,
VoIP applications use UDP to transfer data, thereby subjecting
themselves to performance degradations caused by packet loss
and network failures.
In this paper we describe two algorithms to improve the performance
of such VoIP applications. These mechanisms are used for
localized packet loss recovery and rapid rerouting in the event of
network failures. The algorithms are deployed on the routers of
an application-level overlay network and require no changes to the
underlying infrastructure. Initial experimental results indicate that
these two approaches can be composed to yield voice quality on
par with the PSTN.
2004
On the Performance of Group Key Agreement Protocols
Yair Amir,
Yongdae Kim,
Cristina Nita-Rotaru,
John Schultz,
Jonathan Stanton,
Gene Tsudik
In ACM Transactions on Information and Systems Security (TISSEC), vol. 7, no. 3, pages 1-32, August 2004.
Abstract
Group key agreement is a fundamental building block for secure peer group communication systems.
Several group key management techniques were proposed in the last decade, all assuming
the existence of an underlying group communication infrastructure to provide reliable and ordered
message delivery as well as group membership information. Despite analysis, implementation and
deployment of some of these techniques, the actual costs associated with group key management
have been poorly understood so far. This resulted in an undesirable tendency: on the one hand,
adopting sub-optimal security for reliable group communication, while, on the other hand, constructing
excessively costly group key management protocols.
This paper presents a thorough performance evaluation of five notable distributed key management
techniques (for collaborative peer groups) integrated with a reliable group communication
system. An in-depth comparison and analysis of the five techniques is presented based on experimental
results obtained in actual local- and wide-area networks. The extensive performance
measurement experiments conducted for all methods offer insights into their scalability and practicality.
Furthermore, our analysis of the experimental results highlights several observations which
are not obvious from the theoretical analysis.
1-800-OVERLAYS: Using Overlay Networks to Improve VoIP Quality
Yair Amir,
Claudiu Danilov,
Stuart Goose,
David Hedqvist,
Andreas Terzis
Technical Report CNDS-2004-2.
Abstract
Although telephony subscribers are accustomed to the
consistent voice quality and high reliability
of the traditional PSTN, the promise of a single converged
IP network to carry voice and data - and the cost savings therein -
motivates the interest to adopt voice-over-IP
(VoIP) technologies. However, the Internet provides best
effort delivery, without any inherent quality of service
guarantees. Low latency is a key factor in supporting high
quality interactive conversations, and as such contemporary
VoIP solutions use UDP to transfer data over the IP
layer, despite being subject to network loss and failures.
This paper describes the use of an overlay network
through which streams of voice packets are transmitted.
Flexible application level overlay routers can understand
the stringent requirements of VoIP and implement new
algorithms that mask the limitations of the underlying Internet.
We describe two protocols that facilitate localized
recovery for lost packets and rapid rerouting in the event of
network failures. Experimental results indicate that these
two approaches can be combined to yield a quantitative
improvement to voice communication quality.
Secure Group Communication Using Robust Contributory Key Agreement
Yair Amir,
Yongdae Kim,
Cristina Nita-Rotaru,
John Schultz,
Jonathan Stanton,
Gene Tsudik
In IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 15, no. 5, pages 468-480, May 2004.
Abstract
Contributory group key agreement protocols generate group keys based on contributions of all group
members. Particularly appropriate for relatively small collaborative peer groups, these protocols are
resilient to many types of attacks. Unlike most group key distribution protocols, contributory group key
agreement protocols offer strong security properties, such as key independence and perfect forward secrecy.
This paper presents the first robust contributory key agreement protocol resilient to any sequence
of group changes. The protocol, based on the Group Diffie-Hellman contributory key agreement, uses
the services of a group communication system supporting Virtual Synchrony semantics. We prove that
it provides both Virtual Synchrony and the security properties of Group Diffie-Hellman, in the presence
of any sequence of (potentially cascading) node failures, recoveries, network partitions and heals.
We implemented a secure group communication service, Secure Spread, based on our robust key
agreement protocol and Spread group communication system. To illustrate its practicality, we compare
the costs of establishing a secure group with the proposed protocol and a protocol based on centralized
group key management, adapted to offer equivalent security properties.
The Spread Toolkit: Architecture and Performance
Yair Amir,
Claudiu Danilov,
Michal Miskin-Amir,
John Schultz,
Jonathan Stanton
Technical Report CNDS-2004-1
Abstract
The Spread toolkit is a group communication system available from
www.spread.org. Spread provides a range
of reliability, ordering and stability guarantees for message delivery.
Spread supports a rich fault model that includes process crashes and
recoveries and network partitions and merges under the extended
virtual synchrony semantics. The standard virtual synchrony semantics
is also supported.
2003
On the Performance of Consistent Wide-Area Database Replication
Yair Amir,
Claudiu Danilov,
Michal Miskin-Amir,
Jonathan Stanton,
and
Ciprian Tutu,
Technical Report CNDS-2003-3, December 2003. Obsoletes TR CNDS-2002-4.
Abstract
In this paper we design a generic, consistent replication architecture that
enables transparent database replication and we present the optimizations
and tradeoffs of the chosen design. We demonstrate the practicality of
our approach by building a prototype that replicates a PostgreSQL
database system. We provide experimental results for consistent wide-area
database replication. We claim that the use of an optimized synchronization
engine is the key to building a practical synchronous replication system
for wide-area network settings.
A New Look at the Old Domain Name System
Yair Amir,
Daniel Massey,
and
Ciprian Tutu.
Technical Report CNDS-2003-2
Abstract
The Domain Name System (DNS) is undergoing
fundamental changes in both design and operations,
but these changes are mostly taking place in piecemeal
extensions. In this paper we consider how to maintain
a simple and robust DNS in the face of these inevitable
and essential changes. We consider some of the features
that the modern DNS is trying to incorporate and
we look at the ensuing problems from a systematic perspective.
We identify some key architectural issues and
design principles that we believe are essential to the
successful integration of such features in the DNS infrastructure.
Following these principles, we sketch two
possible mechanisms that would improve the availability
and timeliness, and decouple the zone management
from the query/response data-path of DNS while being
deployable in parallel and incrementally with respect
to the existing infrastructure.
N-Way Fail-Over Infrastructure for Reliable Servers and Routers
Yair Amir,
Ryan Caudy,
Ashima Munjal,
Theo Schlossnagle,
and
Ciprian Tutu.
In the Proceedings of the IEEE International Conference on
Dependable Systems and Networks (DSN03), San Francisco, June 2003.
(Earlier Version as TR CNDS-2002-5)
Abstract
Maintaining the availability of critical servers and
routers is an important concern for many organizations.
At the lowest level, IP addresses represent the global
namespace by which services are accessible on the Internet.
We introduce Wackamole, a completely distributed
software solution based on a provably correct algorithm
that negotiates the assignment of IP addresses among the
currently available servers upon detection of faults. This
reallocation ensures that at any given time any public IP
address of the server cluster is covered exactly once, as
long as at least one physical server survives the network
fault. The same technique is extended to support highly
available routers.
The paper presents the design considerations, algorithm
specification and correctness proof, discusses the
practical usage for server clusters and for routers, and
evaluates the performance of the system.
Reliable Communication in Overlay Networks
Yair Amir and
Claudiu Danilov.
In the Proceedings of the IEEE International Conference on
Dependable Systems and Networks (DSN03), San Francisco, June 2003.
An earlier version
of the paper was released as Technical Report CNDS-2003-1
Abstract
Reliable point-to-point communication is usually
achieved in overlay networks by applying TCP on the end
nodes of a connection. This paper presents a hop-by-hop
reliability approach that considerably reduces the latency
and jitter of reliable connections. Our approach is feasible
and beneficial in overlay networks that do not have the
scalability and interoperability requirements of the global
Internet.
The effects of the hop-by-hop reliability approach are
quantified in simulation as well as in practice using a newly
developed overlay network system that is fair with the external
traffic on the Internet. The experimental results show
that the overhead associated with overlay network processing
at the application level does not play an important factor
compared with the considerable gain of the approach.
Scaling Secure Group Communication: Beyond Peer-to-Peer
Yair Amir,
Cristina Nita-Rotaru,
Jonathan Stanton,
and
Gene Tsudik.
Int the Proceedings of DISCEX3, Washington DC, April 22-24, 2003.
Obsoletes TR CNDS-2002-3.
Abstract
This paper proposes several integrated security architecture
designs for client-server group communication systems. In an
integrated architecture, security services are implemented in
servers, in contrast to a layered architecture where the same services
are implemented in clients. We discuss the performance
and accompanying trust issues of each proposed architecture and
present experimental results that demonstrate the superior scalability
of an integrated architecture.
2002
On the Performance of Wide-Area Synchronous Database Replication
Yair Amir,
Claudiu Danilov,
Michal Miskin-Amir,
Jonathan Stanton,
and
Ciprian Tutu,
Technical Report CNDS-2002-4, November 2002.
Abstract
A fundamental challenge in database replication is to maintain a low cost
of updates while assuring global system consistency. The divculty of the
problem is magnified for wide-area network settings due to the high latency
and the increased likelihood of network partitions. As a consequence,
most of the research in the area has focused either on improving the
performance of local transaction execution or on replication models
with weaker semantics, which rely on application knowledge to resolve
potential con icts. In this work we identify the performance bottleneck
of the existing synchronous replication schemes as residing in the
update synchronization algorithm. We compare the performance of several
such synchronization algorithms and highlight the large performance gap
between various methods. We design a generic, synchronous replication
scheme that uses an enhanced synchronization algorithm and demonstrate
its practicality by building a prototype that replicates a PostgreSQL
database system. We claim that the use of an optimized synchronization
engine is the key to building a practical synhronous replication system
for wide-area network settings.
An On-Demand Secure Routing Protocol Resilient to Byzantine Failures
Baruch Awerbuch,
Dave Holmer,
Cristina Nita-Rotaru,
and
Herbert Rubens.
In ACM Workshop on Wireless Security (WiSe),
Atlanta, Georgia, September 28 2002.
Abstract
An ad hoc wireless network is an autonomous self-organizing
system of mobile nodes connected by wireless links where
nodes not in direct range can communicate via intermediate
nodes. A common technique used in routing protocols for ad hoc wireless networks is to establish the
routing paths on demand, as opposed to continually maintaining a complete
routing table. A significant concern in routing is the ability to function in the presence of
byzantine failures which include nodes that drop, modify, or mis-route packets in an
attempt to disrupt the routing service.
We propose an on-demand routing protocol for ad hoc wireless networks that provides
resilience to byzantine failures caused by individual or colluding nodes. Our adaptive
probing technique detects a malicious link after log n faults
have occurred, where n is the length of the path. These
links are then avoided by multiplicatively increasing their
weights and by using an on-demand route discovery protocol that finds a least weight path to the destination.
From Total Order to Database Replication
Yair Amir
and
Ciprian Tutu.
In the Proceedings of the 22nd IEEE International Conference on Distributed Computing Systems (ICDCS),
Vienna, Austria, July 2002.
Abstract
This paper presents in detail an efficient and provably
correct algorithm for database replication over partitionable networks.
Our algorithm avoids the need for end-to-end acknowledgments for each action while supporting
network partitions and merges and allowing dynamic instantiation of new replicas.
One round of end-to-end acknowledgments is required only upon a membership change event
such as a network partition. New actions may be introduced to the system at any point,
not only while in a primary component. We show how performance can be further improved for
applications that allow relaxation of consistency requirements. We provide experimental results that
demonstrate the efficiency of our approach.
High Performance, Robust, Secure and Transparent Overlay Network Service
Yair Amir,
Claudiu Danilov,
and
Cristina Nita-Rotaru.
In FuDiCo 2002: International Workshop on Future Directions in
Distributed Computing , Bertinoro(Forli), Italy, June 3-7 2002.
Abstract
We propose a secure messaging infrastructure for unicast and multicast with near-optimal
performance and stronger semantics, beyond what is naturally achieved over the Internet, and
completely transparent to the application. This infrastructure is constructed by long-lived
daemons, running as user-space programs. The daemons dynamically create and maintain a logical
overlay network. The algorithms on this overlay network are not bounded by the standard
Internet protocols. Rather, they exploit the more limited scalability of the overlay network
(as opposed to the global Internet) to optimize performance (e.g. latency, throughput). Our
infrastructure can still scale to much higher numbers than, for example, group communication
systems, as we maintain much weaker global guarantees.
Global Flow Control for Wide Area Overlay Networks: A Cost-Benefit Approach
Yair Amir,
Baruch Awerbuch,
Claudiu Danilov,
and
Jonathan Stanton.
In the Proceedings of
IEEE Open Architectures and Network Programming (Openarch),
pp 155-166, New York, New York, June 2002.
Abstract
This paper presents a flow control protocol for
multi-sender multi-group multicast and unicast in wide area
overlay networks. The protocol is analytically grounded and
achieves real world goals, such as simplicity, fairness and
minimal resource usage. Flows are regulated based on the
"opportunity" costs of network resources used and the benefit provided by the flow.
In contrast to existing window-based flow control schemes, we avoid end-to-end per sender
or per group feedback by looking only at the state of the
virtual links between participating nodes. This produces
control traffic proportional only to the number of overlay
network links and independent of the number of groups,
senders or receivers. We show the effectiveness of the resulting protocol through simulations and
validate the simulations with live Internet experiments.
Maintaining Database Consistency in Peer to Peer Networks
Baruch Awerbuch,
and
Ciprian Tutu.
Technical Report CNDS-2002-2, February 2002.
Abstract
We present an algorithm for persistent consistent distributed commit (distributed database
commit) in a dynamic, asynchronous, peer to peer network. The algorithm has constant overhead
in time and space and almost constant communication complexity, allowing it to scale to very large
size networks. Previous solutions required linear overhead in communication and space, making
them unscalable.
We introduce a modular solution based on several well defined blocks with clear formal
specifications. These blocks can be implemented in a variety of ways and we give examples of possible
implementations. Most of the existing solutions require acknowledgments from every participant
for each action. Our algorithm is highly efficient by aggregating these acknowledgments. Also,
in contrast with existing solutions, our algorithm does not require any membership knowledge.
Components are detected based on local information and the information is disseminated on an
overlay spanning tree.
The algorithm may prove to be more suited for practical implementation than the existing
ones, because of its simplicity.
Practical Wide Area Database Replication
Yair Amir,
Claudiu Danilov,
Michal Miskin-Amir,
Jonathan Stanton
and
Ciprian Tutu.
Technical Report CNDS-2002-1, February 2002.
Abstract
This paper explores the architecture, implementation and performance of a wide and
local area database replication system. The architecture provides peer replication,
supporting diverse application semantics, based on a group communication paradigm.
Network partitions and merges, computer crashes and recoveries, and message omissions
are all handled. Using a generic replication engine and the Spread group communication
toolkit, we provide replication services for the PostgreSQL database system. We define
three different environments to be used as test-beds: a local area cluster, a wide area
network that spans the U.S.A, and an emulated wide area test bed. We conduct an
extensive set of experiments on these environments, varying the number of replicas and
clients, the mix of updates and queries, and the network latency. Our results show that
sophisticated algorithms and careful distributed systems design can make symmetric,
synchronous, peer database replication a reality for both local and wide area networks.
2001
From Total Order to Database Replication
Yair Amir
and
Ciprian Tutu.
Technical Report CNDS-2001-6, November 2001.
Abstract
This paper presents in detail an efficient and provably correct algorithm for database replication
over partitionable networks. Our algorithm avoids the need for end-to-end acknowledgments for
each action while supporting network partitions and merges and allowing dynamic instantiation
of new replicas. One round of end-to-end acknowledgments is required only upon a membership
change event such as a network partition. New actions may be introduced to the system at any
point, not only while in a primary component. We show how performance can be further improved
for applications that allow relaxation of consistency requirements. We provide experimental results
that demonstrate the superiority of this approach.
On the Performance of Group Key Agreement Protocols
Yair Amir,
Yongdae Kim,
Cristina Nita-Rotaru,
and
Gene Tsudik,
Technical Report CNDS-2001-5, November 2001, (Obsoletes TR CNDS-2001-4).
In the IEEE International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, July 2002,
short paper.
Abstract
Group key agreement is a fundamental building block for
secure peer group communication systems. Several group
key agreement protocols were proposed in the last decade,
all of them assuming the existence of an underlying group
communication infrastructure.
This paper presents a performance evaluation of five notable key agreement
protocols for peer groups, integrated
with a reliable group communication system (Spread).
They are: Centralized Group Key Distribution (CKD),
Burmester-Desmedt (BD), Steer et al. (STR), Group Diffie-Hellman (GDH)
and Tree-Based Group Diffie-Hellman
(TGDH). The paper includes an in-depth comparison and
analysis of conceptual results and is the first to report practical results in real-life local
and wide area networks. Our analysis of these protocols' experimental results offers
insights into their scalability and practicality.
Framework for Authentication and Access Control of Client-Server Group Communication Systems
Yair Amir,
Cristina Nita-Rotaru,
and
Jonathan Stanton.
In the Proceedings of the Third International Workshop on Networked
Group Communication (NGC), London UK, 7-9 November 2001.
Abstract
Researchers have made much progress in designing secure and scalable protocols
to provide specific security services, such as data secrecy, data integrity,
entity authentication and access control, to multicast and group applications.
However, less emphasis has been put on how to integrate security protocols with
modern, highly efficient group communication systems and what issues arise in
secure group communication systems. In this paper, we present a flexible and
modular architecture for integrating many different authentication and access
control policies and protocols with an existing group communication system,
while allowing applications to provide their own protocols and control the
policies. This architecture maintains, as much as possible, the scalability
and performance characteristics of the unsecure system. We discuss some of the
challenges when designing such a framework and show its implementation in
the Spread wide-area group communication toolkit.
Global Flow Control for Wide Area Overlay Networks: A Cost-Benefit Approach
Yair Amir,
Baruch Awerbuch,
Claudiu Danilov,
and
Jonathan Stanton,
Technical Report CNDS-2001-3. Accepted to IEEE Open Architectures and Network Programming (OpenArch), June 2002.
Abstract
This paper presents a flow control protocol for
multi-sender multi-group multicast and unicast in wide area
overlay networks. The protocol is analytically grounded and
achieves real world goals, such as simplicity, fairness and
minimal resource usage. Flows are regulated based on the
"opportunity" costs of network resources used and the benefit provided by the flow.
In contrast to existing window-based flow control schemes, we avoid end-to-end per sender
or per group feedback by looking only at the state of the
virtual links between participating nodes. This produces
control traffic proportional only to the number of overlay
network links and independent of the number of groups,
senders or receivers. We show the effectiveness of the
resulting protocol through simulations and validate the simulations
with live Internet experiments.
Partitionable Virtual Synchrony Using Extended Virtual Synchrony
John Schultz
Masters Thesis, January 2001
Abstract
View-oriented group communication systems(GCSs) are powerful tools for building
distributed applications. Over the past fifteen years, group communication researchers
developed a multitude of group communication semantics and implementations. Today,
researchers commonly design their group communication algorithms on top of simple
existings ervices such as a network membership service or a reliable FIFO multicast
framework. A natural extension of this idea is to implement one set of group
communication semantics using another. This approach is not usually utilized due to the
expensive overhead of running one set of group communication algorithms on top of
another.
This thesis argues that the Extended Virtual Synchrony (EVS) model of group
communication, implemented using a client-daemon architecture,is of such high
performance that the overhead of constructing another group communication model on
top of it is acceptable. It demonstrates that the strong safety properties provided by the
EVS model can be leveraged to create very simple algorithms that implement more
powerful group communication models.
This thesis presents several EVS algorithms for implementing a partitionable Virtual
Synchrony (VS)model of group communication. It first explicitly defines the VS and
EVS models through the presentation of their safety and liveness properties. Then,one
simple algorithm is formally proved to implement the VS model by utilizing the safety
and liveness properties of the underlying EVS system. Finally,the paper discusses
several other simple variants and algorithms that were developed during the course of this
work.
Framework for Authentication and Access Control of Client-Server Group Communication Systems
Yair Amir,
Cristina Nita-Rotaru,
and
Jonathan Stanton.
Technical Report CNDS-2001-2.
Abstract
Researchers have made much progress in designing secure and scalable protocols to provide
specific security services, such as data secrecy, data integrity, entity authentication and access control,
to multicast and group applications. However, less emphasis has been put on how to integrate security protocols
with modern, highly efficient group communication systems and what issues arise in secure group communication systems. In this paper, we present a flexible and
modular architecture for integrating many different authentication and access control policies and protocols with an existing group communication system, while
allowing applications to provide their own protocols and control the policies. This architecture maintains, as much as possible, the scalability and performance
characteristics of the unsecure system. We discuss some of the challenges when designing such a framework and show its implementation in the Spread wide-area group
communication toolkit.
Exploring Robustness in Group Key Agreement
Yair Amir,
Yongdae Kim,
Cristina Nita-Rotaru,
John Schultz,
Jonathan Stanton,
and
Gene Tsudik
In the Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), Phoenix, Arizona, April 2001, pp 399-408. Nominated for Best Paper Award.
Abstract
Secure group communication is crucial for building distributed applications
that work in dynamic environments and communicate over unsecured networks
(e.g. the Internet). Key agreement is a critical part of providing security
services for group communication systems. Most of the current contributory key agreement
protocols are not designed to tolerate failures and membership changes during
execution. In particular, nested or cascaded group membership
events (such as partitions) are not accommodated.
In this paper we present the first robust contributory
key agreement protocols resilient to any sequence of events
while preserving the group communication membership
and ordering guarantees.
Flow Control for Many-to-Many Multicast: A Cost-Benefit Approach
Yair Amir,
Baruch Awerbuch,
Claudiu Danilov,
and
Jonathan Stanton,
Technical Report CNDS-2001-1.
Abstract
Flow control, especially in multicast networks, is a problem of significant theoretical and practical
interest. We present a protocol that is analytically grounded, yet also achieves real world goals,
such as simplicity, fairness and minimal resource usage. We base our flow control protocol on the
Cost-Benefit algorithmic framework for resource management. We base decisions on the "opportunity" costs
of network resources, comparing the cost of each individual resource to the benefit it
provides. As opposed to existing window-based flow control schemes, we avoid end-to-end feedback
by basing decisions on the state of the links between participating nodes. This produces control traffic
proportional only to the number of overlay network links and independent of the number of groups.
We apply this algorithm to an existing wide area group communication system, however, it can
be applied to other multicast services as well. We show the effectiveness of the resulting protocol
through simulations and live Internet experiments.
2000
HYPERDOG - Up to Date Web Monitoring Through Metacomputers
Jacob Green
Master Thesis, Oct. 2000
Abstract
Search engines bring order to the content on the World Wide Web (WWW).
They are the primary interface for Internet users to find information. They currently take
30 days or more to find the changes that occur on the WWW due to the enormous amount
of data that must be crawled to find these changes. Yet the WWW is experiencing an
explosive growth of content, which will either greatly increase the cost of maintaining
search engines, or the rate that changes are found will progressively get worse.
HyperDog is a proposed solution to this problem. It uses a large Internet-wide
metacomputer to track the changes on the WWW. This system filters out only the
changes that occur and feeds them back to the search engines and other indexers.
Because the percentage of new and changed pages per day is relatively small, HyperDog
represents a significant bandwidth improvement. When search engines only have to
manage the changes on the WWW, they can scale better with the growth of the WWW.
HyperDog represents a key application to take advantage of the new global
metacomputers. By distributing the crawlers to many points in the network through these
metacomputers, the crawling algorithm gains some advantages. The crawlers each scan
a small part of the WWW, perform the necessary processing to determine if the pages
have changed, and only returns back the small portion of the WWW that is new or has
changed. Together the crawlers perform the majority of work and provide the search
engines with a much smaller condensed form of the information they need.
This paper also examines the properties of these metacomputers in an attempt to
define these properties and encourage the development of other key constructive
applications.
A Cost-Benefit Approach to Resource Allocation in Scalable Metacomputers
R. Sean Borgstrom
PhD Thesis, Sep. 2000
Abstract
A metacomputer is a set of machines networked together for increased computational
performance. To build an efficient metacomputer, one must assign jobs to the various
networked machines intelligently. A poor job assignment strategy can result in heavily
unbalanced loads and thrashing machines. This cripples the cluster's computational
power. A strong job assignment strategy helps a metacomputer complete all of its jobs
swift ly.
Resource heterogeneity makes job assignment more complex. Placing a job on one
machine might risk depleting its small memory. Another machine might have more free
memory but a heavily burdened CPU. Bin packing on memory protects the system
against thrashing. Load balancing protects the system against high CPU loads.
Combining the two approaches, however, gives an ad hoc heuristic algorithm with no
clear theoretical merit.
The Cost-Benefit Framework, developed in this work, offers a new approach to job
assignment on metacomputers. It smoothly handles heterogeneous resources by
convert ing them into a unitless cost. We assign (and possibly reassign) jobs to greedily
minimize this cost.
This approach gives us an online strategy provably competitive with the optimal
offline algorithm in the maximum usage of each resource. It has a weak competitive ratio
logarithmic in the number of machines in the cluster but even this weak ratio is
unprecedented in the literature. No other known method offers any competitive guarantee
on more than one resource.
We present experimental evidence that this strategy performs extremely well in
practice, comparing it to two important benchmarks: the default round robin strategy of
the popular PVM metacomputing system, and the powerful adaptive strategy of the
Mosix system.
Metacomput ing environments are not homogeneous. In some environments, the
scheduler has a great deal of information about submitted jobs. In other cases, it has very
little. Some systems can migrate jobs without interrupting their execution. Others cannot.
We develop variants of the basic "opportunity cost" strategy of the Cost-Benefit
Framework for various metacomputing environments, and prove all of them highly
efficient.
Finally, we provide two metacomputing systems a prototype and a complete system
based on these ideas. The Java Market prototype is a metacomputer built atop Java and
web technologies, able to integrate any consenting Internet-connected machine. The
Frugal System transforms any Jini network into a metacomputer.
This work was carried out under the supervision of Professor Yair Amir.
Exploring Robusteness in Group Key Agreement
Yair Amir,
Yongdae Kim,
Cristina Nita-Rotaru,
John Schultz,
Jonathan Stanton,
and
Gene Tsudik
Technical Report CNDS-2000-4, August 2000.
Abstract
Secure group communication is crucial for building distributed applications
that work in dynamic environments and communicate over unsecured networks
(e.g. the Internet). Key agreement is a critical part of providing security
services for group communication systems. Most of the current contributory key agreement
protocols are not designed to tolerate failures and membership changes during
execution. In particular, nested or cascaded group membership
events (such as partitions) are not accommodated.
In this paper we present the first robust contributory
key agreement protocols resilient to any sequence of events
while preserving the group communication membership
and ordering guarantees.
The Cost of Adding Security Services to Group Communication Systems
Cristina Nita-Rotaru
Technical Report CNDS-2000-3, March 2000.
Abstract
Numerous applications requiring information delivery from one sender to many receivers are based on a group
communication model. Group communication systems are used in industry and military systems where
reliability and high-availability are required. With the growth of the Internet, the number of applications that can
take advantage of a group communication infrastructure increased (teleconferences, white-boards, video
conferences, distributed interactive simulation, collaborative work). Over wide area networks the need for providing
confidentiality, integrity, and authenticity of messages is essential.
In this paper we present Secure Spread, a secure version of the Spread Toolkit. Secure Spread is a group
communication system that utilizes contributory group key management developed by the Cliques project and
Blowfish symmetric encryption algorithm. Its modular design allows drop-in replacement of encryption and/or
key agreement protocol. This work will not go to the details of a complete solution that handles every possible
combination of network events. Rather it will focus on the performance evaluation in the general case. The results
will give a good indication and insight of the overall cost of security in a group communication environment.
An Opportunity Cost Approach for Job Assignment and Reassignment in a Scalable Computing
Cluster
Yair Amir,
Baruch Awerbuch,
Amnon Barak,
Ryan S. Borgstrom
and
Arie Keren.
In the IEEE Transactions on Parallel and Distributed Systems, 11(7), pages 760-768, July 2000.
Abstract
A new method is presented for job assignment to and reassignment between machines in a
computing cluster. Our method is based on a theoretical framework that has been
experimentally tested and shown to be useful in practice. This "opportunity cost" method
converts the usage of several heterogeneous resources in a machine to a single homogeneous
"cost." Assignment and reassignment are then performed based on that cost. This is in
contrast to traditional, ad hoc methods for job assignment and reassignment. These treated
each resource as an independent entity with its own constraints, as there was no clean way to
balance one resource against another. Our method has been tested by simulations as
well as real executions and was found to perform well.
A Low Latency, Loss Tolerant Architecture and Protocol for Wide Area
Group Communication
Yair Amir,
Claudiu Danilov,
and
Jonathan Stanton
Published in International Conference on Dependable Systems and Networks (FTCS-30, DCCA-8),
New York, New York, June 25-28, 2000.
A full version of this paper was published as Johns Hopkins University, Center for Networking
and Distributed Systems (CNDS) Technical report CNDS-99-2.
Abstract
Group communication systems are proven tools upon
which to build fault-tolerant systems. As the demands for
fault-tolerance increase and more applications require reliable distributed computing over
wide area networks, wide
area group communication systems are becoming very useful. However, building a wide area
group communication system is a challenge. This paper presents the design of
the transport protocols of the Spread wide area group communication system.
We focus on two aspects of the system.
First, the value of using overlay networks for application
level group communication services. Second, the requirements and design of effective low latency
link protocols used to construct wide area group communication. We support
our claims with the results of live experiments conducted
over the Internet.
Practical Cluster Applications of Group Communication
Yair Amir,
Yan Gu,
Theo Schlossnagle
and
Jonathan Stanton
Published in International Conference on Dependable Systems and Networks (FTCS-3
0, DCCA-8),New York, New York, June 25-28, 2000.
Abstract
Group communication systems are used to build fault-tolerant and replicated services.
However, they also can be
very useful for building tools that support a highly available web cluster, or help
manage a large collection of machines. This paper explores the features of one particular
Group communication system called SpreadTMthat are useful in this environment and describes two
specific applications that we built and actively use. These applications are a
distributed system logging tool and an IP based high availability fail-over tool for clusters.
A Cost-Benefit Framework for Online Management of a Metacomputing System
Yair Amir,
Baruch Awerbuch,
and
Ryan S. Borgstrom
The International Journal for Decision Support Systems, Elsevier Science, 28(1-2), pages 155-164, April 2000.
Abstract
Managing a large collection of networked machines, with a series of incoming jobs,
requires that the jobs be assigned to machines wisely. A new approach to this problem is
presented, inspired by economic principles: the Cost-Benefit Framework. This framework
simplifies complex assignment and performs well in practice. We demonstrate this framework in
the context of an Internet-wide market for computational services and verify its
utility for a classic network of workstations.
Secure Group Communication in Asynchronous Networks with Failures:
Integration and Experiments
Yair Amir,
Giuseppe Ateniese,
Damian Hasse,
Yongdae Kim,
Cristina Nita-Rotaru,
Theo Schlossnagle,
John Schultz,
Jonathan Stanton,
and
Gene Tsudik
Published in Proceedings of the 20th IEEE International Conference on
Distributed Computing Systems, Taipei, Taiwan, April 10-13, 2000, pp 330-343.
Originally published as Johns Hopkins University, Center for Networking
and Distributed Systems (CNDS) Technical Report CNDS-99-3.
Abstract
The increasing popularity and diversity of collaborative
applications prompts a need for highly secure and reliable
communication platforms for dynamic peer groups. Security mechanisms for such
groups tend to be both expensive
and complex and their integration with reliable group communication services
presents a formidable challenge.
This paper discusses some important integration issues,
reports on our implementation experience and provides experimental results.
Our approach utilizes distributed group
key management developed by the Cliques project. We enhance it to handle
processor and network faults (under a
fail-stop or crash-and-recover model) and asynchronous
membership events (such as joins, leaves, merges and network partitions).
Our approach leverages the strong properties provided by the Spread group communication system,
such as message ordering, clean failure semantics and a
membership service. The result of this work is a secure
group communications layer and an API that provide the
application programmer with both standard group communication services and flexible security services.
mod_backhand: A load balancing module for the Apache web server.
Theo Schlossnagle
Technical Report CNDS-2000-2.
Abstract
mod_backhand is a dynamically loadable module for the Apache web server developed at The Center for
Networks and Distributed Systems (CNDS) at The Johns Hopkins University. This module aims to
seamlessly allow for load balancing within a cluster of machines. This paper will discuss the concepts,
implementations, advantages/disadvantages, practical applications and future directions.
mod_backhand is a part of the Backhand project initiated at CNDS. The purpose of the Backhand
project is to develop tools for effective resource management and utilization. The purpose of
mod_backhand, specifically, is to provide the infrastructure to reallocate HTTP requests to any machine
within a cluster and the framework for effective and flexible decision making.
The concept behind mod_backhand is the amalgamation of several important deviations from
"standard" practice. Many network appliances used for balancing web clusters have their foundations in
the networking world. This leads to a design drawn in the image of a router. This approach was purposely
avoided in an attempt to avoid bottlenecks and single points of failure. The approach we use allows for
maximum utilization of a network's egress points and survives link failures extremely well. The flexibility
of the implementation provides a tool that can be used to build both a single and a multiple entry point
cluster.
The Backhand project was designed to tackle all aspects of resource allocation and management.
This was done for two reasons. First, the issues that must be tackled are far out of the scope of an Apache
module. Second, it allows for intelligent separation of concepts and functionality. mod_backhand, in its
current form, attempts to solve resource allocation and management issues within a cluster of machines on
a relatively low latency, high-throughput network.
The problems involved in finding the best web server or cluster for a specific client are better
tackled at a lower level. Specific approaches including optimizing algorithms based on DNS and IP routing
are not in the scope of mod_backhand, but fit well in the more general Backhand project.
Multicast routing through a network with hierarchical topology
aggregation
Baruch Awerbuch,
and
Tripurari Singh
Technical Report CNDS-2000-1.
Abstract
The PNNI is an umbrella protocol that allows considerable routing flexibility. While it
specifies the network signalling, it leaves route generation undefined. Any non-trivial
route generation requires the collection and dissemination of network metrics. Furthermore,
hierarchical networks require aggregation and compression of these metrics in order to be
scalable. Aggregation and compression methods are also left unspecified by the PNNI,
although the PNNI recommends using star graphs for aggregation. In contract, recent work
done by Awerbuch, Du, Khan and Shavitt [ADKS98] suggest that the minimum spanning tree
is a better choice. Their work also shows that exponential weight routing is superior to
(the ubiquitous) min-hop routing. In this paper, we extend the work of [ADKS98] to
multicast. We define a multicast protocol, and model it and incorporate it into their
simulator. Surprisingly, our results are not identical to that of [ADKS98]. However,
the recommended shcmes of [ADKS98] perform very well, and significantly out perform the
standard min-hop routing. This study further strengthens the case for including the minimum
spanning tree and exponential weight routing in the PNNI standards.
1999
An improved throughput-competitive algorithm for Online
Multicast
Tripurari Singh
Technical Report CNDS-99-5.
Abstract
We present an O(log^2nlogM) throughput competitive algorithm for online multicast, where n
is the number of nodes in the graph, and M is the number of multicast groups. This improves
upon the previous best O(logn + loglogM)(logn + logM)logn) competitive algorithm by Goel,
Henzinger and Plotkin, and closes the gap between the upper and lower bounds to O(logn).
This algorithm uses the recent extension of the maximal dense tree algorithm to graphs with
dynamic edge weights. Our algorithm is modular, and does not involve an integrated analysis
of its winner picking and routing components.
The maximal dense tree problem on a graph with dynamic edge weights
Tripurari Singh
Technical Report CNDS-99-4.
Abstract
In this paper we provide a polylogarithmic competitive online algorithm for the maximal dense
tree problem on a graph with dynamic edge weights. The previous work on the topic,
[AS97], only handled static edge weights. We expect that the inclusion of dynamic edge
weights will significantly improve the utility of our algorithm. Specifically we expect
simpler and better solutions to the important throughput competitive online multicast
problem.
Walrus - a Low Latency, High Throughput Web Service Using Internet-wide Replication
Yair Amir,
and
David Shaw
In Proceedings of the 19th IEEE ICDCS Workshop on Electronic Commerce and Web-based Applications,
pages 31-40, Austin, May 1999
A previous version is available as Technical Report CNDS-98-5 as
Abstract
Today, most of the popular web sites are served from
single locations. This basic Web client-server model
is easy to deploy and maintain and thus is very
successful. It suffers, however, from several efficiency
and availability problems. This paper presents
Walrus, a low-latency, high-throughput Web service
that addresses some of these problems. Under
Walrus, a single logical Web server can be replicated
to several clusters of identical servers where each
cluster resides in a different part of the Internet. An
important aspect of Walrus is its ability to
transparently direct the web browser to the best
replica without any changes to the web server, web
client, and network infrastructure. "Best" is a
relative term, dependent on where the client is
located on the network, the load on each replica, and
more. Walrus deploys an elegant algorithm that
balances these considerations.
1998
Seamlessly
Selecting the Best Copy from Internet-Wide Replicated Web Servers
Yair Amir,
Alec Peterson,
and
David Shaw
In the Proceedings of The 12th International Symposium on Distributed
Computing (DISC'98) (formerly WDAG), Andros, Greece, September 1998.
Also - Technical Report CNDS-98-3.
Abstract
The explosion of the web has led to a situation where a majority of the
traffic on the Internet is web related. Today, practically all of the popular web sites
are served from single locations. This necessitates frequent long distance network
transfers of data (potentially repeatedly) which results in a high response time for
users, and is wasteful of the available network bandwidth. Moreover, it commonly
creates a single point of failure between the web site and its Internet provider.
This paper presents a new approach to web replication, where each of the replicas
resides in a different part of the network, and the browser is automatically and
transparently directed to the "best" server. Implementing this architecture for popular
web sites will result in a better response-time and a higher availability of these sites.
Equally important, this architecture will potentially cut down a significant fraction of
the traffic on the Internet, freeing bandwidth for other uses.
WALRUS: A Low Latency, High Throughput Web Service Using Internet-wide Replication.
David Shaw
Masters Thesis, August 1998.
Abstract
Today, most of the popular web sites are served from
single locations. This basic Web client-server model
is easy to deploy and maintain and thus is very
successful. It suffers, however, from several efficiency
and availability problems. This paper presents
Walrus, a low-latency, high-throughput Web service
that addresses some of these problems. Under
Walrus, a single logical Web server can be replicated
to several clusters of identical servers where each
cluster resides in a different part of the Internet. An
important aspect of Walrus is its ability to
transparently direct the web browser to the best
replica without any changes to the web server, web
client, and network infrastructure. "Best" is a
relative term, dependent on where the client is
located on the network, the load on each replica, and
more. Walrus deploys an elegant algorithm that
balances these considerations.
The Spread Wide Area Group Communication System
Yair Amir
and
Jonathan Stanton
Technical Report CNDS-98-4.
Abstract
Building a wide area group communication system is a challenge. This paper presents the design and protocols
of the Spread wide area group communication system. Spread integrates two low-level protocols: one for local
area networks called Ring, and one for the wide area network connecting them, called Hop. Spread decouples the
dissemination and local reliability mechanisms from the global ordering and stability protocols.
This allows many optimizations useful for wide area network settings. Spread is operational and publicly
available on the Web.
A Cost-Benefit Framework for Online Management of a Metacomputing System
Yair Amir,
Baruch Awerbuch,
and
Ryan S. Borgstrom
Appeared in the 1st International Conference on
Information and Computation Economies (ICE-98), Charleston, October 1998.
Abstract
Managing a large collection of networked machines, with a series of incoming jobs,
requires that the jobs be assigned to machines wisely. A new approach to this problem is
presented, inspired by economic principles: the Cost-Benefit Framework. This framework
simplifies complex assignment and performs well in practice. We demonstrate this framework in
the context of an Internet-wide market for computational services and verify its
utility for a classic network of workstations.
An Opportunity
Cost Approach for Job Assignment and Reassignment in a Scalable Computing
Cluster
Yair Amir,
Baruch Awerbuch,
Amnon Barak,
Ryan S. Borgstrom
and
Arie Keren.
In the Proceedings of the 10th International Conference on Parallel
and Distributed Computing and Systems (PDCS'98), Las Vegas, October 1998.
Also - Technical Report CNDS-98-2.
Abstract
A new method is presented for job assignment to and
reassignment between machines in a computing cluster.
Our method is based on a theoretical framework that
has been experimentally tested and shown to be useful
in practice. This "opportunity cost" method converts the
usage of several heterogeneous resources in a machine
to a single homogeneous "cost". Assignment and
reassignment is then performed based on that cost. This
is in contrast to previous methods for job assignment
and reassignment, which treat each resource as an
independent entity with its own constraints. These
previous methods were intrinsically ad hoc, as there
was no clean way to balance one resource against
another.
The Java Market:
Transforming the Internet into a Metacomputer
Yair Amir,
Baruch Awerbuch,
and
Ryan S. Borgstrom
Technical Report CNDS-98-1.
Abstract
Most of the machines that are connected to the Internet are idle a significant fraction of the time.
This paper presents the Java Market, a system that allows organizations and Internet users to make use
of this wasted computational power.
Using the Java programming language and the Web technology, the Java Market is the first
metacomputing system that can seamlessly take advantage of machines of any architecture, anywhere
on the Internet. Every user on the Internet can contribute their machine's computational resources just
by pointing a Java-capable browser to the Java Market web page. Similarly, users can launch jobs to
the system by posting them on the Web and registering them with the Java Market.
In contrast, other systems that allow sharing of computational resources between machines over the
network require homogeneous system architecture. They often involve extensive installation or even
kernel-level modifications to the operating system.