Prime: Byzantine Replication Under Attack
Overview
Prime is a Byzantine fault-tolerant replication engine that provides meaningful performance guarantees even after some of the replication servers have been compromised. Like previous Byzantine fault-tolerant replication protocols, Prime guarantees Safety (consistency of the correct replicas) and Liveness (the eventual execution of each update) as long as no more than f out of 3f+2k+1 replicas are compromised, no more than k replicas are unavailable (e.g. due to crashes, network partitions, or proactive recovery), and the network is sufficiently stable. Unlike previous protocols, Prime additionally provides a stronger performance guarantee, which we call Bounded-Delay. Bounded-Delay limits the amount of performance degradation that can be caused by malicious servers. Intuitively, Prime forces any leader that remains in power to meet a threshold level of performance, where the threshold is a function of the message delays between the correct servers in the system, which cannot be arbitrarily increased by the malicious servers.
Prime supports the use of diversity and proactive recovery to substantially increase the effort required for an attacker to compromise multiple replicas simultaneously. The MultiCompiler developed by the UC Irvine Secure Systems and Software Laboratory can be used to diversify the code layout of Prime servers in order to increase the resiliency of the system. The MultiCompiler uses a 64-bit random number to generate different variants of an application. In this way, if an adversary attacks all the servers in parallel, the probability to defeat more than f servers is low. Proactive recovery allows Prime servers to be periodically rejuvenated to clean potentially undetected intrusions from the system and can be combined with diversity such that a different version of the Prime server is generated after each rejuvenation.
By default, Prime is configured to make use of Spines, an overlay messaging framework developed at Johns Hopkins. Spines offers an intrusion-tolerant network service that protects communication between the Prime replicas. Spines can be deployed in both local-area and wide-area environments and includes tools for emulating wide-area topologies in local-area networks by placing bandwidth and latency constraints on the links between servers.
Note that version 3.0 re-designed Prime's replication model such that each replica in the system consists of an application replica paired with a Prime daemon co-located on the same machine. Prime daemons provide an ordering service for the application replicas, with each Prime daemon delivering a totally ordered stream of updates to its application replica. In this model, application-specific state transfer must be handled at the application level. When a Prime daemon detects that a state transfer is needed (i.e. because it has missed message, for example due to a crash, partition, or proactive recovery), it will notify the application, which can then decide on the correct action (different applications may vary on whether an application-level state transfer is needed).
Prime was created at Johns Hopkins University by Yair Amir, Jonathan Kirsch, John Lane, Marco Platania, Amy Babay, and Thomas Tantillo.
Special thanks to Brian Coan for major contributions to the design of the Prime algorithm, and Jeff Seibert for major contribution to the View Change protocol.
Software
Prime can be downloaded here. The code was written in C and runs on Linux.
Releases
- Version 3.3 - December 23, 2020
- Version 3.2 - November 26, 2018
- Version 3.1 - March 14, 2018
- Version 3.0 - May 17, 2017
- Version 2.0 - September 17, 2014
- Version 1.1 - December 07, 2013
- Version 1.0 - May 04, 2010
See the Changelog for release details.
Funding
Partial funding for Prime research was provided by the Defense Advanced Research Projects Agency (DARPA) and the National Science Foundation (NSF). Prime is not necessarily endorsed by DARPA or the NSF.
License
Prime may be freely used and distributed under some conditions. Please review the license agreement for more details.
Publications
- Network-Attack-Resilient
Intrusion-Tolerant SCADA
    Amy Babay, Thomas Tantillo, Trevor Aron, Marco Platania, Yair Amir
    Technical Report CNDS-2017-2. Accepted to IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2018),
    Luxembourg, June 2018.
  - Towards a
Practical Survivable Intrusion Tolerant Replication System (Technical
Report)
    Marco Platania, Daniel Obenshain, Tom Tantillo, Ricky Sharma, Yair Amir
    Technical Report CNDS-2014-1.
  - Towards
a Practical Survivable Intrusion Tolerant Replication
System
    Marco Platania, Daniel Obenshain, Tom Tantillo, Ricky Sharma, Yair Amir
    33rd IEEE/IFIP International Sysmposium on Reliable Distributed Systems (SRDS 2014), Nara, Japan, October 2014, pp. 242-252.
  - Survivable
SCADA via Intrusion-Tolerant Replication
    Jonathan Kirsch, Stuart Goose, Yair Amir, Dong Wei, Paul Skare
    IEEE Transactions on Smart Grid, January 2014, 5(1), pp. 60-70.
  - Prime: Byzantine Replication
Under Attack
    Yair Amir, Brian Coan, Jonathan Kirsch, John Lane
    IEEE Transactions on Dependable and Secure Computing (TDSC), 8(4), pp. 564-577, July 2011.
  - Byzantine
Replication Under Attack
    Yair Amir, Brian Coan, Jonathan Kirsch, John Lane
    38th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2008), Anchorage, Alaska, June 2008, pp. 197-206.