Toward Intrusion Tolerant Clouds

A DARPA/I2O grant (November 2011 - September 2016) to Johns Hopkins University, Purdue University and University of California at Irvine. A component of the DARPA Mission-Oriented Resilient Clouds (MRC) program. Principal Investigator: Yair Amir. Subcontract PIs: Cristina Nita-Rotaru, Michael Franz.

Overview

Cloud computing offers a new, cost-effective approach for running the nation's IT infrastructure. As critical services move to a relatively small number of large distributed systems, ensuring the availability, reliability, and security of those systems becomes essential. Our experience has shown that a scalable, highly-available cloud system requires consistent replicated global state and a distributed messaging system that connects the cloud components. However, there is a large gap between today's cloud systems and a truly resilient cloud architecture; this gap is the vulnerability to intrusions.

The systems in use today were not designed to withstand sophisticated attackers who may successfully compromise one or more machines in the system. Modern cloud systems are generally composed of homogeneous hosts on the widely accessible Internet. Because these systems are on the Internet, they are subject to attack. However, today's systems typically rely on perimeter defenses and implicitly trust all hosts in the system; an attacker who is able to gain access to a single host can cause serious damage throughout the system. Moreover, the homogeneity of the hosts means that the same exploit will be effective against all of them, so even if hosts are not implicitly trusted, a determined attacker can compromise a large fraction of the system and do considerable damage.

A resilient cloud must continue to function correctly and perform well under sophisticated attacks, including when the system is partially compromised. However, the algorithms and tools needed to build consistent global state and distributed messaging systems that meet this requirement do not exist in practice.

Our goal in this project is to invent, develop, and transition the replication and messaging tools necessary to make public and private clouds resilient to sophisticated intrusion attacks. The proposed plan includes:

Developing the first scalable, intrusion-tolerant replication protocol that provides performance guarantees under attack, including extremely sophisticated intrusions.
Developing the first scalable tunable intrusion-tolerant overlay messaging engine, containing two new sets of protocols: Controlled Authenticated K-Paths Routing and Controlled Authenticated Flooding.
- Controlled Authenticated K-Paths Routing: routes a message from source to destination along K node-disjoint paths, providing protection against up to K-1 compromised nodes, while incurring a cost that is K times the cost of standard secure link-state routing.
- Controlled Authenticated Flooding: provides optimal intrusion tolerance -- if there is even a single path of correct nodes from source to destination, the message will arrive.
Developing automated diversity to be incorporated with the replication and messaging engines, providing each of their instances in the cloud network with a diverse attack surface.
Developing a detection, diagnosis and prediction engine that will use cloud entities' log streams to allow real time detection and prediction of faults and attacks.
Integrating the components above into a real cloud system running on a wide-area network.
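
The two messaging guarantees in the plan above can be illustrated with a small, self-contained Python sketch. It is not the project's implementation: the function names, the toy topology, and the modeling of a compromised node as one that silently drops messages are all assumptions made for illustration. The first function finds up to K node-disjoint paths using a standard unit-capacity max-flow on a node-split graph; the second checks whether flooding would deliver a message, which holds exactly when some path of correct nodes connects source and destination.

```python
from collections import defaultdict, deque


def node_disjoint_paths(edges, src, dst, k):
    """Return up to k node-disjoint src->dst paths in an undirected graph.

    Each node v is split into (v, "in") -> (v, "out") with capacity 1, so no
    intermediate node can carry more than one path (illustrative sketch).
    """
    cap, flow, adj = defaultdict(int), defaultdict(int), defaultdict(set)

    def add_arc(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction is needed for residual arcs

    for v in {n for e in edges for n in e}:
        # Source and destination are shared by all paths.
        add_arc((v, "in"), (v, "out"), k if v in (src, dst) else 1)
    for u, v in edges:
        add_arc((u, "out"), (v, "in"), 1)
        add_arc((v, "out"), (u, "in"), 1)

    s, t = (src, "in"), (dst, "out")
    found = 0
    while found < k:
        # Breadth-first search for an augmenting path in the residual graph.
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] - flow[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        v = t
        while parent[v] is not None:
            u = parent[v]
            flow[(u, v)] += 1
            flow[(v, u)] -= 1
            v = u
        found += 1

    # Decompose the flow into explicit paths by following positive-flow arcs.
    paths = []
    for _ in range(found):
        path, u = [src], s
        while u != t:
            v = next(w for w in adj[u] if flow[(u, w)] > 0)
            flow[(u, v)] -= 1
            if v[1] == "in":
                path.append(v[0])
            u = v
        paths.append(path)
    return paths


def flooding_delivers(edges, src, dst, compromised):
    """True if flooding reaches dst when compromised nodes drop everything."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in seen and v not in compromised:
                seen.add(v)
                queue.append(v)
    return dst in seen


# Hypothetical overlay with three node-disjoint routes from S to T.
overlay = [("S", "A"), ("A", "T"), ("S", "B"), ("B", "T"),
           ("S", "C"), ("C", "D"), ("D", "T")]
two_paths = node_disjoint_paths(overlay, "S", "T", k=2)
```

With K = 2, any single compromised intermediate node can sit on at most one of the two returned paths, so the other still delivers the message; this is the K-1 tolerance at K times the per-message cost. Flooding spends more bandwidth but keeps delivering on this topology even when two of the three routes contain a compromised node.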

Students

Johns Hopkins University: Daniel Obenshain, Tom Tantillo, Amy Babay.
Purdue University: Andrew Newell, Jeff Seibert, Endadul Hoque, Sebastian Moreno.
University of California, Irvine: Andrei Homescu, Stephen Crane, Steven Neisius.

Results and Current Activities

We have designed a Controlled Authenticated K-Paths routing protocol, and two Controlled Authenticated Flooding protocols. The first flooding protocol is Priority Flooding with Source-Based Fairness, which is designed for cloud monitoring and ensures that the most important messages from each source reach their destination in a timely manner. The second is Reliable Flooding with Flow-Based Reserved Capacity, which is designed for control messages and ensures reliable message delivery.
We have implemented the K-Paths routing algorithm and the two flooding algorithms in the existing Spines framework.
We have tested the K-Paths routing algorithm and the flooding algorithms on both a simulated network and a real cloud overlay network.
We have developed a practical, survivable, intrusion-tolerant replication engine by integrating the Prime replication system, which provides performance guarantees under attack, with compiler-based diversification that allows each replica to run a different version of the software. Moreover, replicas are periodically rejuvenated as part of a proactive recovery protocol, and each replica generates a new, diverse version of the software upon rejuvenation. The compiler-based diversification is provided by the MultiCompiler developed by Michael Franz's group at University of California, Irvine.
We have developed theoretical results concerning the placement of limited numbers of diverse variants (e.g. different operating systems) within a system.
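
To make the idea behind Priority Flooding with Source-Based Fairness more concrete, the following Python sketch models the outgoing queue of a single overlay link. It is a simplified illustration under stated assumptions, not the Spines implementation: the class name `FairPriorityLink`, the fixed per-source buffer limit, the round-robin service order, and the eviction rule (drop the lowest-priority, newest message of the overflowing source) are all hypothetical choices. The point it shows is that a source that floods the link can only displace its own messages, while every other source still gets its most important messages forwarded in turn.

```python
import heapq
from collections import OrderedDict
from itertools import count


class FairPriorityLink:
    """Per-link send queue: round-robin across sources, priority within one."""

    def __init__(self, per_source_limit):
        self.limit = per_source_limit
        # source -> heap of (-priority, sequence number, payload)
        self.queues = OrderedDict()
        self.seq = count()

    def enqueue(self, source, priority, payload):
        heap = self.queues.setdefault(source, [])
        heapq.heappush(heap, (-priority, next(self.seq), payload))
        if len(heap) > self.limit:
            # Evict only from the overflowing source: lowest priority first,
            # newest first among equal priorities.
            heap.remove(max(heap))
            heapq.heapify(heap)

    def dequeue(self):
        """Return (source, payload) for the next message, or None if idle."""
        if not self.queues:
            return None
        source, heap = next(iter(self.queues.items()))
        _, _, payload = heapq.heappop(heap)
        # Move the source to the back of the round-robin order.
        del self.queues[source]
        if heap:
            self.queues[source] = heap
        return source, payload


# Hypothetical scenario: a compromised source floods the link while a
# monitoring source sends two messages of different importance.
link = FairPriorityLink(per_source_limit=3)
for i in range(100):
    link.enqueue("mallory", priority=10, payload=f"m{i}")
link.enqueue("monitor", priority=5, payload="cpu")
link.enqueue("monitor", priority=9, payload="disk")
sent = [link.dequeue() for _ in range(6)]
```

In this scenario the link alternates between the two sources, and the monitoring source's higher-priority message is sent before its lower-priority one, regardless of how many messages the flooding source submitted.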

Presentations

MRC Kickoff Meeting in Crystal City, VA, November 2011: Slides