Real-Time Byzantine Resilient Systems

This research is currently supported in part by the Department of Energy (DOE) Offices of Cybersecurity, Energy Security, and Emergency Response (CESER); Electricity (OE); and Nuclear Energy (NE) under the Grid Modernization Laboratory Consortium (GMLC) Topic 5.1.4 - Cyber-Physical Security.

Overview

    Critical infrastructure control systems are becoming more connected to the Internet for cost-effectiveness and scalability reasons, but this leaves them vulnerable to attack. Most systems today are not designed to withstand sophisticated attacks; an attacker who is able to compromise a single machine typically gains the power to take down the entire system. Consider the protective relays used to monitor and protect grid assets during risky events. A protective relay under the control of an attacker can damage grid assets by failing to trip when protection is needed. A grid asset such as a high-voltage (345 kV and up) transformer serves vast spans of the grid, costs millions of dollars, and takes over a year to procure, so damaging one threatens grid stability. Conversely, a compromised relay that trips unnecessarily causes significant disruption to many customers.

    The rising number of cyberattacks against critical infrastructure reinforces the need to build Byzantine resilient power grid infrastructure. However, critical applications have many rigid requirements that must hold even in the presence of failures and attacks: strict real-time deadlines, continuous availability, long system lifetimes, and economic constraints. We research the Byzantine resilient system architectures, protocols, and tools that critical infrastructure needs to ensure correct operation in the face of successful intrusions and network attacks while meeting the required latency constraints.

    While Byzantine fault tolerance techniques such as State Machine Replication can address the requirements of power grid control centers, which typically have latency requirements of 100-200 ms, they cannot address all the requirements of substations. Critical protection functions like the one described above are time-critical and, due to the physical properties of the substation system, must respond within a quarter of a power cycle. In the U.S., where the grid operates at 60 Hz, a cycle is 16.667 ms and a quarter-cycle is 4.167 ms. We develop Spire for the Substation, the first real-time Byzantine resilient architecture and protocols for the substation, as an open-source system.
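    The quarter-cycle figures above follow directly from the 60 Hz line frequency. The short sketch below reproduces that arithmetic and compares it with the 100-200 ms control-center budget; the function names are illustrative only and are not part of Spire.

```python
# Latency budgets implied by the AC line frequency (illustrative sketch).

US_FREQUENCY_HZ = 60.0  # nominal U.S. grid frequency


def cycle_ms(frequency_hz: float) -> float:
    """Duration of one power cycle in milliseconds."""
    return 1000.0 / frequency_hz


def quarter_cycle_ms(frequency_hz: float) -> float:
    """Quarter-cycle responsiveness budget in milliseconds."""
    return cycle_ms(frequency_hz) / 4.0


if __name__ == "__main__":
    q = quarter_cycle_ms(US_FREQUENCY_HZ)
    print(f"cycle = {cycle_ms(US_FREQUENCY_HZ):.3f} ms, quarter-cycle = {q:.3f} ms")
    # How much tighter the substation budget is than a control-center budget.
    for control_center_ms in (100.0, 200.0):
        print(f"{control_center_ms:.0f} ms budget is {control_center_ms / q:.0f}x the quarter-cycle")
```

    Running it shows that the substation deadline is 24 to 48 times tighter than the control-center budget, which is why protocols tuned for 100-200 ms do not carry over directly.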

    The system uses proactive recovery and diversity to allow Byzantine resilient systems to survive an unbounded number of compromises over the system lifetime, as long as the number of simultaneous compromises does not exceed a certain threshold. A key component of this work is the practical use of diversity to support the standard assumption that not all machines in a system are compromised simultaneously. If all machines run exactly the same programs in exactly the same environment, the same exploit will be effective against all of them; diversity is therefore necessary to build resilient systems. In addition to Byzantine resilience techniques, the system is reinforced with a machine-learning-based intrusion detection and awareness module.
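    To make the interplay between the compromise threshold, proactive recovery, and diversity concrete, here is a minimal simulation sketch. It assumes the bound n = 3f + 2k + 1 that appears in the intrusion-tolerant replication literature for BFT replication with proactive recovery (f simultaneous compromises, k replicas recovering at once); the actual threshold and recovery policy of Spire for the Substation may differ. The round-robin schedule and the integer "variant" IDs (standing in for, e.g., differently diversified builds) are hypothetical.

```python
import random
from typing import List, Tuple


def replicas_needed(f: int, k: int) -> int:
    """Replicas needed to tolerate f simultaneous compromises while up to k
    replicas are offline for proactive recovery (assumed bound 3f + 2k + 1)."""
    return 3 * f + 2 * k + 1


def recovery_schedule(n: int, k: int, rounds: int, num_variants: int,
                      seed: int = 0) -> List[List[Tuple[int, int, int]]]:
    """Hypothetical round-robin proactive recovery.

    Each round rejuvenates k distinct replicas and restarts each one with a
    variant different from the one it was running, so an exploit crafted for
    the old variant does not automatically work against the recovered replica.
    Returns, per round, a list of (replica, old_variant, new_variant)."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if num_variants < 2:
        raise ValueError("diversity requires at least two variants")
    rng = random.Random(seed)
    # Initial (hypothetical) variant assigned to each replica.
    variants = [rng.randrange(num_variants) for _ in range(n)]
    schedule = []
    next_replica = 0
    for _ in range(rounds):
        round_log = []
        for i in range(k):
            r = (next_replica + i) % n
            old = variants[r]
            # Pick any variant other than the one currently running.
            new = rng.choice([v for v in range(num_variants) if v != old])
            variants[r] = new
            round_log.append((r, old, new))
        next_replica = (next_replica + k) % n
        schedule.append(round_log)
    return schedule


if __name__ == "__main__":
    f, k = 1, 1
    n = replicas_needed(f, k)
    print(f"f={f}, k={k} -> n={n} replicas")
    for t, round_log in enumerate(recovery_schedule(n, k, rounds=n, num_variants=4)):
        for r, old, new in round_log:
            print(f"round {t}: replica {r} variant {old} -> {new}")
```

    Because every replica is periodically rejuvenated into a fresh variant, an attacker must keep finding new exploits faster than the recovery cycle. As long as no more than f replicas are compromised at any one time, the system stays correct over an unbounded lifetime.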

    Spire for the Substation is part of the open-source Spire, the network-attack-resilient intrusion-tolerant SCADA system for the power grid. In 2022, the system underwent a red team experiment by Sandia National Labs (SNL) that included protective relays from industry partners Siemens, GE, and Hitachi Energy.

Team

Publications

  • Real-Time Byzantine Resilient Power Grid Infrastructure: Evaluation and Trade-offs
    Sahiti Bommareddy, Maher Khan, David J Sebastian Cardenas, Carl Miller, Christopher Bonebrake, Yair Amir, and Amy Babay
    Accepted at the International Workshop on Explainability of Real-time Systems and their Analysis at the IEEE Real-Time Systems Symposium (RTSS 2022)

Distributed Systems and Networks Lab
Computer Science Department, Johns Hopkins University
Malone Hall
3400 North Charles Street
Baltimore, MD 21218