Real-Time Byzantine Resilient Systems
This research is currently supported in part by the Department of Energy (DOE) Offices of Cybersecurity, Energy Security, and Emergency Response (CESER); Electricity (OE); and Nuclear Energy (NE) under the Grid Modernization Laboratory Consortium (GMLC) Topic 5.1.4 - Cyber-Physical Security.
Overview
Critical infrastructure control systems are increasingly connected to the Internet for cost-effectiveness and scalability, but this connectivity leaves them vulnerable to attack. Most systems today are not designed to withstand sophisticated attacks; an attacker who compromises a single machine typically gains the power to take down the entire system. Consider the protective relays used to monitor and protect grid assets during faults and other risky events. A protective relay under an attacker's control can damage grid assets by failing to trip when protection is needed. A grid asset such as a high-voltage (345 kV and up) transformer serves vast spans of the grid, costs millions of dollars, and takes over a year to procure, so damaging it threatens grid stability. Conversely, a compromised relay that trips unnecessarily causes significant disruption to many customers.
The rising number of cyberattacks against critical infrastructure reinforces the need for Byzantine resilient power grid infrastructure. However, critical applications have many rigid requirements that must hold even in the presence of failures and attacks: strict real-time deadlines, continuous availability, long system lifetimes, and economic constraints. We research and build the Byzantine resilient system architectures, protocols, and tools that critical infrastructure needs to ensure correct operation in the face of successful intrusions and network attacks while meeting the required latency constraints.
While Byzantine fault tolerance techniques such as State Machine Replication can meet the needs of power grid control centers, which typically have latency requirements of 100-200 ms, they cannot meet all the requirements of substations. Critical protection functions like the one described above are time-critical: because of the physical properties of the substation system, they must respond within a quarter of a power cycle. In the U.S., where the grid runs at 60 Hz, a cycle is 16.667 ms and a quarter-cycle is 4.167 ms. We develop Spire for the Substation, an open-source system that provides the first real-time Byzantine resilient architecture and protocols for the substation.
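The quarter-cycle budget follows directly from the grid's nominal frequency. The minimal Python sketch below shows the arithmetic, assuming the 60 Hz U.S. nominal frequency by default; `meets_quarter_cycle` is a hypothetical helper, not part of Spire, that checks a measured end-to-end latency against the budget.

```python
# Assumption: nominal U.S. grid frequency of 60 Hz; other grids (e.g. 50 Hz)
# can pass their own frequency.
NOMINAL_FREQ_HZ = 60.0


def cycle_ms(freq_hz: float = NOMINAL_FREQ_HZ) -> float:
    """Duration of one AC power cycle in milliseconds."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / freq_hz


def quarter_cycle_ms(freq_hz: float = NOMINAL_FREQ_HZ) -> float:
    """Quarter-cycle response budget for time-critical protection functions."""
    return cycle_ms(freq_hz) / 4.0


def meets_quarter_cycle(latency_ms: float, freq_hz: float = NOMINAL_FREQ_HZ) -> bool:
    """Hypothetical check: does a measured end-to-end latency fit the budget?"""
    return latency_ms <= quarter_cycle_ms(freq_hz)


if __name__ == "__main__":
    print(f"cycle:         {cycle_ms():.3f} ms")          # 16.667 ms
    print(f"quarter-cycle: {quarter_cycle_ms():.3f} ms")  # 4.167 ms
    # A control-center-scale latency of 100 ms misses the substation budget.
    print(meets_quarter_cycle(100.0))  # False
```

The comparison makes the gap concrete: a protocol that comfortably meets a 100-200 ms control-center requirement can still be more than twenty times too slow for substation protection.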
The system uses proactive recovery and diversity to allow Byzantine resilient systems to survive an unbounded number of compromises over the system lifetime, as long as the number of simultaneous compromises does not exceed a certain threshold. A key component of this work is the practical use of diversity to support the standard assumption that not all machines in a system are compromised simultaneously. If all machines run exactly the same programs in exactly the same environment, the same exploit will be effective against all of them; diversity is therefore necessary to build resilient systems. In addition to Byzantine resilience techniques, the system is reinforced with a machine-learning-based intrusion detection and awareness module.
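To make the threshold concrete, here is a minimal Python sketch of proactive recovery with diversity. It assumes the bound n = 3f + 2k + 1 commonly used for replication with proactive recovery (f simultaneous compromises tolerated while k replicas are being rejuvenated); the substation architecture described here may use a different threshold. Diversity is modeled, hypothetically, as a fresh random variant id assigned whenever a replica is restored from a clean image. `min_replicas`, `recovery_schedule`, and `full_cycle_periods` are illustrative names, not Spire APIs.

```python
import random


def min_replicas(f: int, k: int) -> int:
    """Replicas needed under the ASSUMED bound n = 3f + 2k + 1."""
    if f < 0 or k < 0:
        raise ValueError("f and k must be non-negative")
    return 3 * f + 2 * k + 1


def tolerates(n: int, f: int, k: int) -> bool:
    """True if n replicas satisfy the assumed bound for f and k."""
    return n >= min_replicas(f, k)


def recovery_schedule(n: int, k: int, periods: int, seed: int = 0):
    """Round-robin proactive recovery.

    In each period, k replicas are taken offline and restarted from a clean
    image compiled as a fresh diverse variant (modeled here as a random
    32-bit variant id). Returns one list of (replica, variant_id) per period.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = []
    next_replica = 0
    for _ in range(periods):
        batch = []
        for _ in range(k):
            batch.append((next_replica, rng.getrandbits(32)))
            next_replica = (next_replica + 1) % n
        schedule.append(batch)
    return schedule


def full_cycle_periods(n: int, k: int) -> int:
    """Periods until every replica has been rejuvenated once: ceil(n / k)."""
    return -(-n // k)


if __name__ == "__main__":
    f, k = 1, 1
    n = min_replicas(f, k)  # 6 under the assumed bound
    for period, batch in enumerate(recovery_schedule(n, k, periods=n)):
        print(period, batch)
```

Rejuvenating in round-robin order means that, in this model, an attacker must hold more than f replicas at once within a single rejuvenation cycle (`full_cycle_periods` periods) to break the system. Because each replica returns as a new variant, an exploit crafted against its old variant need not work against the new one.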
Spire for the Substation is part of the open-source Spire, the network-attack-resilient intrusion-tolerant SCADA system for the power grid. In 2022, the system underwent a red team experiment by Sandia National Labs (SNL) that included protective relays from industry partners Siemens, GE, and Hitachi Energy.
Team
Johns Hopkins Team
Publications
- Real-Time Byzantine Resilient Power Grid Infrastructure: Evaluation and Trade-offs
Sahiti Bommareddy, Maher Khan, David J Sebastian Cardenas, Carl Miller, Christopher Bonebrake, Yair Amir, and Amy Babay
Accepted at the International Workshop on Explainability of Real-time Systems and their Analysis at the IEEE Real-Time Systems Symposium (RTSS 2022)
- Real-Time Byzantine Resilience for Power Grid Substations
Sahiti Bommareddy, Daniel Qian, Christopher Bonebrake, Paul Skare, Yair Amir
In Proceedings of the 41st International Symposium on Reliable Distributed Systems (SRDS 2022)