SCADA: A big challenge for information security professionals
by Manuel Humberto Santander Pelaez (Version: 1)
One of the most interesting challenges of working as Chief Information Security Officer in a utility company is the variety of infrastructure types that supports the business process. I refer here to the infrastructure that supports real-time management systems for generation transmission and distribution of energy and the system that are responsible for coordinating the pumping of water to individual households and industries.
The implementation of a information security management system that includes this kind of critical infrastructure to the business processes provides a number of interesting challenges which are not covered in the conventional security model for IT processes:
- Information Security risks associated to the delivery process of energy and water utility services process, can lead to disruption of both services for a large number of people in a country. If errors in the handling of SCADA equipment have been responsible of cascading effects that collapse most of the electrical system of a country, what if someone is doing an identity theft in the energy SCADA system and performs tasks such as increasing the rotation of the generation turbines, increasing the energy flow exceeding the capacity of a transmission line or simply turning off the turbines of a power plant? Imagine the chaos that would plunge a country or region.
- What if in the water tanks of a city begins to overflow its maximum level and the pressure causes the pipes bursting in the streets? Imagine scenarios like the following in every city: http://www.youtube.com/watch?v=kbz_zxsJCfg&feature=related
- The cost of repairing damage of any of the above scenarios is enormous. If we add the inability of the company to generate money for generation, transmission and distribution of energy, how much time passes before the company cease to exist?
SCADA systems have a very particular operating environment. Because they are real-time systems, data monitoring and orders sent to the RTU should arrive in the shortest time possible, since an additional delay of even 10 ms can mean a massive blackout by activation of the protections of a substation. Similarly, suppliers of these systems tend to provide support on these only on a specific configuration, which is usually not too safe and lacks basic security controls such as security patches, data encryption, authentication and non default configurations.
The architecture for a SCADA system is as follows:
The components are:
- Remote Terminal Unit (RTU): The RTU is defined as a communication device within the SCADA system and is located at the remote substation. The RTU gathers data from field devices in memory until the MTU request that information. It also process orders from the SCADA like switch off a transmission line.
- Master Terminal Unit (MTU): The MTU is defined as the heart of a SCADA system and is located at the main monitoring center. MTU initiates communication with remote units and interfaces with the DAS and the HMI.
- Data Acquisition System (DAS): The DAS gathers information from the MTU, generates and store alerts that needs attention from the operator because it can cause impact on the system.
- Human Machine Interface (HMI): The HMI is defined as the interface where the operator logs on to monitor the variables of the system. It gathers information from the DAS.
Due to its criticality, SCADA operators are reluctant to implement any type of information security controls that can change the operating environment for the system. How to implement a security scheme that does not interfere with the functionality needed for the business process? We took the following items specified in the standards of North American Reliability Corp (NERC) Critical Infrastructure Protection (CIP) to implement controls for an Energy SCADA:
Project 2008-06 Cyber Security — Order 706
- CIP–002–2 — Critical Cyber Asset Identification
- CIP–003–2 — Security Management Controls
- CIP–004–2 — Personnel and Training
- CIP–005–2 — Electronic Security Perimeter(s)
- CIP–006-2a — Cyber Security — Physical Security
- CIP–007–2 — Systems Security Management
- CIP–008–2 — Incident Reporting and Response Planning
- CIP–009–2 — Recovery Plans for Critical Cyber Assets
For point number two, we took the same table to classify information assets for the corporate information security management system and applied it to the energy processes:
Consequence |
Value |
Criteria |
Catastrophic |
5 |
a) Generates loss of confidentiality of information that can be useful for individuals, competitors or other internal or external parties, with non-recoverable effect for the Company. |
b) Generates loss of integrity of information internally or externally with non-recoverable effect for the Company. |
||
c) Generates loss of availability of information with non-recoverable effect for the Company. |
||
Higher |
4 |
a) Generates loss of confidentiality of information that can be useful for individuals, competitors or other internal or external parties, with mitigated or recoverable effects in the long term. |
b) Generates loss of integrity of information internally or externally with mitigated or recoverable effects in the long term. |
||
c) Generates loss of availability of information with mitigated or recoverable effects in the long term. |
||
Moderate |
3 |
a) Generates loss of confidentiality of information that can be useful for individuals, competitors, or other internal or external parties, with mitigated or recoverable effects in the medium term. |
b) Generates loss of integrity of information internally or externally with mitigated or recoverable effects in the medium term. |
||
c) Generates loss of availability of information with mitigated or recoverable effects in the medium term. |
||
Minor |
2 |
a) Generates loss of confidentiality of information that can be useful for individuals, competitors, or other internal or external parties, with mitigated or recoverable effects in the short term. |
b) Generates loss of integrity of information internally or externally with mitigated or recoverable effects in the short term. |
||
c) Generates loss of availability of information with mitigated or recoverable effects in the short term. |
||
Insignificant |
1 |
a) Generates loss of confidentiality of information that is not useful for individuals, competitors or other internal or external parties. |
b) Generates loss of integrity of information internally or externally with no effects for the company |
||
c) Generates loss of availability of information with no effects for the company. |
From the previous table, we assigned controls to implement and ensure the security level for the asset. For point 3 and 4 we adopted all definitions from the corporate Information Security Management System. See all the required controls here: http://www.nerc.com/files/CIP-003-1.pdf and http://www.nerc.com/files/CIP-004-2.pdf.
The biggest issue here was authentication and clear-text traffic. Many devices from our SCADA system did not support authentication and also information was sent using cleartext protocols. Every time we tried to introduce a VPN or crypto level-2 devices, the network latency increased and functions of the system were degraded, which is why we had to remove those controls. When we asked our vendor for those controls as native functions for the system, we received a request to purchase the next version of the SCADA System.
The corporate antivirus didn't work because it consumed all the resources of the DAS and the HMI. Same happened with the Host IPS. The solution we found for the problem was SolidCore S3 product (http://www.solidcore.com/products/s3-control.html), as it was non-intrusive, did not add extra layers and virtual devices to the operating system and controlled very good the zero-day problems.
For configuration changes, we established a weekly maintenance schedule in which the service of the SCADA system would stop for three hours changing the operation mode to contingency, so the IT operators could perform screening for viruses, install security patches and modifying security baselines. If the change was not successful and the system is degraded, the changes were removed and tried again the following week. This was not an easy task, because the vendor would not support us and we had to learn a lot on how the system components worked.
For point 5, We tried to redraw the SCADA network so critical traffic would not mix with other type of traffic. For wireless devices, we managed to implement 802.1X authentication. We divided the SCADA network into the following perimeters:
Cisco Firewall Service Module inside Catalyst 6509 with VSS supervisors (VS-6509E-S720-10G) gave us the required bandwith and no disruptions were presented within the SCADA environment. It also have IPS (IDSM-2) that sends the alerts along with the log firewalls to our RSA envision correlator.
For point 6, all the place has armored doors, CCTV, biometric authentication and security guards patrolling around the physical perimeter.
Now we are able to manage the security controls inside the corporate IT network and the SCADA systems. I still know that I have many things to do to to achieve the other points of NERC, but still will be an interesting and challenging goal.
-- Manuel Humberto Santander Peláez | http://twitter.com/manuelsantander | http://manuel.santander.name | msantand at isc dot sans dot org
Anatomy of a PDF exploit
by Manuel Humberto Santander Pelaez (Version: 1)
Niels Provos has done an excellent blog post on how to exploit CVE-2010-0188: An integer overflow in the parsing of the dot range option in TIFF files. Find the adobe advisory here.
More information at http://www.provos.org/index.php?/archives/85-Anatomy-of-a-PDF-Exploit.html#extended
-- Manuel Humberto Santander Peláez | http://twitter.com/manuelsantander | http://manuel.santander.name | msantand at isc dot sans dot org
Failure of controls...Spanair crash caused by a Trojan
Several readers have pointed us to an article about the preliminary report of the Spanair flight that crashed on takeoff in 2008 killing 154. The article suggests that a Trojan infected a Spanair computer and this prevented the detection of a number of technical issues with the airplane. The article speculates that if these issues had been detected the plane would not have been permitted to attempt take off.
There is still a lot that is conjecture and unknowns at this point in the investigation and I will try not to add to the speculation, but it made me think about the parallels to information security.
In information security we often speak of controls. There are three types of controls; preventive, detective, and corrective. Predominantly in information security we deal with preventive and detective controls.
Preventive Controls aim at preventing issues before they occur. Some examples of preventive controls are policies, standards of operation, procedures, checklists, segregation of duties and change controls. From an IT technology point of view firewalls and intrusion prevention systems are popular technological preventive controls. The airline industry also has procedural and technological controls. Airlines have operating protocols covering most aspects of operations from when it is safe to fly to how to maintain the equipment. Pilots have pre-flight and in-flight checklists to ensure safe operation of the aircraft. Modern airliners have similar interlocks and safety systems to attempt to protect the aircraft from mechanical failure or human error.
Detective controls aim to detect an issue when it does occur, or as soon as possible after. In the words of Dr. Eric Cole, a notable SANS instructor, “Prevention is ideal, but detection is a must!” If at all possible we would like to prevent the event from occurring, but if we can’t prevent the event we want to know it happened so we can adequately respond. The obvious IT detective controls are host and network based intrusion detection systems (IDS). But less technological processes such as audits are also a detective control aimed to detect and correct anomalies before they become more serious. Modern airliners also have detective systems to detect events before they are service affecting. One quote from the article, indicates a failure in a detective control occurred ... “The plane took off with flaps and slats retracted, something that should in any case have ... triggered an internal warning on the plane.”
I am not a pilot, so I cannot speak with authority on how to fly a passenger airliner, but it seems clear to me that this accident was caused by the failure of a number of controls leading to a disastrous outcome. Clearly the SpanAir diagnostic system (a detective control) designed to detect anomalies in the airliners system failed, possibly due to a Trojan. Also it appears the pilots bypassed part of their pre-takeoff checklist, leaving the flaps and slats in a position not recommended for takeoff. As ISC reader Frank pointed out that is most likely because the pilots had aborted the initial attempt to takeoff and most likely resumed the pre-takeoff checklist (a preventive control) too low in the checklist and missed a significant step. It is also clear that for some reason an internal system (a detective control) that should have detected the misconfigured flaps and slats for some reason did not alert the pilots to this condition.
In information security, the stakes are rarely so high as human lives, but failures in controls often lead to unexpected consequences. A misconfigured firewall rule allowing more permissive access to systems, a false negative in an IDS/IPS system, a user violating policy by plugging in a personal USB stick etc. The moral of the story is don’t take your control systems and processes for granted. Audit and test them regularly to ensure they are operating correctly.
-- Rick Wanner - rwanner at isc dot sans dot org - http://rwanner.blogspot.com/
Comments