29 May 2009

SCADA networks and end point security?

Cisco recently announced a new product line aimed at enhancing global power grids and taking the technology forward several generations. Cisco in the power grid, you ask? When did Cisco start manufacturing insulators, transformers, and power cable? They didn’t. Instead, Cisco has released its new line of Supervisory Control and Data Acquisition (or SCADA) devices intended to create the new “smart grid” championed by the White House. The intent is to use IP networks to automate the management and control of the power grid and to give power providers and consumers better tools to monitor and manage home and business energy use. Given the opportunity to bring new network technologies to the power grid, should utilities invest in the next generation of network technology? Or is this a high-risk proposition?


For both consumer and commercial IP networks, security is a huge concern. The ability of unscrupulous individuals and groups to penetrate and exploit networks is the subject of intense scrutiny for most organizations and network providers. An entire field of security professionals works tirelessly across government and corporate networks to monitor, detect, and respond to network threats or security breaches. In spite of these risk controls, network intrusions continue to be a major risk to IP networks worldwide. As a recent example, the US Department of Defense announced it had spent US$100 million to mitigate the impact of multiple attacks on its network infrastructure in just the last six months. In addition, the Department of Defense announced it was establishing a new “cyber command” to defend against Internet attacks from adversaries, and the White House announced the appointment of a cybersecurity czar to oversee US government network security.

In their security and risk blog last week, Forrester identified the need for end-point security when implementing IP-enabled SCADA networks in the “smart grid”. This is certainly a good call given the risk of inadvertent or inappropriate use of the network by end users. The larger view of security, however, should go beyond end-point security to multiple layers of risk control, including application layer security, prevention of device corruption on the SCADA network (e.g., insertion of malicious code), security of all interconnections to the public Internet and partner networks, and intrusion detection systems.

The ubiquitous nature of power systems, the number of network-enabled devices on the “smart grid”, and the economic and strategic importance of the power networks demand a comprehensive risk evaluation and the implementation of a suite of robust risk controls. The networks of infrastructure providers, including SCADA networks, are major targets for espionage and hackers. Only last month US intelligence officials revealed that foreign intruders had penetrated the networks of US electricity providers in an effort to map critical infrastructure components. These intruders also reportedly left behind malicious payloads. Like the corporate networks of power providers, SCADA networks are also targets for exploitation. According to a recent article in Aviation Week, DARPA is weaponizing battlefield wireless network attack tools designed to penetrate and disrupt the operation of all types of networks, including SCADA networks.

The recent history of network penetrations and the deliberate design of disruption tools indicate a growing risk profile for the networks that control and manage the power grid. The upgrade of SCADA networks into the scalable IP-based “smart grid” that Cisco suggests in their new product line will certainly represent a great enhancement, enabling new tools and capabilities for power providers and consumers alike. However, without implementation of multiple layers of security in these new enhanced networks, the risks could very easily outweigh the benefits.

21 May 2009

The Role of Internal Audit in Risk Management

With the global financial meltdown fully in progress, corporate governance and the role of internal auditors are making headlines around the world. Empowered with new mandates to ensure regulatory compliance, auditors are no longer measuring against established standards and best practices. These control measures are now considered a road map for corporate managers to hide their nefarious activities - measure as acceptable against the standards, but hide unacceptable risks in other areas where internal audit won't look.

Now there is a general push in the corporate world for a more pronounced Internal Audit role in measuring and managing corporate risks, based on the role that many audit groups have played in evaluating the effectiveness of internal controls to ensure Sarbanes-Oxley compliance. Today many corporate audit groups have also been engaged to evaluate and measure other corporate controls like safety, information security, and business continuity. On one hand, this may seem like a fantastic idea - engage the autonomous and sage warriors of corporate protection to identify unseen risks. On the other hand, the audit experience might really be more like a disaster, with inexperienced and poorly guided auditors running unchallenged through thoughtfully established and carefully maintained risk management programs. In the end, this has me asking the question: what role should Internal Audit play in corporate Risk Management?



Auditors can be both a blessing and a curse. When completed the right way, audits can be a huge help in identifying strengths and gaps in an established risk management program. The Internal Audit group can help identify areas where risks with high loss potential exist and also help gain the focus of operational management to enhance and improve risk programs for the company. When the audit group carefully measures its observations against the real-world severity of the risk, it ensures that focus is placed on areas with the greatest loss potential. When audits are completed by auditors without knowledge and experience in the field they are auditing, the audit can make a mess of existing controls and can cause the business to incur needless expenses. Just think of a team of internal auditors who have no risk management experience or understanding of international operations. If this team of American auditors completes risk reviews of data centers in Europe, having never before traveled outside of the US, they will easily get lost in risks and controls that are not appropriate to the part of the world they are in, or to the way data centers are managed outside of the US.

From my experience, the greatest risk reduction is achieved by building a trusted relationship between the auditors and the organization being audited. The corresponding free flow of information between all of the stakeholders in the audit process means that the Internal Audit group is able to develop a thorough understanding of the risks and controls in place - delivering a final audit report that garners respect from all parties involved in the process. The trusted relationship, however, has also been abused to great detriment. A recent example is the “sidelining” of AIG internal auditors who early on identified problems with the credit default swaps that got the financial giant in trouble. Instead of having autonomous power to report immediately to the board of directors, the finding was suppressed.

The use of “Enhanced Interrogation Techniques” by Internal Audit

At the other extreme of the trusted relationship, many Internal Audit groups are working hard to keep the organization being audited at arm's length in an effort to minimize the chance that the auditors will be influenced. The tone of many audits is adversarial and the input of experts in the organization being audited is minimized. In an effort to show their value, auditors with limited knowledge and experience identify undiscovered “risks” in their audit findings. These risks, however, are often minimal while more pressing issues are ignored. An example might be an audit finding that identifies a risk from having “live” data ports in an unoccupied cubicle in an office. While it is true there could be a risk of unauthorized system access by cleaning crews or other third parties in the building, the audit finding ignores several layers of mitigating controls (building access controls, VPN two-factor authentication, domain authentication, application layer user authentication, etc.). With the input of experts kept at arm's length, the audit might miss other, more significant findings like network switches beyond end of life or switch software that is several revision levels behind on patches. Because auditors have independent authority and are empowered to report immediately to the board of directors, audited groups are unlikely to push back against auditors and their findings.

I can understand the intent of the arm's-length audit method, given the history of recent corporate governance issues, but I do not agree that corporate risk management is always best served using this approach. If the goal is to identify risks with the greatest loss potential, then the input of experts will always be necessary. If the goal is also to look for undiscovered risks, then a good degree of autonomy will be required. Perhaps the best approach is a hybrid of both - build amicable relationships between the auditors and the groups being audited, but also look far and wide for risks. An audit should be an opportunity to identify what can be improved, rather than an interrogation session.

12 May 2009

Does Mitigation Work?

The snow started melting last month in North Dakota leading to the annual ritual of springtime flooding for residents along the Red River. A punishing blizzard and heavy rains helped saturate the ground and provided ample moisture to fill the Red River to a new record level. For days residents worried that Fargo’s levees would be topped or breached by this flood and emergency officials placed hope on sandbagging and planned evacuations of Fargo and Moorhead.

After the devastating floods of 1997, North Dakota and Minnesota engaged in a number of mitigation projects designed to reduce the risk of future floods. In Fargo and Moorhead, the Main Avenue bridge was rebuilt and raised above the previous record flood level. Levees along the Red River were built higher - both in Fargo and downstream in Grand Forks, where the 1997 floods destroyed much of the town. Homeowners in frequently flooded areas had their homes bought out. In addition, the National Weather Service received improved river monitoring equipment and connectivity.

In spite of all of this work completed more than ten years ago, Fargo and Moorhead still faced a catastrophic disaster earlier this spring as the snow melted. This raises the question - does mitigation work?

Without a doubt there are countless examples of how federal, state, and local monies spent on mitigation projects have reduced the impact of disasters on local communities. Drainage improvements, road and bridge enhancements, and elevation of critical infrastructure above the base flood elevation have all been effective loss control tools against flooding. Each of these different types of mitigation projects reduces the risk of impacts caused by flooding. Completing several smaller improvements leads to a large risk reduction for the community through the cumulative value of all of the projects.

Levees as a mitigation project, on the other hand, are a double-edged sword. Levees protect many communities from flood waters year after year, but when they are topped, undermined, or eroded by flooding the results are catastrophic. Mitigation projects like levees are intended to reduce risk, but often people interpret levees as eliminating risk - which is clearly not true. FEMA is now recognizing this risk as they work to digitize their Flood Insurance Rate Maps or FIRMs. Areas once thought of as protected from flooding due to levees are now considered by FEMA to be special flood hazard areas. The recognition of this risk, however, does not change the number of homes and developed properties in high-risk areas.

Levees play an important role in risk reduction, but without evaluating property use and zoning, property owners and communities have a false sense of security. As levees are built, communities should also evaluate what high-risk occupancies exist in the immediate flood area. Officials should ask: can critical infrastructure be moved? Could businesses and homes be encouraged to relocate out of the shadow of the levees? Fargo was saved this time by the work of an army of sandbag volunteers. Hundreds of people scrambled to raise the levees by three feet in advance of a predicted record flood. Levee breaches were few and the impact to Fargo was slight. Next time, however, Fargo may not be so lucky.

06 March 2009

Reflections on Galveston

Last summer, Hurricane Ike struck Galveston, Texas as a Category 2 hurricane, bringing serious winds and storm surge to the coastal island. The storm captured the nation’s attention, at least for a short while, as Galveston Island and metro Houston were evacuated en masse – the second such mass evacuation of Houston since Katrina devastated New Orleans. In the weeks and months since the storm passed, much progress has been made on rebuilding the island and it’s nearly ready for summer tourists. There are many gaps in the recovery, however, and lots of lessons to be learned from Galveston’s Hurricane Ike experience. I spoke this week on disaster recovery at a conference in Galveston and had an opportunity to take a tour of the island. Here are some of the things I saw and learned.


Lesson 1 – If the roof of your house is one of thousands damaged by a hurricane or tornado, you should probably think about doing your own roof repairs. Like New Orleans after Katrina, blue tarps were handed out as “temporary” protection from leaks and water damage. Months have passed since Ike came and went and many damaged roofs are still covered with blue tarps. Delayed insurance settlements, underinsured losses, and long waiting lists for roofing contractors have all led to the continued use of tarps. It’s unfortunate, but with hurricane season approaching again it could be a significant risk to those homeowners who have not repaired the damage.

Lesson 2 – There is no such thing as “partial recovery”. In driving around the island, businesses impacted by Hurricane Ike appear to be either fully operational or abandoned. Most national chain stores on the island like Home Depot and Kroger appear to be doing well and are likely recipients of assistance from their corporate headquarters. Many local restaurants, shops, and businesses are open and doing great. These businesses apparently had a plan to recover and/or resources to restart in the aftermath of Ike. The success of these businesses is obvious – they are clean, stocked with merchandise, have loyal customers, and don’t look one bit like a major catastrophe struck only a few months ago.

Lesson 3 – Businesses that fail after a disaster do not shut down gracefully. Driving across Galveston, it’s very clear which businesses did not have a plan or the resources to recover after Ike. These are the collapsed and abandoned buildings that still dot the island. In town, you will see a string of 8 or 10 businesses that are open, then you will pass a collapsed building or an abandoned business that looks like it did the day after Ike passed. These businesses were not closed down gracefully, but were hastily left behind by their owners with no plan or action to clean up what remained.

Recovery work continues on Galveston Island even today. Eroded beaches are being restored with new sand, contractors are busily repairing damaged homes, and the tourism industry is gearing up for summer visitors. The island is open for business and there are plenty of fun things to do. I hope you’ll consider visiting Galveston at some point soon; it’s worth the visit. When you come, take a minute to drive around the island and ask yourself, “What if my business was impacted by a disaster? Are there plans and the resources to recover?”

02 March 2009

Disaster Recovery via the Mongolian Horde Technique

For businesses or communities developing a disaster recovery plan the development of a recovery strategy is often the make or break moment in the planning process. With a well thought out and documented recovery strategy most organizations, at time of disaster, will be able to recover critical functions. Without a solid recovery strategy, the “plan” won’t be worth the paper it’s written on. This concept is well proven and accepted by risk managers around the globe. In the challenging economic times we now face, however, the recovery strategy alone may not be enough – especially if your strategy is an implementation of the Mongolian Horde Technique.


The Mongolian Hordes refers historically to the ruthless armies of Genghis Khan and his sons who conquered much of Eurasia in the 13th and 14th centuries. In battle, Khan’s massive armies converged on their enemies with thousands of fierce soldiers mounted on horses and equipped with the advanced military technology of the day. Chinese engineers traveled with the Hordes, making use of rockets and smoke to confuse and disorient their enemies. The Hordes moved rapidly, overwhelming cities in siege and laying waste to conquered lands.

Today, the Mongolian Horde Technique refers to throwing a lot of human resources at a problem in an effort to fix the problem or rapidly bring a project to conclusion. Many organizations large and small have utilized this technique in everything from software development to solving construction delays. In the context of recovery strategy, the Mongolian Horde Technique has been practiced by many organizations in disaster – sending dozens upon dozens of technicians, managers, DBAs, engineers, or other responders into a disaster area with the mission to repair, restore, or rebuild. In many circumstances it can be an element of a successful recovery strategy.

The problem for most organizations today is that the “Hordes” no longer exist. Most companies and communities are running beyond lean – having trimmed their workforces to the point that services are discontinued and the workforce is overburdened. Organizations that once could summon a cast of hundreds to respond to an incident can now hardly spare a few bodies to fix problems or respond to a disaster.

Today recovery strategies need to be more like a precision strike than like a siege by the Mongolian Hordes. Have businesses and communities anticipated this change? Have they adjusted their planning and their recovery strategies? Certainly in the IT world virtualization, load balancing, and data replication have made recovery of data processing functions far more precise where data center infrastructure enhancements have been made. In many cases, however, organizations have failed to enhance their infrastructure or have continued to articulate a strategy of massing an army of hundreds at time of disaster.

Businesses and communities that prepare to deliver precision strikes at time of disaster will be the organizations that survive and thrive. All others will be left standing in the burned out shell of their building waiting for the Hordes to arrive, and they never will…

23 February 2009

Risk Taking

An important priority for Risk Managers working with operational personnel is providing education in safe work practices and pushing employees to avoid risk taking behaviors. Establishing a “culture of safety”, an environment where safe work practices are encouraged and rewarded, ensures that at the end of the day everyone goes home with the same number of fingers and toes they started the day with (a favorite saying of a safety director I used to work with). In my own day-to-day work in the business world, I’ve had the pleasure of doing business with Chevron, which has one of the strongest safety cultures of any company that I have worked with. Every meeting at Chevron, regardless of the level of those participating (executive management or line employees), begins with the “safety minute”. Someone will stand up in the meeting and offer a safety tip or safety suggestion. As a vendor, I’ve been responsible at meetings with Chevron for providing my own input to the safety minute.



Take a minute to watch the embedded video above. In this video you will see the Abilene, Texas Fire Department attack a brush fire that occurred in February 2009. As the brush truck inches closer to the fire it slowly rolls over injuring the firefighters riding on the outside of the vehicle as well as the driver in the cab. Two of the three injured firefighters each had 26 years of experience. Now ask yourself the following question – do you think the Abilene, Texas Fire Department has a strong safety culture? Do you think employees are coached in avoiding risk taking behaviors?

In the aftermath of the accident an Abilene Fire Department Lieutenant was quoted in the Abilene Reporter-News dismissing the accident as “a rarity,” and continued by stating, “our guys are really good at what they do.” It is this type of statement that undermines, rather than builds, a safety culture – the lesson learned is chalked up as a fluke or an accident unlikely to be repeated. A better response might have been to say “we’ve made an error, we’re going to learn from it, and we’re going to teach and train our people to ensure it never happens again.”

Does your company/organization educate employees about safe work practices?
How would your company/organization handle such a serious safety issue?
Does your company/organization have a culture of safety?

Leave me a comment and let me know your thoughts. I would appreciate your feedback.

21 February 2009

A Lesson in Personal Resilience

Recently residents of Toronto, Ontario experienced a brutal 24-hour power outage while outdoor winter temperatures dipped way below zero. Many residents of Canada’s largest city suffered through freezing cold temperatures while power crews struggled with a broken water pipe that impacted power distribution for tens of thousands. More recently ice storms across Oklahoma, Arkansas and Kentucky left hundreds of thousands without power for days. During some of the most extreme weather of the year, people across the central US struggled with daily life in the absence of power. While many are busy expressing their frustration at the slow pace of power restoration, the better course of action might be to learn a lesson in personal resilience.


If you ask ten people how to define a “disaster”, you will likely get ten different answers. Some might identify large numbers of casualties, while others might speak of local resources being overwhelmed. Sociologists Frederick L Bates and Walter Gillis Peacock defined “disaster” as the inability of individuals and their social systems to adapt to a change in the environment around them. The recent power outages across Canada and the US demonstrated to many their inability to adapt to the power outage environment.

How would the circumstances of these power outages change if people were more resilient? Would these events still have been a disaster? Imagine for a minute if those impacted by these power outages were prepared with firewood and a wood-burning stove or a fireplace. Think about hundreds of kerosene heaters and their cool blue flames keeping households across Canada and the central US warm. What if those impacted by the ice storms had a personal plan on how to function without power? The challenges experienced by many would likely have been less severe.

Power outages during ice storms and cold weather are a predictable phenomenon. So are wildfires, floods, hurricanes, tornadoes, and earthquakes. Yet, in spite of the predictability of these hazards, most people do not have a plan on how to adapt and do not even think about keeping basic resources around for surviving even a short-term emergency. Shouldn’t we all learn a little lesson here and think about our own situation and how we could adapt rather than falling into a “disaster”?