All organizations face risks simply by virtue of operating in an uncertain world. Furthermore, most observers agree that “for a business to survive, growth is an imperative, not an option.”1 Yet growth, especially, brings added risk as a result of the increased uncertainties that come with new products, customers, geographies, or strategies. The three areas of investment in risk management and resilience—detection, prevention, and response—help reduce the duration, likelihood, and magnitude of disruptions. Yet these three elements do more than directly address risks—they also indirectly add value to the organization in other ways.
In discussing the benefits of enterprise risk management (ERM), Steven Dryer, managing director at Standard & Poor’s (S&P), noted, “In many cases, senior executives introduced ERM as a compliance exercise and hence are more likely to focus on ERM’s loss-avoidance features and less likely to see ERM as an opportunity in managing uncertainties of both negative and positive directions.”2 In contrast to the compliance-oriented view of ERM, more advanced companies realize that managing risks can also help increase performance on many operational, competitive, and financial dimensions.3 Investments in resilience and risk management may seem like conservative, risk-avoidance initiatives, but these processes can enable a company to be less risk averse and bolder in pursuing growth, despite the added risk and uncertainty.
Resilient companies invest in specific response strategies, such as Cisco’s playbooks mentioned in chapter 6, to cope with relatively high-likelihood, identifiable risks. Yet, as also described in that chapter, companies have to prepare for unforeseen or unknown types of disruptions as well, by creating a set of general processes to deal with any business interruption. Companies prepare to respond by creating assets such as emergency operations centers (EOCs) and business continuity plans (BCPs), and by drilling staff in disaster scenarios.
“No plan survives first contact with the enemy,” the 19th-century German field marshal Helmuth von Moltke is credited with saying.4 Thus, military organizations have to invest in readiness and prepare to respond to the unexpected.
When Hurricane Katrina veered toward New Orleans in August 2005, the United States Coast Guard (USCG) was ready to respond. In fact, the USCG leapt into action days before the hurricane struck. When the commanding officer for Coast Guard Sector New Orleans, Captain Frank Paskewich, saw that Katrina “was making a beeline for New Orleans … from that point on it was ready, set, go.”5
On August 26, three days before the storm made landfall in Louisiana, the Coast Guard began implementing its Continuity of Operations Plan, in which Sector New Orleans relocated to Alexandria, Louisiana, and Coast Guard District 8 Command shifted to St. Louis. “We wanted aircraft on both sides of the hurricane-hit area,” said Captain Artie Walsh, head of the district’s search and rescue office. “We had units scattered all over … because we were afraid if we had one safe haven or two … we’d lose our resources,” said Captain Robert Mueller, the deputy Sector New Orleans commander.6
At the same time as aircraft, vessels, and personnel near New Orleans and the Gulf Coast were leaving the most dangerous spots, the Coast Guard was pulling pilots, swimmers, flight mechanics, maintenance workers, and support personnel from all over the country toward the area, including 40 percent of its nationwide helicopter fleet.7 Rear Admiral Duncan planned to reenact the movie Apocalypse Now with Coast Guard helicopters. “I wanted to darken the sky with orange helicopters. … If [people] feel that they need help, I want them to see an orange helicopter somewhere overhead that they can wave at and we’ll come get them, and frankly we did that.”8 During four days in Katrina’s aftermath, the US Coast Guard rescued 33,500 people (compared to an annual average of “only” 3,500 rescues for the entire United States),9 and delivered tons of supplies to the devastated communities.
While the USCG responded quickly and effectively, the Federal Emergency Management Agency (FEMA) was widely criticized for being late to respond, unprepared, and ineffective.10 This was despite FEMA holding a five-day disaster simulation of a hurricane hitting New Orleans only a year earlier.11 In an after-action analysis, the Government Accountability Office (GAO) said, “Precisely identifying why the Coast Guard was able to respond as it did may be difficult, but underpinning these efforts were factors such as the agency’s operational principles. These principles promote leadership, accountability, and enable personnel to take responsibility and action, based on relevant authorities and guidance.”12
The Coast Guard, unlike FEMA, the National Guard, and many other Federal agencies, is a frontline organization with the authority to act even before the higher-ups know that action is needed. Yet the most important aspect of that difference is that a culture of empowerment—granting authority to people to do what is needed—extends down to all levels of the Coast Guard. For example, when a junior-level pilot flying a C-130 on an environmental inspection mission arrived in New Orleans, she noticed a problem: search and rescue helicopters could not communicate with local officials on the ground. Rather than continue with her official mission or ask what to do, she immediately created an airborne communications platform for the area to help coordinate helicopter flights and get people to safe landing areas and hospitals.13
Other organizations, such as Zara, the Spanish fast fashion retailer, show the same speed of response that marks a flexible organization. Zara empowers its designers to make decisions without going up and down the corporate hierarchy. For instance, when one of Zara’s 300 designers noticed a blouse Madonna wore at the beginning of her 2005 concert tour, he realized that his customers would love the singer’s look. Unlike other retailers that require extensive preparations, market research, and the permission of senior managers before approving a new look, Zara’s designers can freely tap inventories, redesign garments, authorize manufacturing (by trusted local seamstresses who can quickly sew the pattern), and then ship the new clothing to stores. In this particular case, Zara designed a Madonna-inspired blouse and got it into stores in only three weeks, before Madonna finished her tour.14
After investigating the USCG response to Katrina, the GAO also concluded that “another key factor was the agency’s reliance on standardized operations and maintenance practices that provided greater flexibility for using personnel and assets from any operational unit for the response.”15 As Captain Bruce Jones, commanding officer of Air Station New Orleans, said, “The fact that you can take a rescue swimmer from Savannah and stick him on a helicopter from Houston with a pilot from Detroit and a flight mechanic from San Francisco, and these guys have never met before and they can go out and fly for six hours and rescue 80 people and come back without a scratch on the helicopter—there is no other agency that can do that.”16 Paradoxically, structure and standards can create flexibility and agility, not rigidity and sluggishness, when well-trained teams have the authority to adapt their training to new situations. “That’s the nice thing about the Coast Guard; we don’t really need to talk a lot … a couple words pass between a couple of sailors and the job gets done,” said CWO3 Robert David Lewald, commander of a Coast Guard construction tender.17
The use of standards to create flexibility for response is not unique to the USCG. For example, Southwest Airlines uses only Boeing 737 aircraft (see chapter 13). This means that any mechanic can service any plane and any pilot can fly any airplane in the fleet, allowing for quick recovery from weather, congestion, and other disruptions that bedevil an airline. Similarly, the standard procedures used by UPS in their unloading, sorting, and loading operations allowed UPS to recover quickly from an ice storm that shut down Louisville, Kentucky, in 1996. The storm closed all roads, and Louisville workers were not able to come to work at the airport, where Worldport—the US hub of UPS air operations—is located. But UPS was able to fly workers from other parts of its vast empire to Louisville and, because the operations are standardized, these outside workers could operate the hub.18 Coupled with empowerment, standards are the key to flexible operations—they allow for risk pooling of assets and surge capacity while empowering frontline responders to improvise when the conditions change.
In tandem with standard procedures, the USCG culture permits flexibility by personnel, which was evident in the improvisation of rescue techniques described by Lieutenant Iain McConnell: “At first we used basket hoists for most survivors, but then the swimmers found that the quick strop hoist technique was quicker so that was an improvisation, and the whole swinging-like-a-pendulum-to-get-a-swimmer-up-onto-a-balcony-underneath-a-roof, that’s definitely something you don’t practice,” he said,19 and then added, “Yes, a lot of improvisation. But in general that’s what Coast Guard aircrews do best.” The flexibility of the Coast Guard stems from trusting people to do the right things, giving them the authority to take action, and not putting too many bureaucratic hurdles in their way.
Organizations such as Walmart and UPS invest in response procedures and assets because they serve customers everywhere, including locations with high likelihoods of disruptions from hurricanes, snowstorms, and other natural disasters. Many multinational companies have facilities and suppliers around the globe in vulnerable areas. While the probability that a particular disruption will strike a particular location at a particular time is very small, the likelihood that some crisis will happen some place at some time is significant. Consequently, such companies can justify investments in EOCs and BCP.
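The arithmetic behind this point can be sketched in a few lines. The sketch below assumes that disruptions at different sites are independent and equally likely, which real networks only approximate; the 1 percent per-site figure and the site counts are hypothetical and are not taken from Walmart or UPS.

```python
def prob_any_disruption(p_per_site: float, n_sites: int) -> float:
    """Probability that at least one of n_sites is disrupted in a period.

    Assumes each site is hit independently with the same probability
    p_per_site (a simplifying assumption made for illustration).
    """
    if not 0.0 <= p_per_site <= 1.0:
        raise ValueError("p_per_site must be between 0 and 1")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    # Complement rule: 1 minus the chance that every site escapes.
    return 1.0 - (1.0 - p_per_site) ** n_sites


# Hypothetical network sizes, each site with a 1% chance per year.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} sites: {prob_any_disruption(0.01, n):.3f}")
```

With a small per-site probability, the chance that some site is hit climbs quickly as the network grows, which is why a company with many locations can expect to use its response assets regularly.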
If and when disruptions occur, preparations for response pay off in terms of both accelerated recovery and mitigated impacts. In other words, the option to use risk assets is exercised (see chapter 6). To the extent that emergency response and business recovery teams are active the minute a disruption hits, recovery can begin immediately, shortening the duration of the disruption. Preorganized teams, precreated plans, a preconfigured “war-room,” and prestocked recovery supplies all help accelerate response.
Each disruption also offers a learning opportunity. As Mark Cooper, senior director of Walmart’s emergency management, commented, lessons from Katrina and other storms helped Walmart improve the efficiency of its response by a factor of three or four. A robust EOC and drilled recovery process help the company reopen its stores faster and at lower cost than before.
The Coast Guard is geared for response because it can’t prevent the incidents to which it must respond. Most companies, however, can take steps to reduce the likelihood of disruptions.
Inattentive or sloppy processes have dire consequences for companies in high-risk industries. In December 1984, a leak of methyl isocyanate from Union Carbide’s plant in Bhopal, India, killed 4,000 people and injured over 500,000. Some sources put the death toll much higher.20 Union Carbide never recovered from this horrific tragedy.21 Less than a year later it was the target of a hostile takeover by GAF Corporation, which forced it to divest many of its most profitable divisions.
As mentioned in chapter 11, BP lost $53 billion in market capitalization in the wake of the Horizon drill rig explosion in April 2010. Four years later, in August 2014, the stock was still more than 30 percent lower than its value at the beginning of 2010.22 By 2014, the company had paid $27 billion in cleanup costs, fines, and settlements, with some cleanup liabilities and many court cases still pending.
The failure to imagine and model the consequences of coseismic coupling in the faults around the Japanese islands off the coast of Fukushima (see chapter 1) resulted in significant damage to the nuclear reactors from the 2011 tsunami. Water flooded the plants and the reactor control systems. With no backup ability to dissipate heat, the cores overheated, causing explosions and nuclear meltdowns. The incident raised energy costs in Japan as the government started turning away from reliance on nuclear power. It also raised energy costs in Germany, where the government closed all of its old nuclear power reactors and decided to phase out the remaining ones by 2022. As a result of the increased energy costs, numerous energy-intensive German manufacturers have diverted many of their capital investments out of Germany.23
BASF buys, handles, manufactures, and ships a wide array of chemicals, many of which are highly flammable, highly toxic, or both. To ensure the safety of its workers and the citizens living near its plants, BASF’s risk prevention culture spans the entire organization from the board of directors to frontline employees.24 Relying on standard procedures, the company uses a risk management process manual and a set of standardized evaluation and reporting tools based on the 2004 COSO II (the Committee of Sponsoring Organizations of the Treadway Commission) framework.25 While the board of directors approves investments in risk management, the company delegates the management of specific risks to local business units. BASF develops models for all manner of potential industrial accidents, down to the failures of individual pumps, valves, and tanks.26 BASF’s culture of prevention means it biases its assessments toward worst-case risks. If a type of event (e.g., an explosion in a mixing tank) can produce various levels of severity at different probabilities, BASF uses the worst-case risk class to decide the level of prevention efforts.
Mark Twain once wrote, “Man is a creature made at the end of the week … when God was tired.”27 Indeed, the vast majority of safety incidents are due to human error.28 To reduce the rate of human error, BASF trained more than 10,000 employees in process safety and more than 47,000 employees in compliance in 2013.29 The training also addressed prevention of cybercrime and the protection of knowledge and sensitive information.
To create a global safety culture, BASF emphasizes visible leadership and open dialogue, as well as many prevention-related KPIs such as the lost-time injury rate, number of accidents, and product spillage.30 As of 2012, the company had about half the lost-time injury rate of other safety-oriented chemical companies31 that are members of the Responsible Care Global Charter.32
BASF’s culture of prevention extends beyond safety and compliance risks. “We try to prevent unscheduled plant shutdowns by adhering to high technical standards and continuously improving our plants,” BASF wrote in its 2013 annual report.33 The prevention of downtime extends to procurement decisions. The company assesses critical paths in the flow of materials and adds capacity to ensure it has supply alternatives, according to BASF’s Dirk Hopmann.
Other companies face less tangible but no less dangerous potential disruptions that drive them toward prevention. As mentioned in chapter 11, Disney’s image means everything to the company. Approximately $29 billion of the company’s value is ascribed to its brand.34 With Disney’s emphasis on children and families, the company is especially concerned with preventing social responsibility risks. Consequently, it is selective about the countries from which it sources and with whom it does business. For example, Disney won’t buy from eight countries including Sudan, Iran, and Burma because of the difficulty in ensuring acceptable working conditions.35
Reducing the likelihood of disruption also reduces the likelihood of payout by the company’s insurers. When Microsoft builds new data centers or other key facilities, it uses HPR (highly protected risk) standards by working with engineers from its insurance company, FM Global.36 Achieving HPR certification requires design and operating features that reduce the risks of fires, floods, and seismic damage through prudent site selection, material selection, protective features, equipment redundancies, and proper attention by personnel.
HPR sites have one-quarter the probability of loss and one-tenth the average gross loss compared to non-HPR sites,37 which translates into lower insurance premiums. Microsoft uses estimated economic value to quantify the value of investments in risk management. “So far in fiscal year 2009, which goes through June 30, FM Global has calculated more than US$1.8 billion in risk improvements,” said Susan Shaw, senior risk manager at Microsoft.38 The model-based calculations included factors such as potential losses, deductibles, and premium savings over the life span of the building to offset the added construction costs.
Although HPR is primarily about minimizing property losses, it also reduces the likelihood of business interruption. For example, to achieve HPR at a new data center, Microsoft divided the space in half with a fireproof wall so that the maximum foreseeable loss is only half the facility and the surviving half can maintain uptime for customers. “With data centers, the scale of the loss involves more than just property. It’s losing credibility with the public, and losing standing in the technology community,” said Shaw. “One thing that’s not insurable, and that I’ve taken into account, is reputation risk. There’s a tie-in between managing reputation and managing any kind of risk—in this case property, including business interruption,” Shaw concluded.
Prevention and response are complementary aspects of risk management, and different companies may emphasize different investments in one over the other. Prevention efforts reduce the likelihood of events that would need a response. In contrast, response capabilities allow companies to accept certain risks by relying on their mitigation prowess. Neither approach suffices on its own because of the uncertainty involved. Prevention cannot avoid all disruptions, and response capabilities can’t mitigate all impacts to an acceptable level. The balance between the various investments is company- and facility-specific, and it depends on the tradeoff between the cost-of-prevention and the cost-of-response.
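One way to make this tradeoff concrete is a simple expected-cost comparison. The sketch below is an illustrative model, not a method described in this chapter: it assumes that prevention spending lowers the annual likelihood of a disruption, that response spending lowers the loss per disruption, and that the two effects are independent. All option values are hypothetical (say, in millions of dollars per year).

```python
from itertools import product


def expected_annual_cost(prevention_spend, response_spend, likelihood, loss):
    """Spending on both capabilities plus the expected loss from disruptions."""
    return prevention_spend + response_spend + likelihood * loss


# Hypothetical options: (annual spend, resulting annual likelihood of disruption)
prevention_options = [(0.0, 0.10), (1.0, 0.05), (3.0, 0.02)]
# Hypothetical options: (annual spend, resulting loss per disruption)
response_options = [(0.0, 100.0), (0.5, 60.0), (2.0, 40.0)]


def best_mix(prevention_options, response_options):
    """Return the (prevention, response) pair with the lowest expected cost."""
    return min(
        product(prevention_options, response_options),
        key=lambda combo: expected_annual_cost(
            combo[0][0], combo[1][0], combo[0][1], combo[1][1]
        ),
    )


prevention, response = best_mix(prevention_options, response_options)
print("prevention option:", prevention)
print("response option:", response)
```

With these made-up numbers, the lowest expected cost comes from a moderate investment in each capability rather than from maximizing one and neglecting the other. Different cost curves would shift the balance, which is the sense in which the right mix is company- and facility-specific.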
Finally, as companies and industries become safer, the marginal cost of the next safety measure increases and the marginal value of preventing the remaining extremely rare events drops. Such an effect may be starting to happen in the airline industry, in which some question whether the postcrash aircraft location monitoring measures proposed in the wake of the disappearance of Malaysia Airlines Flight 370 are worth the costs, given the rarity of such events and the possibility of using the investment instead to enhance air safety measures that may have higher expected benefits.39
Vigilance entails investments in monitoring current events, surveying suppliers, visiting supplier facilities, “score-carding” inbound shipments for quality, analyzing natural hazards models, and other data-gathering and analysis activities. Companies can use the resulting knowledge of risks and events to prevent and respond to them.
Rather than wait for executives to hear the news during the morning drive to work, Cisco’s incident monitoring process runs 24 × 7, with personnel around the world in different time zones. Cisco combines monitoring with an escalation process that guarantees a two-hour response time. During the 2011 Japan earthquake that occurred at 9:46 pm Cisco headquarters’ local time, the company detected and understood the significance of the event within 40 minutes and had escalated it to senior management 17 minutes later.40
Even before a natural disaster strikes, companies such as P&G and Walmart use the detection lead time to marshal resources for the postdisaster recovery. If a company detects an imminent disruption it can, for example, relocate assets and inventory out of the disaster zone, perform controlled equipment shutdowns to avoid machinery damage or hazmat release, address social activist issues before they go viral, or start backup systems. For example, as mentioned in chapter 8, OKI Semiconductor Company avoided about $15 million in losses by using systems that detect earthquakes and provide a few seconds or minutes of warning.41 Timely detection also helps limit the impact of hidden disruptions such as contamination, counterfeiting, and cybercrimes in direct proportion to the reduction in the duration of the damage.
Early detection can also provide competitive advantage in constrained-supply scenarios. In 2013, Juniper Networks got an urgent notification from Resilinc’s monitoring service. A key supplier of memory chips had had a fire that would affect 21 parts. Within a couple of hours, Juniper had analyzed the issue and, according to Juniper’s Joe Carson and Dmitri Kamensky, secured alternative supplies at prices that were substantially less than what slower-acting companies paid. A study of nearly 4,000 European firms found that companies in more competitive situations were significantly more likely to be vigilant.42 Some companies are always looking for trouble.
Early detection can also be an important factor in preventing disruptions. Detection of a potential problem as a result of, say, deteriorating labor relations at a supplier, parliamentary debate regarding drastic government regulations, or increasingly negative buzz about a supplier on social networks allows the company time not only to prepare mitigation activities, but also to counteract and possibly avoid a disruption. Alternate suppliers can be contacted, resources can be used to lobby governments, or the causes of negative buzz can be redressed.
Detection for prevention extends into the supply chain. In tandem with the proliferation of NGOs and media watchdogs, many companies are working hard to ensure that not only they but also their suppliers do not cause CSR, safety, or quality-related disruptions (see chapter 7). For example, the Ford Code of Conduct43 mandates ethical conduct and high social responsibility standards at suppliers in every country where it does business. Furthermore, the code requires that “all Company personnel must report known or suspected violations of this Policy through the established reporting channels. The Company prohibits retaliation against anyone who in good faith reports a violation.”44
Audits provide value in three areas. First, audits help detect specific problems in specific suppliers. Disney’s audit checklist, for example, includes 75 questions covering issues such as working conditions, underage labor, fire safety, worker freedoms, and healthcare. Second, audits provide insight into country-level trends, which may help detect risks and also highlight business opportunities hidden in those trends. Third, audits have a direct preventative value in addition to detecting risks, because suppliers are likely to take proactive steps to avoid unfavorable audit reports.
At some companies, this kind of detection extends to the deeper tiers. For example, Intel tries to get potential suppliers to reveal their own suppliers early in the relationship, in order to detect potential risks and assess vulnerabilities. Intel does not always succeed, because of suppliers’ reluctance to share competitive secrets. Yet the company has been a leader in the tracing of conflict minerals, which required understanding of its supply chain to the deepest tier (sometimes down to Tier 6 or 7).
Detecting minor events that did not cause a disruption but could have is one way to detect, prepare for, and prevent larger disruptions. The aviation industry has long recognized the wisdom of learning from mistakes and minor incidents, even when they do not cause an accident. The Aviation Safety Reporting System (ASRS) collects and analyzes voluntarily submitted, confidential aviation incident reports to identify systemic or latent errors and hazards and to alert the industry about them. The ASRS receives more than 30,000 reports annually and issues directives on a regular and as-needed basis. Most aviation experts agree that these efforts have resulted in an ever-increasing level of civilian airline safety as system operators increase their vigilance by recognizing more conditions that can lead to disasters.
Hospitals use a similar “near miss” analysis system of reporting, investigating, and identifying vulnerabilities to root out medical mistakes. Likewise, BASF insists on timely recording of any safety-related incidents, including near-misses, that could indicate a vulnerability.45 Industrial accidents follow a power law distribution, as described in chapter 2, which implies that organizations can use data on the frequency and impacts of small events to predict the likelihood and impacts of much larger events. This analysis of small events helps detect and prioritize risks that could potentially produce unacceptable disruptions.
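The extrapolation from small events to large ones can be sketched as follows. This is a minimal illustration that assumes incident sizes above a threshold `x_min` follow a pure power-law (Pareto) tail, P(X > x) = (x / x_min)^(-alpha), and it uses a standard maximum-likelihood estimate of the exponent. The function names, the threshold, and the incident costs are all hypothetical; the chapter does not say which estimation method BASF or other organizations use.

```python
import math


def fit_tail_exponent(sizes, x_min):
    """Maximum-likelihood estimate of alpha for P(X > x) = (x / x_min) ** -alpha.

    Only observations at or above x_min are used.
    """
    tail = [s for s in sizes if s >= x_min]
    log_sum = sum(math.log(s / x_min) for s in tail)
    if len(tail) < 2 or log_sum <= 0.0:
        raise ValueError("need at least two tail observations, not all equal to x_min")
    return len(tail) / log_sum


def exceedance_probability(x, x_min, alpha):
    """Probability that an incident at or above x_min exceeds size x."""
    if x <= x_min:
        return 1.0
    return (x / x_min) ** (-alpha)


# Hypothetical recorded incident costs, in thousands of dollars.
incident_costs = [12, 15, 18, 22, 30, 41, 55, 90, 160, 400]
alpha = fit_tail_exponent(incident_costs, x_min=10)
p_large = exceedance_probability(10_000, x_min=10, alpha=alpha)
print(f"alpha = {alpha:.2f}, P(cost > 10,000 | incident) = {p_large:.5f}")
```

Multiplying the exceedance probability by the observed yearly count of recorded incidents gives a rough expected frequency of very large events. Estimates far beyond the range of the observed data are sensitive to the fitted exponent and to the power-law assumption itself, so they are better suited to ranking risks than to precise prediction.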
No company takes on additional risks for the excitement of the exposure, peril, or potential liability. Yet, as mentioned above, risk is part of running a business, and it goes hand-in-hand with growth. To prosper, companies must grow; and to grow, companies have to manage the risks and uncertainties inherent in taking on new initiatives where less is known.
Many companies see risk management as just another cost with no sure benefit. In the words of a transport manager, “It takes resources away from what our core business is.”46 An unused EOC seems like squandered office space and corporate resources. Drills take time away from day-to-day operations. Extra inventory is expensive. The perception of waste can seem doubly true with prevention strategies, because they intentionally seek to ensure that nothing ever happens. Yet investments in resilience can provide value, directly and indirectly, as well as support growth.
Traditionally, organizations estimated the value of investments in resilience in terms of the avoidance or reduction of losses created by disruptions. Prevention reduces the likelihood of disruptive losses, response reduces the consequences of disruption, and detection improves the effectiveness and timeliness of prevention and response. Each “it-could-have-been-worse” event is tallied as a win for these kinds of investments.
For example, Cisco created a database of risk mitigation efforts and subsequent disruptive events. In addition to helping the company track its risk mitigation efforts, the database lets Cisco tally the direct value of those efforts.47 By documenting the improvement in recovery time resulting from its risk management processes, Cisco tracks impacts that it avoided, such as lost revenues, late shipments, and other critical business metrics.
In many ways, this approach looks at spending on resilience in the same way that the company looks at spending on insurance: companies buy it because they feel they have to, even though a direct return on insurance premiums can be measured only when a disaster strikes.48 Under this view, therefore, the ROI of resilience is only measured in terms of how much it reduces the likelihoods and consequences of disruptions. This view, however, misses many other advantageous aspects of investments in resilience.
Resilience is superior to insurance for four primary reasons. First, insurance offers only financial indemnification, whereas resilience also helps avoid the loss of trust or reputation incurred if a company fails to fulfill its commitments to its customers. Second, insurance often covers only named hazards, but resilience can also cover unknown, uncertain, and acts-of-God events. After the 2010 Iceland volcano eruption, insurance companies denied business interruption claims—even for airlines and airports—because the volcano caused no physical damage that would create the basis for a claim.49 Third, insurance is an adversarial transfer of risk and faces uncertainties in pay-outs, whereas resilience is an internal capability aligned with the business. In its annual report, Intel notes that one of its risk factors is that “one or more of our insurance providers may be unable or unwilling to pay a claim.”50 The exact wording of a policy and the legal interpretations of that wording affect whether a particular incident creates a valid claim.51 Finally, the biggest difference is that resilience can bring competitive advantages even if no disruption ever occurs, because resilience can improve both top-line and bottom-line performance.
At 6:30 pm on December 6, 2004, a fire broke out on the 29th floor of the 45-story LaSalle Bank headquarters in Chicago. Smoke inhalation and other injuries afflicted 37 people but, fortunately, no one was seriously hurt as 500 people evacuated the smoke-filled building. Some 450 firefighters worked to subdue the fire that raged for five hours and caused $50 million in damage to the historic Art Deco building.52
Even as firefighters were arriving and bank workers were evacuating, the company activated its crisis management plans at 6:45 pm via a pre-established emergency conference call. At 8 pm, the crisis management team held the first meeting and all department-level business continuity plans were officially initiated. A disciplined approach to crisis team meetings kept them short and on track as the teams determined how to handle the damage to headquarters.
LaSalle Bank’s mantra became “business as usual” and at 7:30 am the next morning, the bank opened. Some 750 workers went to prearranged backup sites and 400 others telecommuted from home. Constant customer communications, including a telephone system that automatically forwarded customer calls to relocated workers, assuaged any anxiety about the bank’s ability to keep going. Throughout the crisis, the bank worked to avoid conflicting stories coming from different people at the bank and from city officials. Reports in the local, national, and banking industry press highlighted LaSalle’s resiliency. By continually making clients aware of what was happening, the bank succeeded not only in retaining its current customers but also in signing several major commercial customers after the fire. These large commercial customers cited LaSalle’s resilience and continued customer service throughout the disruption as the reason for the business.53
When Hurricane Sandy threatened the East Coast in 2012, Walmart already knew what to do. In 2004, Linda Dillman, chief information officer, claimed that “we have gathered so much historical data, we’ve decided to anticipate what will happen in a given situation instead of waiting for it to happen, and then reacting.”54 From previous hurricanes, Walmart knew that people stock up on bottled water, tarpaulins, spotlights, and manual can openers. They also buy seven times the usual volume of strawberry Pop-Tarts. Headquarters alerted Walmart stores in the path of Sandy to pre-order these popular items in advance of the storm, as well as to order other in-demand products like Armour Vienna sausage, Spam, and hardy fruit, such as apples. The company also prepared for replenishment of after-storm cleanup supplies, such as mops and chainsaws. Just as important as deciding which inventories to push forward to boost prestorm and poststorm sales was deciding which inventories to pull back. Walmart knows people stop buying meat and other highly perishable goods for fear they will spoil without power for refrigerators.
The data management systems that Walmart used to prepare for disasters are the same ones that help it prepare for seasonal changes, major holidays, and other fluctuations in demand. The same tools that help the company track the effects of summer weather on soft drink demand can track the effects of bad weather on bottled water demand, too. Walmart’s everyday inventory management system tells it which store has what goods, which store sold what, and which distribution centers are carrying what. Walmart’s trucks are equipped with onboard computer and communication systems that let shipments be redirected at any time. Resilience during a disruption and responsiveness in daily operations are two sides of the same coin.
Some companies use their response to crises to make improvements to everyday operations. The 2008 financial crisis hit many companies, including P&G’s feminine care division. “Obviously we weren’t happy about the drop in business,” said Stefan Brünner, the division’s Budapest plant manager for manufacturing in Europe, the Middle East, and Africa.55 Rather than just cut costs, the company launched an aggressive recovery program to transform the supply chain. “We saw this as an opportunity to focus on improving some supply chain fundamentals and emerge from the recession in a stronger position,” said Brünner.56
P&G tightened internal and external integration, including greater collaboration with key suppliers. It developed rapid product changeover capabilities to launch new products in order to reignite growth. As a result, the company increased manufacturing productivity by 20 percent, reduced regional inventory by 18 percent while keeping customer service levels high, cut material lead times by as much as 50 percent, accelerated new product launches, and dropped total delivered costs by more than 12 percent.57
When Intel uncovered a defect in its new Cougar Point chipset in 2011 (see chapter 8), it had to make six million replacement chips as fast as it could. Intel used an internal discussion forum called “Output Max” to expedite production and distribution. The response taught Intel how to go faster when needed. It reset expectations for what could be done when speed is the top priority. Intel now calls it “Cougar Point speed.” The event was a key part of Intel’s ongoing evolution to ever-greater speed and agility. “As fast as we did Cougar Point, you always find a way to do one more thing like take four hours out of manufacturing to make an earlier flight,” said Frank Jones, Intel’s VP and general manager, customer fulfillment, planning, and logistics.58
At Caterpillar, visibility tools help the company become more responsive when managing disruptions or day-to-day operations. “I can see everything in motion,” said an expert from Caterpillar.59 “Now I can respond effectively to disruptions, see how the network is flowing, see delays and the costs they incur. I can manage a single disruption like a port labor strike.” Those same tools can also tune the company’s network. “I can optimize because I can see everything. That allows me to drive better predictability. And I couple that with analytics to figure out what dials and levers to adjust to make improvements. That gives me a much better supply chain,” said the same expert.60 The company’s efforts are paying off, and Caterpillar rose two places in the 2013 Gartner Supply Chain Top 25. Resilience during a disruption and agility in normal operations are both benefits of the same investments in better visibility and management of uncertainty.
At an MIT supply chain conference during the financial crisis, several companies cited a rise in positive communications and the benefits of collaboration created by response efforts.61 The financial crisis brought dramatic changes in consumer behavior and threatened the survival of key players in companies’ supply chains. Not only did internal departments (sales, supply chain, and finance) work together more, but third-party logistics providers (3PLs) and even competitors also worked together externally. In a survey of 650 executives regarding BCM’s benefits beyond incident management, 56 percent (the highest number in the survey) reported improved cross-functional understanding and working within and outside the organization as a collateral benefit of BCM.62
After 9/11 revealed the vulnerability of Manhattan skyscrapers, the risk management group at a leading Wall Street financial services company concluded that the entire staff needed the ability to work from home. But upper management balked at the cost of supporting tens of thousands of telecommuters. The risk management group then partnered with HR and repurposed the project as a diversity and inclusion initiative that would allow mothers to stay with their babies and empower disabled employees to stay active. Identifying an HR benefit aided the effort to gain approval of the resilience project.
“Risk can also be the inability to capitalize on an opportunity,” according to Boston Scientific.63 Companies can face surges in demand during economic recoveries, competitor disruptions, and new product launches. For example, when Starbucks launched its breakfast sandwiches, it estimated a low, medium, and high forecast for the expected demand. The actual demand was 200 percent higher than the highest forecast, requiring Herculean efforts to satisfy customers.
Intel’s “Output Max” discussion forum helps the company maximize the output of its manufacturing lines to deal with disruptions as well as with unexpected demand for new products. If resilience is the ability to bounce back from downside events, it also provides the ability to bounce forward for upside events.
Detection speed plays a key role in the competitive advantage of Takeda, a mid-size pharmaceutical firm. Like many companies, Takeda integrates internal information with external data obtained from a third-party service provider, but just gathering the data isn’t sufficient. “Speed of response is quite important so we do not miss any business opportunities,” said Hiro Fukutomi, managing director, Takeda UK.64 “The competitive advantage derives from the measures you take out and how quickly you react to the information out there,” said Axel Mau, chief financial officer for Takeda’s German subsidiary.65 For example, if the company hears of a competitor’s planned product launch in a particular region, Takeda crunches the numbers in real time and “right away, the sales force can put efforts in that region to hold the market share or increase it. If you have to wait a month before we have an analysis—as we’d sometimes have had to in the past—then it’s rather difficult,” Mau explained.66
After a five-year implementation of rigorous, holistic risk management practices, Canadian utility Hydro One received a favorable credit rating from both Moody’s and Standard & Poor’s. Credit analysts specifically cited the company’s efforts at risk management in granting the rating, which gave Hydro One a lower cost of capital on a $1 billion loan.67 During the 2008 financial crisis, creditors and credit rating agencies came to realize that the likelihood of a borrower repaying a loan depends intimately on the borrower’s likelihood of surviving disruptions. Rating agencies such as S&P began to explicitly analyze a company’s enterprise risk management to assess its risks and preparedness.
The rating agency’s analysis of borrowers’ ERM takes into account an organization’s risk management culture and governance, risk controls, emerging-risk preparation, and strategic management.68 Although S&P does not attempt to estimate all the likelihoods and impacts of supply chain disruptions, it does consider four broad factors: country risks, industry risks, operating risks, and governance. The analysis takes into account factors such as the dependence of the organization on a small number of key facilities, the financial resources to absorb calamities, and the company’s sensitivity to industry-specific disruptions, such as airlines’ vulnerability to terrorism or agribusiness’s vulnerability to commodity prices.69 The analysis results in a one-to-four rating of weak, adequate, strong, or excellent, which modulates the company’s credit rating and cost of capital.70
Two cross-sectional studies find a correlation between risk management practices and financial performance. A large-scale survey and analysis in 2012 of more than 500 firms affiliated with the Federation of European Risk Management Associations (FERMA) contrasted the five-year financial performance (2004–2011) of those companies with “advanced” risk management practices versus those with lower levels of maturity in risk management. Advanced firms had nearly double the five-year revenue growth rates (16.8 vs. 8.9 percent) and more than double the five-year EBITDA growth rates (20.3 vs. 8.9 percent).71 A 2005 Conference Board analysis of companies with advanced ERM likewise found they had statistically significantly higher profitability and lower earnings volatility.72
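The “nearly double” and “more than double” comparisons follow directly from the reported percentages. As a quick check, the short sketch below divides the growth rate of the advanced firms by that of the less mature firms; the variable and function names are illustrative, and the figures are the ones quoted in the text.

```python
# Five-year growth rates in percent, as quoted from the FERMA comparison.
revenue_growth = {"advanced": 16.8, "less_mature": 8.9}
ebitda_growth = {"advanced": 20.3, "less_mature": 8.9}


def growth_ratio(rates):
    """Ratio of the advanced firms' growth rate to the less mature firms' rate."""
    return rates["advanced"] / rates["less_mature"]


revenue_ratio = growth_ratio(revenue_growth)
ebitda_ratio = growth_ratio(ebitda_growth)

print(f"Revenue growth ratio: {revenue_ratio:.2f}x")
print(f"EBITDA growth ratio: {ebitda_ratio:.2f}x")
```

The revenue ratio comes out a little below 2 and the EBITDA ratio a little above 2. Because the studies are cross-sectional, these ratios describe a correlation and do not by themselves show that risk management caused the difference.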
Resilience has something in common with quality. Quality investments cost money, which seems to imply a choice between low cost with low quality and high cost with high quality. But a key insight from the quality movement pioneered by the Toyota Production System is that letting defects corrupt a product is even more costly than ensuring the quality of the raw materials and processes. Avoiding a defective part is cheaper than fixing a defective car.
Similarly, resilience—developing prevention measures, response alternatives, and detection systems—costs money, which seems to imply a choice between fragile efficiency and expensive robustness. Yet fragility may be more expensive than resilience. Done right, resilience investments can have a positive return.
The proverbial “an ounce of prevention [or preparation] is worth a pound of cure” remains in force. Yet, as mentioned above, this view is too narrow in tallying the returns of resilience investments. Resilience is more aligned with the cost and growth goals of a company than the intuitive cost-versus-resilience trade-off would suggest. For example, according to 36 percent of executives in an international survey, BCM also provided process optimization through analysis and greater understanding of end-to-end dependencies and key activities.73
As with quality, investment in resilience can pay off through faster recovery times, lower impacts, and the many indirect advantages discussed above. Yet, the best level of investment is unclear. A company can overinvest in quality—creating a car that needs no maintenance, lasts for a long time, but is very expensive. Similarly, building heavily fortified factories, monitoring every supplier action, and using only suppliers with perfect financials may be possible but would be expensive. Furthermore, it may stunt growth by deterring procurement from innovative but higher-risk suppliers.
The proper level of investment in resilience varies from company to company and industry to industry. Proper investment levels are relative to the risks, which depend not only on geography, industry, position along the supply chain, and strength, but also on customer support and the company’s general reputation. For example, when Nike was accused in the 1990s of running sweatshops and employing child labor in Pakistan, its sales and market value plunged. In contrast, when worker suicides at a supplier highlighted poor working conditions in Apple’s supply chain, the company weathered the storm and suffered no loss of sales. Apple most likely owes this to the loyalty of its customers and the aura of its brand, something that not many other companies can count on. For many companies, the best level of investment in resilience might be gauged relative to the competition. In a race to be the least disrupted and the fastest to recover, it may pay to spend just a little bit more on resilience than peer companies spend, regardless of whether the industry tends to spend high or low.
One common, yet erroneous, assumption is that managing in a risky world requires being a stodgy, conservative organization. If anything, the opposite is true—stodgy, conservative organizations may be deficient at managing risks. In a study of decision making during extreme events, one financial services organization manager admitted, “[We are] an older organization. Decision-making is slow and restricted to a few senior managers. We need considerably quicker responses to crises and to do things at multiple levels, but it doesn’t happen. For example, after the 7th July terrorist attack in London, we had no capacity to make quick decisions and we couldn’t get statements to the media as quickly as we would have liked.”74
S&P’s Dryer writes that a “successful risk culture begins with fostering open dialogue where every employee in the organization has some level of ownership of the organization’s risks, can readily identify the broader impacts of local decisions, and is rewarded for identifying outsize risks to senior levels. In such cultures, strategic decision-making routinely includes a review of relevant risks and alternative strategies rather than a simple return-on-investment analysis.”75
Toshiba’s subsidiary Westinghouse Electric is extremely conservative, as one would expect of a company that makes and services nuclear reactors. “We don’t train people to take risks, we train them not to,” said Stephen R. Tritch, chief executive officer of Westinghouse Electric.76 Yet the parent company also wanted growth, which required Westinghouse to branch out, be creative, and try new things, which meant taking risks. The company took a “fast, small bets” approach, in which engineers and managers could learn more about customer needs even if they didn’t win every contract. The company started taking business risks such as hiring twenty engineers from a competitor and opening an office for a new line of business before they had any contracts to justify the expense.77
As a result, Westinghouse won new business, including business in a market the company had previously left to competitors, and in a new area: developing new methods to repair an alloy used in old reactors.78 By 2013, Westinghouse’s revenues reached a record $5 billion,79 up from $2 billion in 2004,80 in spite of both the global recession and the aftermath of the 2011 Fukushima disaster on the nuclear industry. “Today, we have a sense of energy about growth that we didn’t have before. Five years ago, the idea was if we stuck to our knitting, but stayed flat, that was success. I don’t think anyone here sees that as success anymore,” concluded Nick Liparulo, then head of the company’s engineering services.81
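To put the revenue growth in perspective, the two endpoints can be converted into a compound annual growth rate. The sketch below is only a back-of-the-envelope calculation: it assumes the $2 billion and $5 billion figures are annual revenues for 2004 and 2013 and that growth compounded evenly over the nine intervening years, which the text does not claim. The function name `cagr` is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Revenue in billions of dollars, as quoted in the text.
westinghouse_cagr = cagr(start_value=2.0, end_value=5.0, years=2013 - 2004)

print(f"Implied compound annual growth: {westinghouse_cagr:.1%}")
```

Under these assumptions the implied rate is roughly 11 percent a year, sustained through a period that included the global recession and the aftermath of Fukushima.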
Resilient companies embody the Nietzschean adage that “what does not kill me makes me stronger” by being learning organizations. Each and every event, drill, near-miss, or scenario planning contingency expands the awareness of the company and adds to its repertoire of responses. For example, every year, Starbucks reviews the prior year’s events from around the globe and estimates the top risks for the coming year. This triggers a set of planning and preparation activities. Those abilities of sensing, reacting, and adapting help the company thrive in a complex, dynamic, global economy.
The rise in global competition means that, as Andy Grove, former Intel CEO said, “only the paranoid survive.”82 The Internet enables customers and consumers to find the winning product offering in virtually any product category, and global supply chains deliver those winning goods at unprecedented scale. Moreover, financial pressures on companies from shareholders and cost-cutting customers will continue to push companies toward lean, just-in-time operations rather than just-in-case redundancies such as extra inventories and spare capacity. Thus, a supply chain disruption can mean an existential threat to the unprepared.
At the same time, the world seems to be experiencing an accelerating rate of large-scale disruptions and unimagined “unknown unknowns.” The threat seems unlikely to abate given long-term trends such as the increase in the world’s population coupled with the burgeoning billions of consumers who are straining the earth’s resources.83,84 Furthermore, such pressures are also responsible for political upheavals, security concerns, and economic crises. The concentrated economic density created by the migration of people into crowded urban agglomerations also contributes to the rise in economic losses with each new natural disaster.
The rate of “creative destruction” is climbing. Perhaps one of the greatest and least appreciated threats to any company is the self-imposed peril of stagnation in a world obsessed with both the next big thing and annual cost reductions. Companies need to constantly seek growth, if only to replace products and business lines that have become obsolete in the face of global competition, technological advances, changing corporate social responsibility standards, or regulation.
Thriving, and indeed even surviving, in this environment of flocking black swans and creative destruction will depend on resilience. Companies can even exploit risks, such that “disruptions” can bring an increase in their sales, market share, and profits. Disruptions can also create opportunities to implement significant changes (such as improving organizations and processes) that are not possible when there is no “burning platform.” An especially well-prepared and responsive company can be ready to supply what other less-prepared companies cannot. By being more resilient than competitors, better at preventing disruptions, more effective at mitigating impacts, and faster at managing scarce postdisruption supplies, a company can dominate its industry.
A company that can detect, prevent, or respond to natural, accidental, and intentional disruptions can make the most of its winning products by ensuring continuity of supply. Resilience helps companies compete—even in the face of true unknown-unknown disruptions—by imbuing an organization with the vigilance, responsiveness, and flexibility to detect and respond to unexpected events quickly and effectively. And resilience is more than just a way to bounce back. The activities that create resilience also improve collaboration, coordination, and communications in both directions of the supply chain, making it a strategy for bounding forward into a future rich with possibility.