Russian novelist Leo Tolstoy wrote, “Happy families are all alike; every unhappy family is unhappy in its own way.”1 This so-called Anna Karenina Principle applies to supply chain disruptions in that every disruption comes with its own litany of misery, its own roster of causes, and its own cascade of effects. No two disruptions follow identical scripts, but the management of risk and disruption does encompass three general activities: prevention, detection, and response. Those three activities can frame companies’ resilience efforts.
Before organizations can categorize various disruptions for purposes of resilience management, they need to examine the gallery of possibilities. Supply chains, with their complex global connections and diverse stakeholders, can have many failure modes. Disruptions might be tied to natural, negligent, or intentional causes. Disruptions might involve any combination of suppliers, workers, customers, competitors, the built environment, the natural world, governments, and nongovernmental organizations. The root-cause events may strike a company directly, or they may strike a deep-tier supplier or a customer’s customer.
Ash from a volcano in Iceland in 2010 grounded air traffic across the European Union and, consequently, decimated fresh food and flower exporters in Africa. A 2011 flood in Thailand inundated 877 factories,2 halted 30 percent of the global hard disk manufacturing industry,3 and caused billions of dollars in losses for the PC industry. A drought in the US Midwest in 2012 damaged crop yields and sent corn and soybean prices soaring. The price spike hit food producers, especially meat and dairy producers.4 Each year, nature supplies fresh insults to companies dependent on the smooth operations of their global supply chains. Supply chains and logistics are outdoor sports, and customers don’t want to hear that the game was canceled because of rain.
In total, natural disasters created $360 billion in losses in 2011.5 The year 2011 saw an especially severe litany of floods, hurricanes, earthquakes, and tsunamis. These events killed people, damaged property, disabled logistics infrastructure, and upended the lives of citizens and employees. Many natural disasters—such as the Thai floods and the Japanese quake and tsunami—affected large areas and entire industries. Annual surveys of businesses in 2009,6 2010,7 2011,8 2012,9 and 201310 found that 50 percent of companies suffer supply chain disruption from adverse weather in any given year. Weather is the first or second most common cause of disruption. In addition to disruptions from adverse weather, about 20 percent of companies experience a supply chain disruption from an earthquake or tsunami in any given year.
Although many natural disasters involve too much water, some involve not enough. The 2012 drought in the US Midwest affected transportation by lowering the Mississippi River to the point of impairing its navigability.11 River barges carry 60 percent of US grain exports and 20 percent of US coal12 as well as other bulk commodities such as steel, petroleum, fertilizers, and construction materials. The two-month disruption in the winter of 2012–2013 caused an estimated $6 billion in losses.13 The Rhine River, a major route for bulk commodities in Europe, suffers from similar water-level problems in dry years.
Disruptive accidents, often caused by lax safety measures, run the gamut from massive conflagrations to simple failures in critical pieces of equipment: A German factory making cyclododecatriene exploded and car makers around the world suddenly realized they might face potential disruptions of thousands of different parts used on every vehicle they make.14 A barge on the Rhine River capsized, closing the river for twenty days, causing a stack-up of 450 barges and hindering the 170 million tonnes of goods shipped on the river annually.15 A paint supplier to a contract manufacturer of a toy maker had to find a second source for pigments but didn’t have time for testing. Later, the pigments were found to contain lead, causing a highly publicized recall of 1.5 million toys.16 After lithium-ion batteries in the aft compartment of a newly introduced passenger jet caught fire at the end of a long flight, the Federal Aviation Administration grounded the entire worldwide fleet, the first such action by the FAA in 34 years; the grounding cost the manufacturer and airlines hundreds of millions of dollars.17,18 A garment factory in Bangladesh collapsed, 1,100 workers died, and many blamed prominent Western clothing companies for the deaths and deplorable conditions in factories in Bangladesh. Accidents and safety violations can disrupt logistics infrastructure, manufacturing equipment, and the flow of goods or parts, and they can undo many years of reputation-building and brand loyalty.
Whereas natural disasters occur regardless of the preparations and vigilance of companies, other types of disruptions become less likely for the well-prepared and the attentive. Safety programs, intensive quality control, and prudence can reduce the likelihood of accidents and violations. And yet, the connectivity of supply chains and companies’ dependence on shared resources such as key raw materials or key transportation lanes imply that even the most careful company can be disrupted by the imprudence and bad luck of others.
Intentional disruptions come in many forms. In November 2012, 400 office clerks walked off their jobs at the ports of Los Angeles and Long Beach, thereby halting the movement of $760 million a day worth of goods.19,20 In 2005, terrorists attacked the lightly guarded London subway and bus system rather than the more heavily secured Heathrow Airport. To protest the destruction of tropical forests for the farming of palm oil, Greenpeace raided Nestle’s annual shareholders meeting in 2010. Activists dressed as orangutans stood outside Nestle’s headquarters in Frankfurt, Germany, while other activists unfurled a banner inside the meeting itself.21
Intentional disruptions include attacks on a company’s assets or processes, with the goal of disrupting its operations or robbing it. Such disruptions include criminal acts like cyber-disruptions (e.g., denial-of-service attacks and theft of customer data), cargo theft, extortion, kidnapping, embezzlement, sabotage, and corporate espionage, as well as legal actions such as labor strikes, management lockouts, and activist boycotts and protests.
Intentional disruptions are fundamentally different from natural disruptions or accidents in two major respects. First, the attacker will usually choose the most impactful place and time for the event. Thus, for example, the port workers chose to strike the month before Christmas, when volumes are at their highest and capacity is strained.22 Second, the disruption will be aimed at the least hardened target. Furthermore, whereas the likelihood of a hurricane or an earthquake is not influenced by protective measures (only the impacts are affected), hardening a target against an intentional disruption can lower the likelihood of such an attack. Moreover, unlike, say, accident avoidance measures that primarily benefit the preparer, preparations against intentional disruption can increase the chance that the attacker will target a related, less protected target such as a different business unit or a competitor.
Beginning with Apple’s iPhone in 2007, the rise of touchscreen smartphones coupled with app stores decimated the sales of previous mobile phone industry leaders such as Nokia, Blackberry, and Motorola. The Toyota Production System, developed in the 1970s, resulted in American manufacturers not being able to compete on cost and quality, causing the US government to impose “voluntary” quotas on Japanese car imports from 1981 through 1994.23 In his groundbreaking book The Innovator’s Dilemma,24 Clayton Christensen gives many other examples of new products and business processes that disrupted existing ones, from the transistor radio to LCD TVs to steel mini-mills. Such innovations cause existing firms to cede their market leadership, lose profits, and even disappear. (See more on disruptive innovation in chapter 12.)
Creative destruction may be disruptive, yet as the theory of evolution suggests, “survival of the fittest” is what keeps companies, industries, and economies competitive. Competition motivates innovations in products, services, costs, and consumer choices. As with the evolution of species, the more competitive and prone to failure individual players in an industry are, the more robust the industry as a whole is (see chapter 13).25
In addition, businesses also face illegal competition from counterfeiters. Copies of popular brands of clothes and shoes dominate the worldwide counterfeit trade, which had an estimated value of $600 billion in 2010, according to the US Immigration and Customs Enforcement Agency.26 Counterfeit competitors sell $75 billion in fake pharmaceuticals, which bring the added threat of harming those who take them.27 And, according to the BSA | The Software Alliance global piracy study, “42% of all PC software packages installed in the world in 2011 were pirated.”28 Although injured companies, governments, and international bodies have been working to fight this illegal trade, it has been growing with increasing globalization and e-commerce.29
Some threats, such as a competitor’s predatory pricing, are difficult to prove and involve prolonged legal proceedings or unpredictable political forces. In 1996, Microsoft started to give away its Internet Explorer browser, driving Netscape out of the market and sparking a multiyear antitrust lawsuit against Microsoft.30 Google faced complaints by European regulators that the free distribution of the Android operating system was predatory.31 Believing that Chinese tire manufacturers enjoyed subsidies as well as an artificially low currency value, the United States slapped a tariff of 35 percent on imported Chinese tires in 2009.32
In 1997, a crash in the value of the Thai currency created a financial contagion that swept through Asian economies33 and even caused crises in financial markets in the United States, Europe, Russia, and Latin America.34 Then, in 2008, a housing bubble led to a foreclosure crisis that threatened to collapse the world financial system like a house of cards. Marked contractions in credit supply and consumer demand triggered a global bullwhip as imports plummeted, causing contraction and bankruptcies throughout global supply chains.
Nor are financial contagions the only potential causes for a global crisis. In 2003, severe acute respiratory syndrome (SARS) appeared in Asia and then rapidly spread to more than two dozen countries.35 The unknown nature of the new disease and its accompanying high fatality rate led to quarantines and warnings about travel.36 Ten years after the SARS outbreak, health officials were carefully monitoring a related disease, Middle East Respiratory Syndrome (MERS).37 And in 2014, governments around the world were taking steps to stop the spread of the Ebola virus.38 Health officials also worry that each new strain of flu could threaten to reenact the 1918 Spanish Flu pandemic that killed 50 to 100 million people worldwide.39 In addition to the potential human toll, epidemic diseases threaten to curtail the free movement of people and goods that underpin global supply chains.
Last, there are internal and external political upheavals. A dispute between the governments of China and Japan over the ownership of a group of uninhabited islands led to a Chinese boycott of Japanese goods, resulting in a 17 percent drop in volume of Japanese exports to China between June and November 2012.40 Following a 2014 decision by China to move an oil rig to disputed waters with Vietnam, Vietnamese mobs ransacked foreign factories, causing manufacturers around the world to halt production.41 In 2011, Spanish fruit and vegetable exporters lost €200 million a week after a food poisoning scare caused Germany to ban Spanish cucumbers.42
The growing interconnectedness of the global economy makes it increasingly prone to contagion. Contagious events, including medical and financial problems, can spread via human networks that often correlate strongly with supply chain networks. Unlike the more localized disruptions of natural disasters, industrial accidents, or terrorist strikes, global crises deliver a near-simultaneous blow to multiple countries and multiple industries. Furthermore, the mere fear of contagion, especially with health and financial issues, can reduce demand because of caution and can cause supply shortages and price spikes because of hoarding. Although everyone can get hit, the weaker and less-prepared companies suffer the most.
The preceding anecdotes and surveys of business disruptions illustrate two key points about such events, which affect how companies prioritize risk management efforts. First, different disruptions have different degrees of impact. For example, a tsunami that drags an entire factory into the sea is more serious than a shortage of some part. Second, different disruptions occur with different frequencies or likelihoods. Adverse weather occurs more frequently than do major fires, epidemics, or disruptive innovations.
Thus, many risk experts categorize potential disruptions by their impacts and their likelihoods, creating a 2 × 2 matrix as shown in figure 2.1. This stylized plot also shows where various hypothetical types of disruptions might lie in the four quadrants of impact and likelihood. The figure depicts events defined by causes (e.g., flood, wind damage, recession) as well as events defined by their effects on the supply chain (e.g., loss of key supplier, IT failure, and downed transportation link).
Companies can estimate the impacts and likelihoods of disruptions using a range of historical, analytic, or subjective measures. The potential impact can be estimated in terms of revenue loss, operating income reduction, brand diminution, stock price reduction, and/or loss of market share. The likelihood of many disruptive events can be estimated based on their past frequency and various probability models; that is how insurance companies assess risk and calculate premiums. Although the impact of a downed plant or a supplier’s inability to ship parts may be the same regardless of the cause, estimating likelihood entails examining the possible causes of the disruption and the chances of them being triggered.
In the absence of good data and rigorous estimates of impact and likelihood, however, companies use more subjective scoring methods. For example, a large beverage company uses a scheme that divides each axis into five levels, creating a 5 × 5 matrix (rather than the 2 × 2 shown in figure 2.1). Furthermore, the numerical values associated with the levels are not linear. The company assigns the five levels of impact (the horizontal axis) a numerical score of 1, 3, 7, 15, and 31, respectively, and the five levels of likelihood (the vertical axis) a numerical score of 1, 2, 4, 7, and 11, respectively. The rationale for this pattern of levels is that impacts (e.g., “What happens if Supplier X can’t ship for two months?”) are often easier to assess than likelihoods (“What is the probability that something will disrupt Supplier X?”) and therefore are given a higher weight. It also means that high-impact/low-probability events will have a higher risk score than high-probability/low-impact events.
The company multiplies the impact and likelihood numerical scores to compute a total risk score, which can range from 1 (for insignificant risks with both low likelihood and low impact) to 341 (for perceived “worst-case” risks with both high likelihood and high impact). This number is, in effect, a proxy for the mathematical expectation of the damage from a disruption, and the assumption is that the higher the expectation, the more resources should be directed toward mitigation and resilience. As mentioned later in this chapter (in the section titled “The Irony of Anxiety about Expected Losses”), however, the worst-case disruptions may not be the highest-expected-value disruptions.
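The scoring scheme just described can be written out in a few lines. The sketch below is only an illustration: the score values come from the text, but the function name, the 1-to-5 level numbering, and the layout of the matrix are assumptions made for the example, not the beverage company's actual tool.

```python
# Nonlinear scores for levels 1..5, as quoted in the text.
IMPACT_SCORES = (1, 3, 7, 15, 31)
LIKELIHOOD_SCORES = (1, 2, 4, 7, 11)


def risk_score(impact_level, likelihood_level):
    """Return impact score times likelihood score for 1-based levels (hypothetical helper)."""
    if not (1 <= impact_level <= 5 and 1 <= likelihood_level <= 5):
        raise ValueError("levels must be between 1 and 5")
    return IMPACT_SCORES[impact_level - 1] * LIKELIHOOD_SCORES[likelihood_level - 1]


# Full 5 x 5 matrix: rows are likelihood levels, columns are impact levels.
matrix = [[risk_score(imp, lik) for imp in range(1, 6)] for lik in range(1, 6)]

if __name__ == "__main__":
    # Print with the highest likelihood level on top, like a risk chart.
    for row in reversed(matrix):
        print(" ".join(f"{score:4d}" for score in row))
```

Under these scores, the highest-impact, lowest-likelihood cell scores 31 while the lowest-impact, highest-likelihood cell scores 11, which is the asymmetry the text points out.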
In the average year, seismologists tally about 1,300 earthquakes of magnitude 5 to 5.9, which are strong earthquakes capable of causing damage. They also detect an average of 134 earthquakes of magnitude 6 to 6.9, which are quakes that have 32 times the destructive energy but are about 1/10th as likely to occur as the magnitude 5 to 5.9 quakes. Finally, seismologists record about 15 quakes of magnitude 7 to 7.9, which are another 32 times more energetic and approximately another 1/10th as likely.43 This pattern of increasing destructive magnitude and decreasing likelihood—in which each factor of multiplication of the seriousness of the event comes with a significant decrease in the likelihood of events (about 1/10th as many quakes for each 32 times increase in destruction)—is known as a power law distribution. The power law distribution is also known popularly as the 80/20 law or Pareto Rule, which, in the context of disruptions, posits that 80 percent of events will be frequent, minor events; and only a rare, small percentage will generate the major impact.
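The earthquake counts above are enough to estimate the slope of the power law on a log-log plot. The following is a rough sketch using only the three annual counts and the 32-times energy step quoted in the text; the function name and the choice of energy as the measure of impact are assumptions for illustration.

```python
import math

# Average annual earthquake counts quoted in the text, keyed by magnitude band.
counts = {5: 1300, 6: 134, 7: 15}
ENERGY_STEP = 32  # energy ratio between adjacent magnitude bands, per the text


def loglog_slope(n_small, n_large, energy_ratio=ENERGY_STEP):
    """Slope of log10(frequency) versus log10(energy) between two bands."""
    return math.log10(n_large / n_small) / math.log10(energy_ratio)


slope_5_to_6 = loglog_slope(counts[5], counts[6])
slope_6_to_7 = loglog_slope(counts[6], counts[7])

if __name__ == "__main__":
    print(f"slope, magnitude 5 to 6: {slope_5_to_6:.2f}")
    print(f"slope, magnitude 6 to 7: {slope_6_to_7:.2f}")
```

Both estimates fall between -0.6 and -0.7, so the two steps lie on nearly the same straight line in log-log terms, which is the signature of a power law.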
As it turns out, many types of disruptive events—including earthquakes, volcanoes, hurricanes, tornados, floods, landslides, forest fires, power outages, and even man-made events such as terrorist activities, cybercrimes, wars, and commodity price volatility—generally follow a power law. That is, they show a multiplicative inverse relationship between the likelihood and the impacts of events. Figure 2.244 presents, for example, the cumulative number of events for earthquakes, hurricanes and floods in the United States over a 90-year period versus the loss per event on a log-log scale. Figure 2.2 is similar to figure 2.1 in that it shows “likelihood” and impacts of disruptive events.
Yet figure 2.2 is different in four key ways. First, figure 2.2 shows a spectrum of events of a given type of disruption, such as a range of earthquakes of different impacts and different likelihoods. In contrast, figure 2.1 simply aggregates all quakes together as one average impact of average likelihood.
Second, the points on figure 2.2 are events that actually took place. Thus the plot shows the historical record, rather than an estimate of future likelihood and impact as in figure 2.1. Naturally, the historical records can be an input to the estimates of likelihood and impact. The plotted line can be used as an estimate for the future pattern of likelihoods and impacts, assuming that the future follows the same pattern as the past.
Third, figure 2.2 uses a log-log scale, which is highly nonlinear on both axes. Events in the highly likely upper half of the chart might be 10, 100, even 1,000 times more likely than those in the lower, unlikely half. And events in the high-impact right half might be 10, 100, even 1,000 times more destructive than those in the low-impact left half.
Fourth, figure 2.2 also reflects the total exposure of the entire United States rather than of a specific company. A given company with facilities and suppliers in only a few areas of the world would have lower likelihood of being hit by the natural disasters depicted in figure 2.2, but the impact on the company, if hit, may be significant. Thus, the slope of the log-log line may be shallower for a given company or facility.
Power law statistics also affect the expected losses for different levels of impacts. If 10 times bigger events are 1/10th as likely as the smaller events (a slope of –1.0 on the log-log plot), then the cumulative losses over a long time period from frequent, small disruptions will be as high as the total losses from rare, massive disruptions during the same time period. Yet, from a risk management standpoint, the high-likelihood/low-impact events do not require any significant response and, by and large, do not represent an existential threat. In contrast, high-impact disruptions—whether relatively likely or highly unlikely—are what risk management is all about.
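The arithmetic behind the equal-losses claim can be checked with a small sketch. The base frequency of 100 events per year and base loss of $10,000 are purely hypothetical numbers chosen for the example, as is the function name; only the slope of -1.0 comes from the text.

```python
def expected_annual_losses(base_frequency, base_loss, classes, slope=-1.0):
    """Expected yearly loss for each impact class (hypothetical helper).

    Class k has losses 10**k times base_loss and occurs
    base_frequency * 10**(slope * k) times per year.
    """
    losses = []
    for k in range(classes):
        loss_per_event = base_loss * 10 ** k
        events_per_year = base_frequency * 10 ** (slope * k)
        losses.append(events_per_year * loss_per_event)
    return losses


# Slope of -1.0: each class of events contributes about the same yearly loss.
equal = expected_annual_losses(100, 10_000, 4)

# A shallower, assumed slope of -0.65: the rare, large classes dominate.
shallow = expected_annual_losses(100, 10_000, 4, slope=-0.65)

if __name__ == "__main__":
    print([round(x) for x in equal])
    print([round(x) for x in shallow])
```

With a slope of exactly -1.0, every tenfold step in impact is offset by a tenfold drop in frequency, so each class adds the same expected loss. With a shallower slope, the expected loss grows with each step up in impact.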
Every single day, small events disrupt the smooth running of a business: late deliveries, digital communications disruptions, low yields, isolated workplace accidents, and similar small-scale events. Such small-scale disruptions may delay a shipment, alter customer commitment, or reduce productivity, but they don’t threaten the company. Even significant demand spikes, critical machine breakdowns, or input price fluctuations do not generally pose an existential threat to most well-run companies. Being both modest in effect and likely to occur, such disruptions are handled tactically through companies’ day-to-day operations that manage minor burbles in supply, demand, scheduling, and prices. Routine business processes are designed to smooth low-impact disruptions that barely merit any notice at all.
On February 15, 2013, a modest space rock about 65 feet wide hurtled at 43,000 miles per hour into the dawn skies of central Russia. Although the meteor did not directly hit anything, it did explode high in the atmosphere with a detonation equivalent to a 500-kiloton nuclear weapon. Even at a distance of 20 miles, the blast damaged 7,200 buildings and seriously injured 1,500 people around Chelyabinsk, Russia.45
Scientists who study meteors and the outer space environment near the earth worry about the rare but inevitable prospect of much larger rocks hitting the earth.46 A direct strike by a large “city killer” meteor could demolish a city or create a massive tsunami. Although strikes by larger meteors or asteroids might be extremely unlikely, the potential death toll and the resulting economic impact could be devastating.
Risk managers can fail to anticipate many other types of events besides meteor strikes. The 1986 Chernobyl nuclear accident caused a release of 400 times more radioactive materials than the Hiroshima atomic bomb, contaminating over 100,000 square kilometers with significant fallout.47 The 1984 Bhopal industrial disaster in a Union Carbide plant exposed 500,000 people to methyl isocyanate gas, causing thousands of deaths and severe injuries.48 The 2010 explosion of BP’s Deepwater Horizon oil rig in the Gulf of Mexico caused 11 deaths and the biggest oil spill in the history of the petroleum industry.49 These and other events (such as the 2010 Eyjafjallajökull volcanic eruption, the 2005 Hurricane Katrina, and the 2011–2013 Arab Spring, to name an additional few) were neither envisioned nor planned for, causing significant disruptions as a result.
This category also includes so-called black swans, which are events that were thought to be impossible or that have never been imagined until they occur. The term derives from the historical experience that every swan seen by Europeans was a white swan and, therefore, black swans were assumed to be nonexistent. But then black swans were found in Australia in 1697. The term was popularized by Nassim Taleb in 2007 to indicate a category of flawed reasoning about unprecedented events:50 the lack of evidence of a possible disruption does not constitute evidence of lack of possible disruption. As mentioned in chapter 1, Japanese planners underestimated the potential heights of tsunamis, such as the one that hit the Fukushima prefecture. The planners had centuries of earthquake and tsunami data but falsely concluded that a tsunami as high as the 2011 one was impossible and that therefore the nuclear reactors were safe from that threat. Black swans reflect a deeper kind of uncertainty than standard likelihood because experts misjudge the likelihood of a black swan risk to be zero when, in fact, it is not.
Each year, the Atlantic Basin brews up an average of 12 named storms, of which six become hurricanes.51 Thus, the 600 manned oil platforms in the Gulf of Mexico face a high chance of disruption every year. When a hurricane such as Isaac (2012) threatens the area, more than 90 percent of platforms prepare by shutting down production and evacuating personnel.52 Similarly, every year, officials in extreme northern latitudes prepare for severe winter storms by restocking road salt, maintaining plowing equipment, replenishing airport de-icing mixtures, and so forth. For some types of disruptions, the question is not “if” but “when” and “how severe.”
Anyone using statistical reasoning based on the expectation of losses would assess high-impact/high-likelihood events as the worst. They happen relatively often and hit hard. These are the types of events for which companies prepare, such as the oil platform operators preparing for seasonal hurricanes. These are the types of events that the methodology of taking the product of likelihood and impacts is designed to highlight. High expected losses occurring at relatively high likelihood justify proactive steps to reduce the likelihood of impacts. These events have enough salience that companies plan for them, prepare specific mitigation tools and processes to lessen their impact, and coordinate a planned response to these potential disruptions with their suppliers.
Of course, the term “high likelihood” is relative. As the power law indicates, the likelihood of specific high-impact events may still be very small. Yet chapter 1 noted that globalization has increased the length, breadth, and complexity of supply chains. Although small-likelihood events are individually unlikely, global enterprises are now exposed to large numbers of unlikely events through all their complex and lean networks of suppliers. In other words, the probability that a specific disruption will take place in a specific supplier’s facility on a specific day may be very small. However, the probability that something significant will happen somewhere in a global supply chain sometime during a given year is not negligible.
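The gap between "very small" and "not negligible" can be made concrete with a short calculation. The sketch below rests on two simplifying assumptions that are not in the text: every site has the same yearly disruption probability, and sites fail independently of one another. The numbers are hypothetical.

```python
def prob_at_least_one(p_per_site, n_sites):
    """Chance that at least one of n_sites is disrupted in a year.

    Assumes equal, independent per-site probabilities (an illustrative
    simplification; real disruptions are often correlated).
    """
    if not 0.0 <= p_per_site <= 1.0 or n_sites < 0:
        raise ValueError("need 0 <= p_per_site <= 1 and n_sites >= 0")
    return 1.0 - (1.0 - p_per_site) ** n_sites


# Hypothetical example: a 1-in-1,000 yearly chance per site.
single_site = prob_at_least_one(0.001, 1)
thousand_sites = prob_at_least_one(0.001, 1000)

if __name__ == "__main__":
    print(f"one site:    {single_site:.3f}")
    print(f"1,000 sites: {thousand_sites:.3f}")
```

Under these assumptions, a risk that is a 1-in-1,000 event for any single site becomes more likely than not across a network of 1,000 sites.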
Impact and likelihood combine to affect the overall priority of each risk. As mentioned above, the standard logic of risk management prioritizes risk based on the expected value of the loss—which is impact multiplied by likelihood. In quadrant terms, high-impact/high-likelihood risks have the highest priorities, low-impact/low-likelihood risks have the lowest priorities, and both high-impact/low-likelihood and low-impact/high-likelihood risks have intermediate values. Expected value, however, is only partially effective in classifying risks. In particular, high-impact/low-likelihood risks may be more dangerous than their expected value implies because their rarity means that no one in the company will have had experience with the event, and the unlikelihood of the disruption makes it easy to ignore.
The contrast between more-likely versus less-likely disruptions illustrates a common pattern in how organizations think about and prioritize risk and uncertainty. Organizations plan for the expected (e.g., hurricane seasons) because its likelihood is historically high, and they pay less attention to the highly unlikely or the impossible (“black swans”) because even the nature of the disruption may not be foreseen. Yet if an organization has thorough and inclusive preparation and a ready mitigation plan for an event, that event should be reclassified as lower-impact because the company has likely tempered those impacts. That is, risk management itself modulates the risks to the company.
Taking the logic of the effects of risk mitigation efforts to the next level suggests a different view of the actual dangers of different types of disruptions. The most dangerous events are not the well-known high-impact/high-likelihood ones for which the organization has experience and well-thought-out “playbooks.” Rather, the most dangerous events are the high-impact/low-likelihood ones. The reason is that such events are either unimaginable or are so rare that they have not taken place in recent memory, if ever, and as such are “not on the radar screen” of risk managers. And even if such events are imagined, they may be assessed as unlikely and therefore they don’t justify proactive steps like mitigation procedures or playbooks.
In a press briefing on February 12, 2002, US Secretary of Defense Donald Rumsfeld said, “There are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don’t know. But there are also unknown unknowns—there are things we do not know we don’t know.”53
Risk managers can think about the analogs of these three categories of foreknowledge in a supply chain risk management context. The “known knowns” of disruptions are the everyday problems, referred to as the “daily dribble” earlier in this chapter. They also include seasonal variations and long-term trends, such as population aging, urbanization, and declining automobile ownership in the developed world.
The “known unknowns” are the foreseeable but random disruptive events whose probability can be estimated from historical evidence, power-law extrapolation, and logic. “Known unknown” events include tornados in Oklahoma and hurricanes in the Gulf of Mexico. Such disruptions may be significant, but they are not considered “outside the realm of possibility.” These are the high-likelihood/high-impact events that can be prepared for through playbooks, drills, and experience. They can also be insured against because their probability density is known and thus a quantitative risk measure can be calculated.
Finally, there are the “unknown unknowns,” those events for which not only can the likelihoods not be calculated, but the events themselves have not been imagined. Such events should be discussed in terms of uncertainty rather than risk. The scenario of a record-breaking tsunami wave hitting Japan, triggered by a 9.0 earthquake and causing a nuclear disaster and subsequent power shortages, was not imagined by any planner; nor was there any historical precedent for such an event. Similarly, the 9/11 terrorist attack caught the United States by surprise. Furthermore, few foresaw the growing real estate bubble in the United States prior to 2008 and fewer still took actions to mitigate the financial meltdown that followed. The near-collapse of the international financial system did not enter into the risk management calculus of most executives. In comparing these three categories, Rumsfeld concluded, “And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult one.”54
The statistics of big, rare events hide a curse. No matter how bad the last “big one” was, a bigger one is inevitable. As history rolls onward, the list of major disruptions grows skyward. The next “bigger one” may take a long time to materialize, or it could happen tomorrow; but, unfortunately, the unlikely is not the impossible. With a growing global population and a growing global economy, the biggest disaster will always lie somewhere in the future.
Yet, companies don’t prepare specifically for meteor strikes, calamitous accidents, or cataclysmic storms and other natural events. Such occurrences are too infrequent, especially within the scope of a single company and its supply chain. Preparation for unexpected events requires the development of general resilience—the ability and processes required to “bounce back” from whatever happens (see chapter 4 and chapter 6).
The menagerie of disruptions illustrates the boundless causes of disasters. From toxic labor relations to toxic lead contamination, from viruses to volcanoes, and from regulation to innovation, companies face innumerable threats to their ongoing operations. Every day, media outlets report on catastrophes near and far, making the world seem a dangerous place. And yet, Cisco found that fixating on each and every cause wasn’t the best way to think about business risks. Although the focus on causes did “scare the business” into funding risk management at Cisco, the cause-focused view didn’t lead to effective risk management.55 The overall rarity of each cause implies that next year’s causes are almost inevitably different from last year’s causes.
After the 2006 Taiwan earthquake, Cisco changed how it looks at risks.56 In contrast to worrying about the never-ending litany of new causes of disruption in a diverse and complex world, Cisco started looking at the effects side of the risk picture, especially the question of “what if we can’t make and deliver a given product,” regardless of cause. Unlike the causes, the effects are tractable and known because they are linked directly to the company’s product portfolio and its global network of suppliers and contract manufacturers. Whereas Cisco can predict neither the cause of the next disaster nor its likelihood, it can consider the potential impacts of a disruption to each product in terms of interruption of product revenues. And the products’ risks do follow a power law: relatively few of Cisco’s products account for more than half its potential risk, simplifying risk prioritization efforts.57 The impact estimation method and the crisis management dashboard described in chapter 3 use this focus on products.
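The product-level view lends itself to a short illustration in Python. This is a minimal sketch, not Cisco’s actual method: the product names, weekly revenues, and recovery times are hypothetical, and measuring impact as weekly revenue multiplied by weeks of outage is a simplifying assumption.

```python
# Hypothetical portfolio: (product, weekly revenue in $M, weeks to recover).
# None of these figures come from Cisco; they only illustrate the ranking.
PORTFOLIO = [
    ("router_a", 25.0, 8),
    ("switch_b", 20.0, 8),
    ("phone_c", 5.0, 6),
    ("cable_d", 2.0, 4),
    ("adapter_e", 1.0, 4),
    ("card_f", 3.0, 5),
]


def revenue_at_risk(products):
    """Return (product, revenue at risk) pairs, largest impact first.

    Assumption: impact = weekly revenue x weeks needed to recover.
    """
    impacts = [(name, weekly * weeks) for name, weekly, weeks in products]
    return sorted(impacts, key=lambda item: item[1], reverse=True)


def top_contributors(ranked, share=0.5):
    """Smallest set of top-ranked products exceeding `share` of total impact."""
    total = sum(impact for _, impact in ranked)
    running = 0.0
    selected = []
    for name, impact in ranked:
        selected.append(name)
        running += impact
        if running > share * total:
            break
    return selected


ranked = revenue_at_risk(PORTFOLIO)
priority = top_contributors(ranked)
print(priority)
```

In this made-up portfolio, two of the six products carry more than half of the total revenue at risk, which is the kind of concentration that makes prioritization manageable.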
The effects-focused view reflects the fact that what is disrupted matters more than why it is disrupted. Frank Schaapveld, senior director supply chain EMEA (Europe, the Middle East and Africa) at Medtronic, the medical equipment and technology company, said, “We do take into account natural disasters or internal root causes like power outages, but I’m not really interested in the nature of the disaster, only its impact. Will a location be out of action for one hour, one day, or one week? How long will it be without critical personnel? What caused the impact is less relevant.”58
Despite the usefulness of the effects-focused view at stripping away the “noise” of the 24-hour news cycle, thinking about causes has its uses. The effects-focused approach is not associated with any disruption likelihood—it is just a “what if?” analysis. Yet some products rely on riskier technologies from riskier facilities and riskier suppliers in riskier geographies. The likelihood of a disruption is an important element in prioritizing the preparations for it. A second use of the cause-focused approach is in understanding correlated risks—the chance that two different effects (e.g., disruption of two suppliers or two products) might occur simultaneously and create greater damage or disrupt a back-up supplier. The cause-focused approach lets risk managers think about the scopes of different types of disruptions. For example, an industrial accident or fire may disrupt a single site of a given supplier; flooding or a regional political upheaval may disrupt multiple colocated sites; and bankruptcy or industrial action may disrupt all of the supplier’s sites.
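The idea of scope can be made concrete with a small Python sketch. The suppliers, sites, and regions below are hypothetical, and the three scope levels (single site, region, whole supplier) are a simplification of the examples above rather than an established taxonomy.

```python
# Hypothetical site registry: (site id, supplier, region).
SITES = [
    ("s1", "acme", "bangkok"),
    ("s2", "acme", "bangkok"),
    ("s3", "acme", "penang"),
    ("s4", "bolt", "bangkok"),  # intended backup supplier
]

# Which column of a site record each scope level matches against.
SCOPE_FIELD = {"site": 0, "supplier": 1, "region": 2}


def affected_sites(scope, target, sites=SITES):
    """Return the ids of all sites hit by a disruption of the given scope.

    scope  -- "site" (e.g., a fire), "region" (e.g., a flood),
              or "supplier" (e.g., a bankruptcy)
    target -- the site id, region name, or supplier name that is hit
    """
    field = SCOPE_FIELD[scope]
    return {site[0] for site in sites if site[field] == target}


fire = affected_sites("site", "s1")
flood = affected_sites("region", "bangkok")
bankruptcy = affected_sites("supplier", "acme")
print(fire, flood, bankruptcy)
```

Here the regional flood takes out the backup supplier’s site along with the primary supplier’s Bangkok sites, which is the correlated risk that a purely effects-focused view would miss.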
In addition to the two dimensions of likelihood and impacts, disruptions vary on a third crucial dimension: detectability. Some types of disruptions can be forecasted or detected well before they have an impact on the enterprise, while others hit without warning. Detectability adds a time dimension to the classification of disruptions and is defined as the time between knowing that a disruptive event will take place and the first impact. Note that the detectability of an event can be positive (detection before the impact), zero (realization at the instant of occurrence), or even negative (detection after the disruption has taken place).
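The definition of detectability as a signed time interval can be expressed directly. The following Python sketch is illustrative only; the function name and the example timestamps are assumptions made for this illustration.

```python
from datetime import datetime, timedelta


def classify_detectability(known_at, first_impact_at):
    """Return (lead time, label) for a disruption.

    Detectability is the time of first impact minus the time at which
    the company knew the event would (or did) take place.
    """
    lead = first_impact_at - known_at
    if lead > timedelta(0):
        label = "positive"   # warning arrives before the impact
    elif lead < timedelta(0):
        label = "negative"   # discovered only after the impact began
    else:
        label = "zero"       # recognized at the instant of occurrence
    return lead, label


# Hypothetical timestamps for three kinds of events.
storm = classify_detectability(datetime(2024, 9, 1), datetime(2024, 9, 4))
fire = classify_detectability(datetime(2024, 9, 1, 8), datetime(2024, 9, 1, 8))
defect = classify_detectability(datetime(2024, 7, 15), datetime(2024, 5, 1))
print(storm, fire, defect)
```

A storm forecast three days ahead has positive detectability, a fire has zero, and a defect discovered seventy-five days after shipments began has negative detectability.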
Figure 2.3 shows the addition of the detectability dimension to the two dimensions depicted in the quadrant diagram in figure 2.1. The detectability axis can be divided into four main segments: very long-term trends that are well discussed in the media, for which companies have time to prepare strategically; disruptions that arise and hit after some short warning (e.g., hurricanes); disruptions that strike with no warning but can be instantly recognized if they happen (e.g., a fire); and disruptions that are hidden and are only discovered some time after the fact (e.g., a product contamination or design defect) or not at all (e.g., industrial espionage).
Trends such as the aging of populations in the Western world, China, and Japan come as no surprise and can be detected years, even decades, in advance. Growing demand for energy and natural resources in China, India, and other emerging markets is also all but inevitable, with its concomitant implications for supplies and prices. Long-term trends in urbanization, mobile phone usage, economic growth in sub-Saharan Africa, robotics (including drones), rising food prices, and water scarcity will affect investments and supply chain patterns. The difference between long-term trends and any other risk is that these trends offer an opportunity to incorporate them into a company’s strategy, thereby profiting from them. Nonetheless, like the proverbial frog in gradually heating water, some companies may not detect, prepare for, or take advantage of slowly shifting trends.
The end dates of labor contracts are known on the signing date of the contract, yet some companies miss that signal of higher risk of labor strikes at that time. The phase-in dates of major regulatory changes (e.g., regarding toxic chemicals) can likewise have months or years of lead time. Most supplier bankruptcies should be of little surprise (see chapter 5). Companies can detect which suppliers have precarious balance sheets, unfavorable patterns of profits, or negative cash flows months before the supplier reaches a point of insolvency or bankruptcy. Furthermore, shipment errors, quality problems, and slow refunds can presage a troubled supplier even before this shows in the financial data.
Other threats have shorter detection horizons but still offer some chance of a warning. As capricious as the weather seems, the physics of moist air flowing at various altitudes and at various temperatures isn’t impenetrably mysterious. Hurricanes, many floods, and winter storms now arrive with hours or days of warning. When a hurricane steams into the Gulf of Mexico, the oil rig workers know what to do. They are trained to carry out a series of “shut-in” procedures for closing key valves and securing equipment, thereby reducing the chances of an oil spill when the storm hits and enabling a rapid restart after the storm passes. Forewarning lets a company kick off impact-avoidance and recovery efforts.
Even earthquakes can be detected as they start, enabling early warnings to those more distant from the epicenter. Businesses and residents of Tokyo knew the 2011 quake was coming about 80 seconds before it struck and had up to 40 minutes warning on the tsunami’s arrival in Tokyo Bay. Data can flow faster than disaster (see chapter 8).
On the morning of December 8, 2010, the electricity supply dropped for just 0.07 of a second at Toshiba’s Yokkaichi memory-chip plant in Mie prefecture. The power glitch caused the factory’s equipment to reboot, which ruined all the wafers in production. That was all it took to create a two-month disruption in production of NAND59 flash memory.60 At the time, Toshiba provided 35.4 percent of the world’s supply of NAND flash,61 and the failure affected about 20 percent of Toshiba’s production. “I don’t think it could come at a worse time,” said Krishna Shankar, an analyst at ThinkEquity, because of the surging demand for NAND flash in fast-growing product categories such as smartphones, tablets, digital cameras, and music players.62 No one could have forecast the event. Toshiba had no warning of the disruption, but it did know in an instant that disruption had struck.
Some events strike with little or no warning, like a technology outage, an explosion in a factory, or a terrorist attack. One minute, everything is running smoothly and, the next second, chaos erupts. What happens afterward depends on the likelihood of the event. High-impact/high-likelihood disruptions that strike suddenly trigger a playbook response based on experience and drills. High-impact/low-likelihood events are more of a surprise and require significant information gathering, assessment, and creative problem solving.
In both cases, the detection time includes the time required for the company to sufficiently understand what happened and to mount an appropriate response. During 9/11, for example, the first plane to hit the World Trade Center was presumed to be an accident. Only after the second plane hit and after enough of the US government’s agencies realized what was happening, was the military able to launch jets to intercept United Flight 93, which crashed in western Pennsylvania before the military could intercept it.63 A large disruption can affect an entire industry, in which case the company that identifies the nature and magnitude of the problem early on can minimize the impact by securing supply, transportation, and access before its competitors do.
In early 2007, a long-time paint supplier to Mattel ran short of colorants for the paint and could not get more from its primary supplier. The supplier quickly found a backup supplier via the Internet, which assured the paint supplier that it could supply safe colorants that were certified as lead-free. The paint supplier didn’t test the new colorant because testing would delay production, although paint workers noted that the new paint did not smell the same as the usual formulation.64
For two and a half months, Mattel’s contract manufacturer made and shipped some one million toys painted with the substitute colorants. The toys flowed from China across the seas to distributors and retailers. In early July 2007, testing by a European retailer revealed prohibited levels of lead in the paint and coatings on some Mattel toys. Mattel immediately halted production of the toys, investigated the cause, and confirmed the presence of lead in the paint. In early August of 2007, Mattel recalled nearly one million toys of 83 different types.
Fortunately, the impact was not as bad as it could have been because two-thirds of the contaminated toys were still in the distribution chain. Yet Mattel still needed to alert consumers to return some 300,000 sold toys made during the two-and-a-half-month period before it realized the problem.65 Subsequent testing found other lead-contaminated toys, which forced Mattel to recall another million toys that autumn.66 Mattel also paid a $2.3 million fine for violating federal bans on lead paint.67 The recall meant that Mattel had to incur significant costs, including the logistics of identifying, collecting, and destroying the toxic products. More important, the incident tainted the brand in the eyes of consumers and the media. As a result, Mattel’s stock fell 25 percent off its 2007 high during the worst of the recall incident.68
Whereas everyone knows when an earthquake hits, some disasters have a hidden start. Food contamination incidents can take weeks to surface as a result of delays in the food reaching consumers, the incubation time of the food-borne pathogens, and the time required to trace the illness back to particular types and brands of food. Usually, the greater the delay in detecting a hidden problem, the greater the impact and the resulting damage. Product defects—caused by design errors or material quality issues—may not surface until long after the goods are in customers’ hands and in use.
Some disruptions have a more insidious and less detectable character—they are unknown unknowns. For example, toys have contained magnets for decades without safety concerns, but a new breed of high-strength magnets created an unforeseen and serious safety problem. Unlike prior generations of magnets, if these types of high-strength magnets broke loose from the toy and if a child swallowed more than one magnet over time, then the magnets could potentially pinch together two parts of the child’s intestines, cause a perforation, and lead to a serious infection. Toy makers sold tens of millions of toys with these types of magnets over a nine-year period before health and safety officials detected the problem and mandated an extensive recall.69
Companies have many strategic options for managing the diverse risks that they face. Risk management may include prevention of avoidable risks, playbooks to respond to common types of disruptions, general resilience for unexpected or very rare disruptions, and improving awareness of both incipient risks and ongoing disruptions.
The quadrant framework of likelihood and impact suggests two complementary approaches for reducing risks. First, a company can reduce the likelihood of disruptions by being compliant with regulations, adhering to social concerns, maintaining good labor relations, and avoiding situations that are prone to disruption (e.g., suppliers in floodplain locations or unstable countries). A company may also implement safety, quality, and security measures,70 including cybersecurity, to prevent possible intentional attacks. Yet, prevention and likelihood-reducing measures cannot entirely eliminate risks. Furthermore, prevention generally targets the foreseeable causes of disruptions, which implies that it may not reduce the likelihood of unknown-unknown risks.
Second, companies can reduce the impact by preparing a timely and effective response to disruptions. Optional assets such as spare inventory, spare capacity, and alternative suppliers provide materials and resources that can be utilized to minimize impacts and accelerate recovery times. Flexible processes can help a company respond quickly and efficiently. To this end, companies can create emergency operations centers, business continuity plans, and predefined escalation procedures that help coordinate a response (see chapter 6). Increasing flexibility and adding “just-in-case” assets can provide general resilience that helps address unknown-unknown threats.
An important part of reducing the impacts of a disruption is quick detection (see chapter 8). The earlier the warning, the more a company can do in preparation, such as moving inventory and assets away from the affected area, preparing recovery materials, or securing second-source supplies. Detection also means perceiving the scope and magnitude of the disruption. Accelerating a company’s information flow and its decision-making processes is an important factor in detection and fast response.
Temperature and smoke sensors can warn of a fire, and many industrial sites connect these sensors to automatic fire suppression, fire evacuation alarms, and emergency responders. Similarly, tsunami sensors around the Pacific Ocean not only detect incoming tsunamis but also automatically activate sirens and evacuation alerts.
In a similar fashion, consumer-facing companies use social media to detect problems with their products and even to mitigate developing problems. For example, Dell and Best Buy use social media to both monitor problems and communicate with affected customers, thus responding quickly in order to avoid a growing wave of bad publicity. Some companies go beyond “listening” to social media discussions by inserting “solutions” (e.g., “I heard that the company knows about the problem and a new keyboard will be shipping in two months…”).
On June 7, 2013, Delta Air Lines was jolted to discover an unfavorable YouTube video of soldiers returning from Afghanistan who were complaining about being charged extra for a fourth checked bag. The video went viral.71 Delta immediately understood the looming public relations disaster. Later the same day, it issued a corporate apology and by the next morning it changed its policy to allow soldiers traveling on orders to check four bags for free. The policy change meant that software systems had to be updated, airport kiosks modified, and employees around the world notified. By noon of that next day, Delta updated its blog posts and Facebook page alerting the public to the changed policy.72 The fast action prevented the video from gaining the notoriety of the video “United Breaks Guitars.”73,74
Detecting trends and longer-term disruptions means monitoring the environment for changes. Detecting instant-impact events means monitoring operations, suppliers, and the regions in which they operate. Detecting hidden-impact events means intensive monitoring of less visible elements of the supply chain, such as deeper-tier suppliers, fringe groups that influence an industry, and unusual adverse events among customers that might signal a heretofore unknown problem in the company’s products or processes.
Detection is a broader and deeper strategy than just installing smoke alarms or social media monitoring; rather, detection means vigilance toward both specific near-term events and potential future events that might disrupt the company. Detection depends on creating visibility into the supply chain; at its heart, detection is the conversion of the unknown into the known in a timely fashion. Via detection, a company might go from: 1) not even knowing it is exposed to disruption from Gulf Coast hurricanes because a deep-tier supplier of a critical raw material is in that region; to 2) knowing it’s exposed but having only an estimated likelihood and impact based on actuarial data; to 3) knowing that the supplier will be hit by a hurricane in three days’ time but that this supplier has four weeks of inventory at an inland distribution center and a business continuity plan for a five-week recovery time, creating a potential one-week gap in supplies a month after the hurricane’s landfall. Thus, detection converts unknown-unknowns into known-unknowns and converts known-unknowns into known-knowns.
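The arithmetic behind the hurricane example can be written out as a short Python sketch. It assumes that the supplier’s production stops at landfall, that inventory is drawn down at a steady rate, and that no new supply arrives until recovery is complete; the function name is hypothetical.

```python
def supply_gap_weeks(inventory_weeks, recovery_weeks):
    """Return (gap length, weeks after the disruption when the gap starts).

    Assumptions: production stops at the moment of disruption, inventory
    covers demand at a constant rate, and supply resumes only when the
    recovery is complete. If inventory outlasts the recovery, there is
    no gap and the start is None.
    """
    gap = max(0, recovery_weeks - inventory_weeks)
    starts_after = inventory_weeks if gap > 0 else None
    return gap, starts_after


# Figures from the hurricane example: four weeks of inland inventory
# and a five-week recovery plan.
gap, starts_after = supply_gap_weeks(inventory_weeks=4, recovery_weeks=5)
print(gap, starts_after)
```

With four weeks of inventory and a five-week recovery, the sketch yields a one-week gap beginning four weeks after landfall, matching the example; the three days of warning are the time available to close that gap.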