How Do You Infect Your Organization With Humane Ops?

A presentation at DevOpsDays Kansas City 2018 in October 2018 in Kansas City, MO, USA by Matt Stratton

Slide 1

HOW TO INFECT YOUR ORGANIZATION WITH HUMANE OPS
Matty Stratton, DevOps Evangelist, PagerDuty
@mattstratton

Slide 2

Slide 3

Slide 4

Slide 5

THE DATA
50,000 RESPONDERS RECEIVING A TOTAL OF 760 MILLION NOTIFICATIONS

PagerDuty commissioned a study covering more than 10,000 companies across more than 100 different segments.

Slide 6

THE DATA
50,000 RESPONDERS RECEIVING A TOTAL OF 760 MILLION NOTIFICATIONS
▸ 60 million notifications during dinner hours

Slide 7

THE DATA
50,000 RESPONDERS RECEIVING A TOTAL OF 760 MILLION NOTIFICATIONS
▸ 60 million notifications during dinner hours
▸ 82 million notifications during evening hours

Slide 8

THE DATA
50,000 RESPONDERS RECEIVING A TOTAL OF 760 MILLION NOTIFICATIONS
▸ 60 million notifications during dinner hours
▸ 82 million notifications during evening hours
▸ 250 million notifications during sleeping hours

Slide 9

THE DATA
50,000 RESPONDERS RECEIVING A TOTAL OF 760 MILLION NOTIFICATIONS
▸ 60 million notifications during dinner hours
▸ 82 million notifications during evening hours
▸ 250 million notifications during sleeping hours
▸ 122 million notifications on weekends

Slide 10

THE DATA
50,000 RESPONDERS RECEIVING A TOTAL OF 760 MILLION NOTIFICATIONS
▸ 60 million notifications during dinner hours
▸ 82 million notifications during evening hours
▸ 250 million notifications during sleeping hours
▸ 122 million notifications on weekends
▸ A total of 750,000 nights with sleep-interrupting notifications

Slide 11

THE DATA
50,000 RESPONDERS RECEIVING A TOTAL OF 760 MILLION NOTIFICATIONS
▸ 60 million notifications during dinner hours
▸ 82 million notifications during evening hours
▸ 250 million notifications during sleeping hours
▸ 122 million notifications on weekends
▸ A total of 750,000 nights with sleep-interrupting notifications
▸ A total of 330,000 weekend days with interrupt notifications

Slide 12

LET’S HAVE SOME DATA
THE MOST MEANINGFUL METRICS ON ATTRITION ARE

Slide 13

LET’S HAVE SOME DATA
THE MOST MEANINGFUL METRICS ON ATTRITION ARE
▸ Number of days where a responder’s work and life are interrupted

Slide 14

LET’S HAVE SOME DATA
THE MOST MEANINGFUL METRICS ON ATTRITION ARE
▸ Number of days where a responder’s work and life are interrupted
▸ Number of days when a responder is woken overnight

Slide 15

LET’S HAVE SOME DATA
THE MOST MEANINGFUL METRICS ON ATTRITION ARE
▸ Number of days where a responder’s work and life are interrupted
▸ Number of days when a responder is woken overnight
▸ Number of weekend days interrupted by notifications
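
As a rough sketch of how these three metrics could be counted from raw notification timestamps for one responder: the function name, the input shape, and the 22:00 to 06:00 definition of "overnight" are assumptions made for illustration, not definitions taken from the PagerDuty study.

```python
from datetime import datetime

# Assumed definition of "overnight" for illustration: 22:00 up to 06:00 local time.
SLEEP_START_HOUR = 22
SLEEP_END_HOUR = 6


def attrition_metrics(notifications):
    """Count distinct interrupted days for one responder.

    `notifications` is an iterable of local-time datetime objects,
    one per notification the responder received.
    """
    interrupted_days = set()
    overnight_days = set()
    weekend_days = set()
    for ts in notifications:
        day = ts.date()
        interrupted_days.add(day)
        # Simplification: overnight pages are grouped by calendar date,
        # so one night spanning midnight can count as two days.
        if ts.hour >= SLEEP_START_HOUR or ts.hour < SLEEP_END_HOUR:
            overnight_days.add(day)
        if ts.weekday() >= 5:  # Saturday is 5, Sunday is 6
            weekend_days.add(day)
    return {
        "days_interrupted": len(interrupted_days),
        "days_woken_overnight": len(overnight_days),
        "weekend_days_interrupted": len(weekend_days),
    }
```

Counting distinct days rather than total notifications matches the wording of the slide: ten pages in one night and one page in one night both cost the responder that night.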

Slide 16

EXAMPLES OF MEMES ARE TUNES, IDEAS, CATCH-PHRASES, CLOTHES FASHIONS, WAYS OF MAKING POTS OR OF BUILDING ARCHES. JUST AS GENES PROPAGATE THEMSELVES IN THE GENE POOL BY LEAPING FROM BODY TO BODY, SO MEMES PROPAGATE THEMSELVES IN THE MEME POOL BY LEAPING FROM BRAIN TO BRAIN VIA IMITATION.
Richard Dawkins

Slide 17

SNOW CRASH

Remember, memes are another way of evolving across generations. This happens in the world of Snow Crash, but it can happen in your organization as well.

Slide 18

SNOW CRASH
▸ In the book, “Snow Crash” itself is a neural-linguistic virus.

Slide 19

SNOW CRASH
▸ In the book, “Snow Crash” itself is a neural-linguistic virus.
▸ The bad guys figure out how to unlock it, and it spreads from hacker to hacker like a meme

Slide 20

SNOW CRASH
▸ In the book, “Snow Crash” itself is a neural-linguistic virus.
▸ The bad guys figure out how to unlock it, and it spreads from hacker to hacker like a meme
▸ Plus, lots of swordplay

Slide 21

SNOW CRASH
▸ In the book, “Snow Crash” itself is a neural-linguistic virus.
▸ The bad guys figure out how to unlock it, and it spreads from hacker to hacker like a meme
▸ Plus, lots of swordplay
“IDEOLOGY IS A VIRUS.”
NEAL STEPHENSON

Slide 22

WHAT IF YOU ARE THE SUPREME LEADER?

Slide 23

WHAT IF YOU ARE THE SUPREME LEADER?
▸ “Command and control” doesn’t work

Slide 24

WHAT IF YOU ARE THE SUPREME LEADER?
▸ “Command and control” doesn’t work
▸ Use measurement for good, not for evil

Slide 25

WHAT IF YOU ARE THE SUPREME LEADER?
▸ “Command and control” doesn’t work
▸ Use measurement for good, not for evil
▸ Avoid “executive swoop”

Slide 26

WHAT IF YOU ARE THE SUPREME LEADER?
▸ “Command and control” doesn’t work
▸ Use measurement for good, not for evil
▸ Avoid “executive swoop”

Slide 27

MIDDLE MANAGEMENT TIPS

Slide 28

MIDDLE MANAGEMENT TIPS
▸ Encourage safe post-incident review spaces

Slide 29

MIDDLE MANAGEMENT TIPS
▸ Encourage safe post-incident review spaces
▸ Drive for a culture of learning

Slide 30

MIDDLE MANAGEMENT TIPS
▸ Encourage safe post-incident review spaces
▸ Drive for a culture of learning
▸ You hired smart people - use them

Slide 31

REVIEW. REVIEW. REVIEW
A CULTURE OF LEARNING
http://bit.ly/2KpzKKW

If we don’t treat every outage or alert as something to learn from or something to improve, we run the risk of the Normalization of Deviance effect. In this case, we start to accept alerts or degradations as acceptable. Our standards suffer. We let things slip through the cracks.

Slide 32

REVIEW. REVIEW. REVIEW
A CULTURE OF LEARNING
▸ In a generative, performance-oriented organization, “failure leads to inquiry.”
http://bit.ly/2KpzKKW

Slide 33

REVIEW. REVIEW. REVIEW
A CULTURE OF LEARNING
▸ In a generative, performance-oriented organization, “failure leads to inquiry.”
▸ Don’t take my word for it. Ask Ron Westrum.
http://bit.ly/2KpzKKW

Slide 34

REVIEW. REVIEW. REVIEW
A CULTURE OF LEARNING
▸ In a generative, performance-oriented organization, “failure leads to inquiry.”
▸ Don’t take my word for it. Ask Ron Westrum.
▸ You can also ask Dr. Nicole Forsgren - @nicolefv
http://bit.ly/2KpzKKW

Slide 35

USE THE FORCE, EVEN IF YOU AREN’T A JEDI

Slide 36

REVIEW ALL THE THINGS

Andy Fleener, Platform Operations Manager, Sportsengine - “We review every alert from the last 24 hours/weekend every day. No broken windows.”

Slide 37

REVIEW. REVIEW. REVIEW
NORMALIZATION OF DEVIANCE
http://bit.ly/2Ihj1wV

If we don’t treat every outage or alert as something to learn from or something to improve, we run the risk of the Normalization of Deviance effect. In this case, we start to accept alerts or degradations as acceptable. Our standards suffer. We let things slip through the cracks.

Slide 38

REVIEW. REVIEW. REVIEW
NORMALIZATION OF DEVIANCE
▸ The gradual process through which unacceptable practice or standards become acceptable. As the deviant behavior is repeated without catastrophic results, it becomes the social norm for the organization.
http://bit.ly/2Ihj1wV

Slide 39

REVIEW. REVIEW. REVIEW
NORMALIZATION OF DEVIANCE
▸ The gradual process through which unacceptable practice or standards become acceptable. As the deviant behavior is repeated without catastrophic results, it becomes the social norm for the organization.
▸ This happened to NASA. Twice.
http://bit.ly/2Ihj1wV

Slide 40

REVIEW. REVIEW. REVIEW
NORMALIZATION OF DEVIANCE
▸ The gradual process through which unacceptable practice or standards become acceptable. As the deviant behavior is repeated without catastrophic results, it becomes the social norm for the organization.
▸ This happened to NASA. Twice.
▸ In our case, we start to accept alerts or degradations as acceptable.
http://bit.ly/2Ihj1wV

Slide 41

QUESTION METRICS

Let’s make sure that we are setting the proper expectations. We don’t want to just expect five 9’s of reliability because “well, five is better than four.” Why do you need five? Have you tied your metrics to a business outcome?

Likewise, your speed metrics shouldn’t be “faster than last month.” And beware of inaccurate extrapolation. You might have data suggesting that if your page load time increases by a second, conversion drops by 50 percent. But that doesn’t mean that if you reduce load time by a second, conversion will increase by 50 percent. Correlation doesn’t always equal causation, and the same numbers don’t move the dials in both directions.
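
To make the "why do you need five?" question concrete, here is a minimal sketch of what each additional nine buys in allowed downtime. The function name and the 365-day year are assumptions for illustration; the arithmetic itself is just the unavailability fraction times the length of the period.

```python
def allowed_downtime_minutes(nines, days=365):
    """Minutes of downtime per period permitted by an availability of N nines."""
    unavailability = 10 ** -nines  # for example, 5 nines -> 0.00001
    return days * 24 * 60 * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes per year")
```

Four nines allows roughly 52.6 minutes of downtime per year and five nines roughly 5.3. Whether closing that gap is worth the engineering and on-call cost is a business question, which is the point of tying the target to an outcome.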

Slide 42

QUESTION METRICS
WHY ARE WE USING THESE NUMBERS?

Slide 43

QUESTION METRICS
WHY ARE WE USING THESE NUMBERS?
▸ What data drives your incident process?

Slide 44

QUESTION METRICS
WHY ARE WE USING THESE NUMBERS?
▸ What data drives your incident process?
▸ Are your metrics tied to business outcomes?

Slide 45

QUESTION METRICS
WHY ARE WE USING THESE NUMBERS?
▸ What data drives your incident process?
▸ Are your metrics tied to business outcomes?
▸ Correlation doesn’t always equal causation

Slide 46

SIMPLE. ALWAYS.

Don’t over-design systems. Resume-driven development is almost always a recipe for on-call disasters.

Slide 47

KEEP IT SIMPLE

At the heart of every complex resilient system is the hubris that someone believed they could predict everything that could go wrong. Fate, and the internet, laughs.

Slide 48

KEEP IT SIMPLE
THE MORE RESILIENTLY THE SYSTEM IS DESIGNED, THE MORE LIKELY IT IS TO CAUSE A NEGATIVE BUSINESS IMPACT

Slide 49

KEEP IT SIMPLE
THE MORE RESILIENTLY THE SYSTEM IS DESIGNED, THE MORE LIKELY IT IS TO CAUSE A NEGATIVE BUSINESS IMPACT
Stratton’s Law of Catastrophic Predestination

Slide 50

COMMUNICATE. TALK TO PEOPLE

Ask how the on-call is feeling during stand-ups. Give them the opportunity to mention they might be burning out.

Slide 51

COMMUNICATE. TALK TO PEOPLE
▸ Who are your customers? What are their expectations?

Slide 52

COMMUNICATE. TALK TO PEOPLE
▸ Who are your customers? What are their expectations?
▸ Whose customer are you? Can you help them out?

Slide 53

COMMUNICATE. TALK TO PEOPLE
▸ Who are your customers? What are their expectations?
▸ Whose customer are you? Can you help them out?
▸ What are the perceptions of your team?

Slide 54

HUMANS, PEOPLE ARE

Slide 55

HUMANS, PEOPLE ARE
▸ Consider contextual on-call

Slide 56

HUMANS, PEOPLE ARE
▸ Consider contextual on-call
▸ The Golden Rule

Slide 57

HUMANS, PEOPLE ARE
▸ Consider contextual on-call
▸ The Golden Rule
▸ Bake cookies

Slide 58

HUMANS, PEOPLE ARE
▸ Consider contextual on-call
▸ The Golden Rule
▸ Bake cookies

Slide 59

INCIDENT COMMAND
LEARN TO TAKE COMMAND

Volunteer to help as an incident commander. (What’s that? Maybe we should have them!)

Slide 60

MAKE IT NICE ON THE BRIDGE DURING A CALL

You want to get all the right people on the call as soon as you need to… but you also want to get them OFF of the call as soon as possible.

Slide 61

MAKE IT NICE ON THE BRIDGE DURING A CALL
▸ Have clearly defined roles

Slide 62

MAKE IT NICE ON THE BRIDGE DURING A CALL
▸ Have clearly defined roles
▸ Avoid bystander effect

Slide 63

MAKE IT NICE ON THE BRIDGE DURING A CALL
▸ Have clearly defined roles
▸ Avoid bystander effect
▸ Rally fast, disband faster

Slide 64

MAKE IT NICE ON THE BRIDGE DURING A CALL
▸ Have clearly defined roles
▸ Avoid bystander effect
▸ Rally fast, disband faster
▸ Don’t litigate severity

Slide 65

MAKE IT NICE ON THE BRIDGE DURING A CALL
▸ Have clearly defined roles
▸ Avoid bystander effect
▸ Rally fast, disband faster
▸ Don’t litigate severity
▸ Have a clear mechanism for making decisions

Slide 66

SHARE ALL TESTS
SHARING IS CARING

Slide 67

SHARE ALL TESTS
TESTS ARE FOR SWE AND SRE BOTH

Slide 68

SHARE ALL TESTS
TESTS ARE FOR SWE AND SRE BOTH
▸ All functional tests used in preproduction should have a corresponding monitor in production

Slide 69

SHARE ALL TESTS
TESTS ARE FOR SWE AND SRE BOTH
▸ All functional tests used in preproduction should have a corresponding monitor in production
▸ All monitoring functionality in production should have corresponding tests in the build/release process

Slide 70

SHARE ALL TESTS
TESTS ARE FOR SWE AND SRE BOTH
▸ All functional tests used in preproduction should have a corresponding monitor in production
▸ All monitoring functionality in production should have corresponding tests in the build/release process
▸ Monitoring is testing with a time dimension. There should be full parity between preproduction and production.
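
One way to read "monitoring is testing with a time dimension" is that the same check function can back both the build-time test and the production monitor. The sketch below assumes that reading; `check_checkout_total`, `monitor`, and the toy cart example are hypothetical names invented for illustration, not part of any tool named in the talk.

```python
import time


def check_checkout_total(compute_total):
    """Functional check: a cart of two 5.00 items must total 10.00."""
    return compute_total([5.00, 5.00]) == 10.00


def test_checkout_total():
    # Preproduction: run the check once as part of the build.
    assert check_checkout_total(sum)


def monitor(check, target, interval_seconds, iterations, on_failure, sleep=time.sleep):
    """Production: run the same check repeatedly and report each failure."""
    failures = 0
    for _ in range(iterations):
        if not check(target):
            failures += 1
            on_failure(check.__name__)  # for example, raise an alert
        sleep(interval_seconds)
    return failures
```

Because the test and the monitor call the same function, a new check lands in both places at once, which is one way to keep the parity the slide asks for.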

Slide 71

DO ONE NICE THING EVERY SPRINT

Slide 72

HELP YOUR RESPONDERS IN EACH AND EVERY SPRINT

Even if it’s not on a card

Slide 73

HELP YOUR RESPONDERS IN EACH AND EVERY SPRINT
▸ In each sprint/work unit, add value to your responders

Slide 74

HELP YOUR RESPONDERS IN EACH AND EVERY SPRINT
▸ In each sprint/work unit, add value to your responders
▸ Even if it’s not on a card

Slide 75

HELP YOUR RESPONDERS IN EACH AND EVERY SPRINT
▸ In each sprint/work unit, add value to your responders
▸ Even if it’s not on a card
▸ You rebel, you.

Slide 76

ADDING VALUE
SOME EXAMPLES

These might seem obvious, but if they’re so obvious, I assume you’ve done them already?

Slide 77

ADDING VALUE
SOME EXAMPLES
▸ Provide better context in logging (stacktraces alone don’t count)

Slide 78

ADDING VALUE
SOME EXAMPLES
▸ Provide better context in logging (stacktraces alone don’t count)
▸ Remove some technical debt. Yes, you have some.

Slide 79

ADDING VALUE
SOME EXAMPLES
▸ Provide better context in logging (stacktraces alone don’t count)
▸ Remove some technical debt. Yes, you have some.
▸ Add some (useful) tests

Slide 80

ADDING VALUE
SOME EXAMPLES
▸ Provide better context in logging (stacktraces alone don’t count)
▸ Remove some technical debt. Yes, you have some.
▸ Add some (useful) tests
▸ Remove something unused
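
For the logging bullet, a small sketch of what "context beyond the stacktrace" might look like with Python's standard `logging` module. The `charge` function, the `payments` logger name, the gateway callable, and the wording of the message are all hypothetical; the only library behavior relied on is that `Logger.exception` logs at ERROR level and attaches the current traceback.

```python
import logging

logger = logging.getLogger("payments")


def charge(order_id, amount_cents, gateway):
    """Charge an order through `gateway`, a callable supplied by the caller."""
    try:
        return gateway(order_id, amount_cents)
    except Exception:
        # logger.exception records the stack trace automatically. The message
        # adds what a responder needs at 3 a.m.: which order, how much, and
        # what state the system was left in.
        logger.exception(
            "charge failed order_id=%s amount_cents=%d; no charge was recorded",
            order_id,
            amount_cents,
        )
        raise
```

The stack trace says where the code broke; the extra fields say which customer-facing thing broke, which is usually what the responder has to find out first.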

Slide 81

ADDING VALUE

Slide 82

ADDING VALUE
▸ If you use feature flags, add a description field to the configuration

Slide 83

ADDING VALUE
▸ If you use feature flags, add a description field to the configuration
▸ If you use runbooks, ensure they are up to date every time you cut a release. If you don’t do this, abandon the runbook altogether (an incorrect runbook is considered harmful)

Slide 84

ADDING VALUE
▸ If you use feature flags, add a description field to the configuration
▸ If you use runbooks, ensure they are up to date every time you cut a release. If you don’t do this, abandon the runbook altogether (an incorrect runbook is considered harmful)
▸ SIMPLIFY, MAN!
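
A sketch of the feature-flag suggestion, assuming flags live in a plain dictionary. The flag names, the configuration shape, and the `flags_missing_description` helper are made up for illustration and do not correspond to any particular feature-flag product.

```python
# Hypothetical flag configuration: each flag carries a human-readable
# description so a responder can tell what turning it off will do.
FEATURE_FLAGS = {
    "new_checkout_flow": {
        "enabled": True,
        "description": "Routes checkout through the v2 payment service. "
                       "Turn off if payment errors spike.",
    },
    "beta_search": {
        "enabled": False,
        "description": "",
    },
}


def flags_missing_description(flags):
    """Return the names of flags with no usable description, sorted."""
    return sorted(
        name
        for name, config in flags.items()
        if not config.get("description", "").strip()
    )
```

A check like this could run in the build so that a flag without a description never reaches the people who will have to flip it during an incident.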

Slide 85

SHARE YOUR ON-CALL STORIES WITH ME LATER
@MATTSTRATTON
LINKEDIN.COM/IN/MATTSTRATTON
MATTSTRATTON.COM
ARRESTEDDEVOPS.COM

Slide 86

NOTI.ST/MATTSTRATTON

Slide 87

FURTHER READING AND REFERENCES
▸ Improving Your Employee Retention With Real-Time Ops Data - http://bit.ly/2rGTnq4
▸ Page It Forward! - http://bit.ly/2In8Lzc
▸ The study of information flow: A personal journey - http://bit.ly/2KpzKKW
▸ The Normalization of Deviance (If It Can Happen to NASA, It Can Happen to You) - http://bit.ly/2Ihj1wV

Slide 88

▸ Snow Crash by Neal Stephenson - http://bit.ly/2Iiuc8L
▸ The Cybersecurity Canon: Snow Crash - http://bit.ly/2InDYGI
▸ Disasters! Arrested DevOps Episode 37 - https://arresteddevops.com/37
▸ PagerDuty Incident Response - http://response.pagerduty.com