HOW TO INFECT YOUR ORGANIZATION WITH HUMANE OPS
Matty Stratton
DevOps Advocate, PagerDuty
@mattstratton

🔥📟

THE DATA
50,000 RESPONDERS RECEIVING A TOTAL OF 760 MILLION NOTIFICATIONS
▸ 60 million notifications during dinner hours
▸ 82 million notifications during evening hours
▸ 250 million notifications during sleeping hours
▸ 122 million notifications on weekends
▸ A total of 750,000 nights with sleep-interrupting notifications
▸ A total of 330,000 weekend days with interrupt notifications

LET’S HAVE SOME DATA
THE MOST MEANINGFUL METRICS ON ATTRITION ARE
▸ Number of days where a responder’s work and life are interrupted
▸ Number of days when a responder is woken overnight
▸ Number of weekend days interrupted by notifications
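
A minimal sketch of how these three counts might be derived from raw notification timestamps. The input shape (pairs of responder name and local-time `datetime`), the function name `attrition_metrics`, and the 22:00 to 06:00 definition of "overnight" are all assumptions made for illustration; the talk does not define them.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: "overnight" means 22:00-06:00 local time (not defined in the talk).
SLEEP_START_HOUR = 22
SLEEP_END_HOUR = 6


def attrition_metrics(notifications):
    """Count interrupted calendar days per responder.

    `notifications` is an iterable of (responder, datetime) pairs, with
    datetimes already converted to the responder's local time.
    """
    interrupted = defaultdict(set)
    overnight = defaultdict(set)
    weekend = defaultdict(set)
    for responder, ts in notifications:
        day = ts.date()
        interrupted[responder].add(day)
        # Counted by calendar date, so one night that spans midnight
        # can contribute two dates.
        if ts.hour >= SLEEP_START_HOUR or ts.hour < SLEEP_END_HOUR:
            overnight[responder].add(day)
        if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            weekend[responder].add(day)
    return {
        responder: {
            "interrupted_days": len(interrupted[responder]),
            "overnight_days": len(overnight[responder]),
            "weekend_days": len(weekend[responder]),
        }
        for responder in interrupted
    }
```

Counting distinct days rather than raw notification volume matches the framing on the slide: ten pages in one night and one page in one night both cost the responder that night.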

EXAMPLES OF MEMES ARE TUNES, IDEAS, CATCH-PHRASES, CLOTHES FASHIONS, WAYS OF MAKING POTS OR OF BUILDING ARCHES. JUST AS GENES PROPAGATE THEMSELVES IN THE GENE POOL BY LEAPING FROM BODY TO BODY, SO MEMES PROPAGATE THEMSELVES IN THE MEME POOL BY LEAPING FROM BRAIN TO BRAIN VIA IMITATION.
Richard Dawkins

SNOW CRASH
▸ In the book, “Snow Crash” itself is a neurolinguistic virus.
▸ The bad guys figure out how to unlock it, and it spreads from hacker to hacker like a meme
▸ Plus, lots of swordplay
“IDEOLOGY IS A VIRUS.” - NEAL STEPHENSON

WHAT IF YOU ARE THE SUPREME LEADER?
▸ “Command and control” doesn’t work
▸ Use measurement for good, not for evil
▸ Avoid “executive swoop”

MIDDLE MANAGEMENT TIPS
▸ Encourage safe post-incident review spaces
▸ Drive for a culture of learning
▸ Take care of your people

REVIEW. REVIEW. REVIEW
A CULTURE OF LEARNING
▸ In a generative, performance-oriented organization, “failure leads to inquiry.”
▸ Don’t take my word for it. Ask Ron Westrum.
▸ You can also ask Dr. Nicole Forsgren - @nicolefv
http://bit.ly/2KpzKKW

USE THE FORCE, EVEN IF YOU AREN’T A JEDI

REVIEW ALL THE THINGS

HAND-OFF TIME
THE ON-CALL REVIEW
▸ Primary purpose is to understand on-call load and pain
▸ Approximately a week’s worth of on-call history is common
▸ Takes about 30 minutes, give or take

ON-CALL REVIEW, CONTINUED
▸ Typically instituted by a team manager
▸ Usually run by on-call responders
▸ Minimum attendees are the team manager, outgoing on-call, and incoming on-call
▸ BETTER PRACTICE - include the entire team!

REVIEW. REVIEW. REVIEW
NORMALIZATION OF DEVIANCE
▸ The gradual process through which unacceptable practice or standards become acceptable. As the deviant behavior is repeated without catastrophic results, it becomes the social norm for the organization.
▸ This happened to NASA. Twice.
▸ In our case, we start to treat alerts or degradations as acceptable.
http://bit.ly/2Ihj1wV

QUESTION METRICS
WHY ARE WE USING THESE NUMBERS?
▸ What data drives your incident process?
▸ Are your metrics tied to business outcomes?
▸ Correlation doesn’t always equal causation

SIMPLE. ALWAYS.

KEEP IT SIMPLE
THE MORE RESILIENTLY THE SYSTEM IS DESIGNED, THE MORE LIKELY IT IS TO CAUSE A NEGATIVE BUSINESS IMPACT
Stratton’s Law of Catastrophic Predestination

COMMUNICATE.
TALK TO PEOPLE
▸ Who are your customers? What are their expectations?
▸ Whose customer are you? Can you help them out?
▸ What are the perceptions of your team?

HUMANS, PEOPLE ARE
▸ Consider contextual on-call
▸ The Golden Rule
▸ Bake cookies

LEARN TO TAKE COMMAND
INCIDENT COMMAND

MAKE IT NICE ON THE BRIDGE
DURING A CALL
▸ Have clearly defined roles
▸ Avoid bystander effect
▸ Rally fast, disband faster
▸ Don’t litigate severity
▸ Have a clear mechanism for making decisions

SHARING IS CARING
SHARE ALL TESTS

SHARE ALL TESTS
TESTS ARE FOR SWE AND SRE BOTH
▸ All functional tests used in preproduction should have a corresponding monitor in production
▸ All monitoring functionality in production should have corresponding tests in the build/release process
▸ Monitoring is testing with a time dimension. There should be full parity between preproduction and production.
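
One way to get that parity is to write each check once and call it from both the build and the monitor. The sketch below assumes a hypothetical health payload with `status` and `latency_ms` fields and a 500 ms threshold; none of these names or values come from the talk.

```python
def check_health_payload(payload):
    """Return a list of problems found in a (hypothetical) health payload."""
    problems = []
    if payload.get("status") != "ok":
        problems.append("status is not ok")
    # A missing latency value is treated as a failure.
    if payload.get("latency_ms", float("inf")) > 500:
        problems.append("latency above 500 ms")
    return problems


def test_health_payload():
    # Preproduction: the build runs the check against a known-good fixture.
    assert check_health_payload({"status": "ok", "latency_ms": 120}) == []


def monitor_health(fetch_payload, alert):
    # Production: a scheduler runs the same check against live data.
    # `fetch_payload` and `alert` are injected callables (assumed here).
    problems = check_health_payload(fetch_payload())
    for problem in problems:
        alert(problem)
    return not problems
```

Because the test and the monitor share `check_health_payload`, a change to what "healthy" means lands in both places at once.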

EVERY SPRINT
DO ONE NICE THING

HELP YOUR RESPONDERS
IN EACH AND EVERY SPRINT
▸ In each sprint/work unit, add value to your responders
▸ Even if it’s not on a card
▸ You rebel, you.

ADDING VALUE
SOME EXAMPLES
▸ Provide better context in logging (stack traces alone don’t count)
▸ Remove some technical debt. Yes, you have some.
▸ Add some (useful) tests
▸ Remove something unused
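
As an illustration of the logging point, here is a small sketch using Python's standard `logging` module. The `charge` function, the `payments` logger name, and the order and customer fields are hypothetical; the idea is only that the log line says what was being attempted, and for whom, alongside the stack trace.

```python
import logging

logger = logging.getLogger("payments")


def charge(order_id, customer_id, amount_cents, gateway):
    """Charge an order through `gateway`, logging context if it fails."""
    try:
        return gateway(amount_cents)
    except Exception:
        # logger.exception records the stack trace; the message and
        # `extra` fields add the context a responder needs to act on it.
        logger.exception(
            "charge failed order_id=%s customer_id=%s amount_cents=%d",
            order_id,
            customer_id,
            amount_cents,
            extra={"order_id": order_id, "customer_id": customer_id},
        )
        raise
```

The `extra` fields become attributes on the log record, so a structured formatter or log shipper can index them without parsing the message.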

ADDING VALUE
▸ If you use feature flags, add a description field to the configuration
▸ If you use runbooks, ensure they are up to date every time you cut a release. If you don’t do this, abandon the runbook altogether (an incorrect runbook is considered harmful)
▸ SIMPLIFY, MAN!
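
For the feature-flag suggestion, a sketch of what a required description field could look like if flags are loaded from a dictionary-shaped config. The `FeatureFlag` class, the `load_flags` helper, and the config layout are assumptions for illustration, not any particular flag library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFlag:
    name: str
    enabled: bool
    description: str  # what the flag controls, in plain language

    def __post_init__(self):
        # Reject flags that a responder could not interpret during an incident.
        if not self.description.strip():
            raise ValueError(f"feature flag {self.name!r} needs a description")


def load_flags(raw):
    """Build flags from a dict, e.g. one parsed from a config file."""
    return [
        FeatureFlag(
            name=name,
            enabled=bool(fields.get("enabled", False)),
            description=fields.get("description", ""),
        )
        for name, fields in raw.items()
    ]
```

Failing at load time means an undescribed flag is caught in the build rather than discovered by whoever is on call.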

@MATTSTRATTON
LINKEDIN.COM/IN/MATTSTRATTON
MATTSTRATTON.COM
ARRESTEDDEVOPS.COM
SHARE YOUR ON-CALL STORIES WITH ME LATER

SPEAKING.MATTSTRATTON.COM

FURTHER READING AND REFERENCES
▸ Improving Your Employee Retention With Real-Time Ops Data - http://bit.ly/2rGTnq4
▸ Page It Forward! - http://bit.ly/2In8Lzc
▸ The study of information flow: A personal journey - http://bit.ly/2KpzKKW
▸ The Normalization of Deviance (If It Can Happen to NASA, It Can Happen to You) - http://bit.ly/2Ihj1wV
▸ Snow Crash by Neal Stephenson - http://bit.ly/2Iiuc8L
▸ The Cybersecurity Canon: Snow Crash - http://bit.ly/2InDYGI
▸ Disasters! Arrested DevOps Episode 37 - https://arresteddevops.com/37
▸ PagerDuty Incident Response - https://response.pagerduty.com
▸ Operational Reviews - https://reviews.pagerduty.com