A presentation at DevOps Minneapolis Meetup in Minneapolis, MN, USA by Matt Stratton
The Lifecycle of a Service Matt Stratton DevOps Advocate & Thought Validator, PagerDuty @mattstratton
Service Ownership means people take responsibility for what they deliver, at every stage of a service’s lifetime.
Communicate across your organization with partners and stakeholders
What is a service?
A service can be a lot of things • Microservice • Slice of a monolith • Internal tool • Piece of functionality • Component • Shared infrastructure • Feature
A service can be a lot of things: if it provides value to other people, it’s a service
Define what a “service” means to you
A service is a discrete piece of functionality that provides value and is wholly owned by a team
Shared understanding
Who is responsible?
“Service mitosis”
Service definitions help with problem resolution
What about a monolith?
Roles in service ownership
Development Team
Your service should make sense to other people who will interact with it
Naming
Be specific
Names that are specific • “User authenticator” • “Payment processor” • “Shopping cart” • “Login” • “Report generator” • “Email tracking code”
Less amazing names • PacMan (unless you’re actually building PAC-MAN, which I doubt) • Apollo • BurgunDB • Artemis
Descriptions
• What is the intent of this service, component, or slice of functionality? • How does this thing deliver value? • What does it contribute to? • How will this impact customers?
Dependencies • Look for circular dependencies • Is there a single point of failure? • Who consumes this service? • What does it depend on?
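Once the dependency map is written down, two of the questions above can be answered mechanically. The Python sketch below uses a hypothetical map from each service to the services it depends on. The names are borrowed from the "specific" examples above, and one cycle is included on purpose. The sketch finds a circular dependency with a depth-first search and reverses the map to list a service's consumers. It illustrates the idea only and is not a feature of any particular service registry.

```python
from typing import Dict, List, Optional

# Hypothetical dependency map: service -> services it depends on.
DEPENDENCIES: Dict[str, List[str]] = {
    "shopping-cart": ["user-authenticator", "payment-processor"],
    "payment-processor": ["user-authenticator"],
    "user-authenticator": [],
    "report-generator": ["email-tracking-code"],
    "email-tracking-code": ["report-generator"],  # deliberate cycle
}


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of service names, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {name: WHITE for name in deps}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        color[name] = GRAY
        path.append(name)
        for dep in deps.get(name, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we have looped back
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[name] = BLACK
        return None

    for name in deps:
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


def consumers_of(service: str, deps: Dict[str, List[str]]) -> List[str]:
    """Answer 'Who consumes this service?' by reversing the map."""
    return sorted(name for name, targets in deps.items() if service in targets)


if __name__ == "__main__":
    print(find_cycle(DEPENDENCIES))
    print(consumers_of("user-authenticator", DEPENDENCIES))
```

Single points of failure are harder to find automatically. A service with a long consumer list is a reasonable place to start looking.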
API • Versioning • Clear documentation / examples
Tiers of services
Tier 1 Services at PagerDuty • 24/7 on-call • Multiple levels of robustness • Disaster recovery plan • Clear and updated runbook
Tier 2 & 3 Services at PagerDuty • Monday-Friday support expectation • Supporting functionality, not critical path • New services that are not generally available
Sustainability team
Runbooks
Alerting
Robustness and reliability
Program management
Responsibilities of program management • Defining what ‘done’ is • Emotional awareness of the stress on the rest of the team • Connective tissue work between different teams and features (helping to understand and mitigate dependencies) • Awareness of what it means to pull people away from other projects to solve a problem
Product owner
Customers are always asking for uptime, performance, and security – they just don’t usually use those words
Management
• Make room in the roadmap for investing in tech debt • Encourage a culture of cooperation and sharing • Set goals that balance business priorities with achievable engineering goals
Going deeper
What are you observing about this service?
Observability vs monitoring
“Monitoring tells you whether the system works. Observability lets you ask why it’s not working.” (Baron Schwartz, Founder and CTO, VividCortex)
Empathy-driven alerting
A brief overview of SLA/SLO/SLI
Service Level Indicators (SLIs) • Latency • Throughput • Availability
Service Level Objectives (SLOs) • Made up of SLIs • Measured over time • Not contractually set
Service Level Agreements (SLAs) • Composed of SLOs • Contractually/legally binding • Basically, this is where you owe your customer money
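To make the distinction concrete, here is a minimal Python sketch. It computes an availability SLI from request counts and compares it against an SLO and a looser SLA. The 99.9% SLO and 99.5% SLA targets are hypothetical numbers for illustration; the talk does not prescribe values. The ordering of the two targets is the point: the SLO is breached before the SLA, so the team hears about trouble before the customer is owed money.

```python
from dataclasses import dataclass

# Hypothetical targets for illustration only; choose your own per service.
SLO_TARGET = 0.999  # internal objective, not contractual
SLA_TARGET = 0.995  # contractual promise, deliberately looser than the SLO


@dataclass
class Window:
    """Request counts observed over one measurement window."""
    total_requests: int
    good_requests: int  # e.g. succeeded within a latency threshold


def availability_sli(window: Window) -> float:
    """SLI: the fraction of requests in the window that were good."""
    if window.total_requests == 0:
        return 1.0  # no traffic: treat as available (a policy choice)
    return window.good_requests / window.total_requests


def status(sli: float, slo: float = SLO_TARGET, sla: float = SLA_TARGET) -> str:
    """Classify an SLI against the SLO and the SLA."""
    if sli >= slo:
        return "ok"
    if sli >= sla:
        return "slo_breached"  # the team should act; no contract broken yet
    return "sla_breached"      # this is where you owe your customer money


if __name__ == "__main__":
    week = Window(total_requests=1_000_000, good_requests=998_000)
    sli = availability_sli(week)
    print(f"SLI={sli:.4f} status={status(sli)}")
```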
The “hadness” point
Alert on SLOs
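One common way to alert on an SLO is to page on the error-budget burn rate. The burn rate measures how fast the service is spending the errors its SLO allows. This technique is described in Google's Site Reliability Workbook, and it may not be the one used in the talk. The sketch below assumes a 99.9% SLO and a hypothetical paging threshold of 10x the sustainable rate.

```python
# Hypothetical values for illustration; tune per service.
SLO_TARGET = 0.999
PAGE_THRESHOLD = 10.0  # page when burning budget 10x faster than sustainable


def burn_rate(total: int, bad: int, slo_target: float = SLO_TARGET) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    If sustained, 1.0 uses up the error budget exactly at the end of the
    SLO period; higher values use it up sooner.
    """
    if total == 0:
        return 0.0
    allowed_error_rate = 1.0 - slo_target
    return (bad / total) / allowed_error_rate


def should_page(total: int, bad: int, slo_target: float = SLO_TARGET,
                threshold: float = PAGE_THRESHOLD) -> bool:
    """Page a human only when the SLO is genuinely at risk."""
    return burn_rate(total, bad, slo_target) >= threshold


if __name__ == "__main__":
    # Last hour: 10,000 requests, 150 failed, roughly a 15x burn, so page.
    print(should_page(total=10_000, bad=150))
    # Last hour: 10,000 requests, 5 failed, about a 0.5x burn, no page.
    print(should_page(total=10_000, bad=5))
```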
How does a team respond to this service?
Escalation policies
DevOps Model
First level
Second level
Third level
Escalating
Manual escalations
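The slides walk through first, second, and third escalation levels and manual escalation. The Python sketch below models an escalation policy as an ordered list of levels, each with an acknowledgement timeout. The service name, the responders at each level, and the timeouts are all hypothetical. This is not PagerDuty's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationLevel:
    targets: List[str]       # who is notified at this level (hypothetical)
    escalate_after_min: int  # minutes unacknowledged before moving on


@dataclass
class EscalationPolicy:
    service: str
    levels: List[EscalationLevel]

    def level_at(self, minutes_unacknowledged: int) -> int:
        """0-based level that should be notified after N unacknowledged minutes."""
        deadline = 0
        for index, level in enumerate(self.levels):
            deadline += level.escalate_after_min
            if minutes_unacknowledged < deadline:
                return index
        return len(self.levels) - 1  # nobody left to escalate to: stay at the top

    def manual_escalate(self, current_level: int) -> int:
        """A responder can escalate right away instead of waiting for the timeout."""
        return min(current_level + 1, len(self.levels) - 1)


# Hypothetical policy for one service.
payments = EscalationPolicy(
    service="payment-processor",
    levels=[
        EscalationLevel(targets=["payments primary on-call"], escalate_after_min=15),
        EscalationLevel(targets=["payments secondary on-call"], escalate_after_min=15),
        EscalationLevel(targets=["payments engineering manager"], escalate_after_min=30),
    ],
)

if __name__ == "__main__":
    for minutes in (0, 20, 45):
        level = payments.level_at(minutes)
        print(minutes, "min ->", payments.levels[level].targets)
```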
Other escalation models • Central Ops • Hybrid Ops
Tuning your service
Investigate patterns
What alerts do you actually need?
Suppression of non-actionable alerts
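Deciding which alerts you actually need starts with looking at patterns in what fires. The minimal Python sketch below assumes each alert has already been tagged with a hypothetical `actionable` flag, meaning a human can do something about it right now. Actionable alerts page someone. The rest are suppressed but counted, so the noisiest non-actionable alerts surface as candidates for tuning or deletion.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alert:
    name: str
    service: str
    actionable: bool  # hypothetical flag: can a responder do something right now?


def triage(alerts: List[Alert]) -> Tuple[List[Alert], Counter]:
    """Return (alerts to page on, counts of suppressed alerts by name)."""
    pages = [alert for alert in alerts if alert.actionable]
    suppressed = Counter(alert.name for alert in alerts if not alert.actionable)
    return pages, suppressed


if __name__ == "__main__":
    incoming = [
        Alert("disk-usage-80-percent", "report-generator", actionable=False),
        Alert("checkout-error-rate-high", "shopping-cart", actionable=True),
        Alert("disk-usage-80-percent", "report-generator", actionable=False),
        Alert("cpu-spike", "user-authenticator", actionable=False),
    ]
    pages, suppressed = triage(incoming)
    print([alert.name for alert in pages])
    print(suppressed.most_common(1))
```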
Understand business impact
Lifecycle steps
Designing a new service
• Understand the customers (product is a key role here) • Load testing / staging • Ensure SRE / sustainability teams are involved early • Define SLI/SLO/SLA • Identify alerting requirements • Documentation (API, runbook, functional service registry if applicable) • Perform all security checks
Maintaining and iterating
• Version the service API • Communicate to consumers • Proactive maintenance • Address tech debt consistently • Testing and deploying/releasing the service (CI/CD, testing in prod, etc.)
Retiring a service
• Identify consumers • Determine business impact of retiring • Communicate / offboard consumers
Service ownership includes communication, compromise, and commitment.
Acknowledgements Lilia Gutnik - @superlilia Julian Dunn - @julian_dunn Charity Majors - @mipsytipsy Baron Schwartz - @xaprb Images provided by @mattstratton
If you enjoyed this talk, here’s more about me: arresteddevops.com • devopsdayschi.org • twitter.com/mattstratton • speaking.mattstratton.com
Services are the backbone of our systems. They are the pieces that make up our businesses—whether they are literal microservices or functional components of a traditional application, we can’t do the computer thing without services.
When it comes to a service in your organization, who’s responsible for it? The cast of characters involved in the lifecycle of a service is more than just software engineers. It can include program managers, product owners, sustainability teams (SREs/operations engineers), and business stakeholders, to name a few. We’ll talk about managing a service in production and understanding how that process affects your organization.
Here’s what was said about this presentation on social media.
Naming things matters (that, and cache invalidation)! @mattstratton #devopsmsp pic.twitter.com/ySRf0AN13U
— Bridget Kromhout (@bridgetkromhout) January 15, 2020
“You actually want both of these things, and these things are not the same.” @mattstratton #devopsmsp pic.twitter.com/0JYJ4s4h5o
— Bridget Kromhout (@bridgetkromhout) January 15, 2020
Services aren't just microservices! @mattstratton
— Jenn (@geekgalgroks) January 15, 2020
[Gif: Minion saying "WHAT?!!"] pic.twitter.com/mSbfotzbbJ
Naming!
— Jenn (@geekgalgroks) January 15, 2020
Don't use inside jokes. Explaining it over and over gets boring. Also can be problematic. @mattstratton
“Address tech debt consistently.” Yes please! @mattstratton on the lifecycle of a service #devopsmsp
— Jenna Pederson (@jennapederson) January 15, 2020
I missed this question but the answer is:
— Jenn (@geekgalgroks) January 15, 2020
Invent a time machine and document that shit. @mattstratton
The "hadness" point
— Jenn (@geekgalgroks) January 15, 2020
The point where customers go from happy to sad. This is where the SLO should be set.
Alert on SLO not SLA. @mattstratton
[Gif: Man in a hat saying, "Alert! Alert!"] pic.twitter.com/NS3SJTRqzn