The Lifecycle of a Service

A presentation at DevOps Minneapolis Meetup in January 2020 in Minneapolis, MN, USA by Matty Stratton

Service Ownership means people take responsibility for what they deliver, at every stage of a service’s lifetime. @mattstratton

Communicate across your organization with partners and stakeholders @mattstratton

A service can be a lot of things • Microservice • Slice of a monolith • Internal tool • Piece of functionality • Component • Shared infrastructure • Feature @mattstratton

A service can be a lot of things: if it provides value to other people, it’s a service @mattstratton

Define what a “service” means to you @mattstratton

A service is a discrete piece of functionality that provides value that is wholly owned by a team @mattstratton

Service definitions help with problem resolution @mattstratton

Roles in service ownership @mattstratton

Your service should make sense to other people who will interact with it @mattstratton

Names that are specific • “User authenticator” • “Payment processor” • “Shopping cart” • “Login” • “Report generator” • “Email tracking code” @mattstratton

Less amazing names • PacMan (unless you’re actually building PAC-MAN, which I doubt) • Apollo • BurgunDB • Artemis @mattstratton

• What is the intent of this service, component, this slice of functionality? • How does this thing deliver value? • What does it contribute to? • How will this impact customers? @mattstratton

Dependencies • Look for circular dependencies • Is there a single point of failure? • Who consumes this service? • What does it depend on? @mattstratton
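
The dependency questions above can be checked mechanically once dependencies are written down. What follows is a minimal sketch, assuming a hypothetical map of each service to the services it depends on. The service names and the `find_cycle` and `consumers_of` helpers are illustrative and do not come from the talk.

```python
from typing import Dict, List, Optional

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "shopping-cart": ["user-authenticator", "payment-processor"],
    "payment-processor": ["user-authenticator", "report-generator"],
    "report-generator": ["shopping-cart"],  # closes a circular dependency
    "user-authenticator": [],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular dependency as a path, or None if there is none."""
    visiting: List[str] = []  # services on the current depth-first path
    done = set()              # services already fully explored

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # We came back to a service already on the path: that is a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for service in graph:
        cycle = visit(service)
        if cycle:
            return cycle
    return None


def consumers_of(graph: Dict[str, List[str]], target: str) -> List[str]:
    """Answer "who consumes this service?" by inverting the map."""
    return sorted(s for s, deps in graph.items() if target in deps)


if __name__ == "__main__":
    print("cycle:", find_cycle(DEPENDS_ON))
    print("consumers:", consumers_of(DEPENDS_ON, "user-authenticator"))
```

A service with many consumers and no fallback is a candidate single point of failure, so the output of `consumers_of` is a reasonable place to start that conversation.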

API • Versioning • Clear documentation / examples @mattstratton

Tier 1 Services at PagerDuty • 24/7 on-call • Multiple levels of robustness • Disaster recovery plan • Clear and updated runbook @mattstratton

Tier 2 & 3 Services at PagerDuty • Monday-Friday support expectation • Supporting functionality, not critical path • New services that are not generally available @mattstratton
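
One way to make the tiers concrete is to record each service's tier next to its owner, so support expectations can be looked up rather than remembered. This is a minimal sketch: the `Service` record, its field names, and the example services and teams are hypothetical. Only the support expectations come from the tier slides above.

```python
from dataclasses import dataclass

# Support expectations as stated on the tier slides; the mapping is illustrative.
SUPPORT_EXPECTATION = {
    1: "24/7 on-call",
    2: "Monday-Friday",
    3: "Monday-Friday",
}


@dataclass(frozen=True)
class Service:
    name: str
    owning_team: str  # hypothetical field: the team that wholly owns the service
    tier: int

    def support_expectation(self) -> str:
        return SUPPORT_EXPECTATION[self.tier]

    def needs_disaster_recovery_plan(self) -> bool:
        # The slides list a disaster recovery plan under Tier 1.
        return self.tier == 1


if __name__ == "__main__":
    for svc in (
        Service("payment-processor", "payments", tier=1),
        Service("report-generator", "analytics", tier=3),
    ):
        print(svc.name, "->", svc.support_expectation())
```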

Robustness and reliability @mattstratton

Responsibilities of program management • Defining what ‘done’ is • Emotional awareness of stress of the rest of the team • Connective tissue work between different teams and features (help understand and mitigate dependencies) • Awareness of what it means to pull people away from other projects to solve a problem @mattstratton

Customers are always asking for uptime, performance, and security – they just don’t usually use those words @mattstratton

• Make room in the roadmap for investing in tech debt • Encourage a culture of cooperation and sharing • Set goals that balance business priorities with achievable engineering goals @mattstratton

What are you observing about this service? @mattstratton

Observability vs monitoring @mattstratton

“Monitoring tells you whether the system works. Observability lets you ask why it’s not working.” Baron Schwartz, Founder and CTO, VividCortex @mattstratton

A brief overview of SLA/SLO/SLI @mattstratton

Service Level Indicators (SLI) • Latency • Throughput • Availability @mattstratton
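
The three indicators above can each be computed from a window of request records. This is a minimal sketch, assuming a hypothetical `Request` record with a latency and a success flag. The nearest-rank percentile is one common choice for latency, not something the talk prescribes.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    latency_ms: float
    ok: bool  # True if the request was served successfully


def availability(requests: List[Request]) -> float:
    """Fraction of requests served successfully."""
    return sum(r.ok for r in requests) / len(requests)


def latency_percentile(requests: List[Request], pct: int) -> float:
    """Nearest-rank percentile of latency, with pct in 1..100."""
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def throughput(requests: List[Request], window_seconds: float) -> float:
    """Requests per second over the observation window."""
    return len(requests) / window_seconds


if __name__ == "__main__":
    # Hypothetical sample: ten requests, one of which failed.
    sample = [Request(latency_ms=100.0 * i, ok=(i != 10)) for i in range(1, 11)]
    print("availability:", availability(sample))
    print("p90 latency (ms):", latency_percentile(sample, 90))
    print("throughput (req/s):", throughput(sample, window_seconds=5))
```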

Service Level Objectives • Made up of SLIs • Measured over time • Not contractually set @mattstratton

Service Level Agreements • Composed of SLOs • Contractually/legally binding • Basically, this is where you owe your customer money @mattstratton
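
Because an SLO is an internal objective and an SLA carries contractual consequences, a common pattern is to set the SLO tighter than the SLA and alert when the SLO is missed, before the SLA is at risk. This is a minimal sketch, assuming availability as the SLI. The target values, the `Targets` record, and the `evaluate` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    slo: float  # internal objective, e.g. 0.999 (assumed value)
    sla: float  # contractual commitment, e.g. 0.99 (assumed value)


def evaluate(good_events: int, total_events: int, targets: Targets) -> str:
    """Classify a measurement window against the SLO and the SLA."""
    if total_events == 0:
        return "no-data"
    sli = good_events / total_events
    if sli < targets.sla:
        return "sla-breached"  # contractual consequences apply
    if sli < targets.slo:
        return "slo-missed"    # alert here, while the SLA still holds
    return "ok"


if __name__ == "__main__":
    targets = Targets(slo=0.999, sla=0.99)
    for good in (9995, 9950, 9800):
        print(good, "->", evaluate(good, 10000, targets))
```

Keeping the gap between the two targets explicit gives the team room to react before a miss becomes a contractual problem.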

How does a team respond to this service? @mattstratton

Other escalation models • Central Ops • Hybrid Ops @mattstratton

What alerts do you actually need? @mattstratton

Suppression of non-actionable alerts @mattstratton

Understand business impact @mattstratton

• Understand the customers (product is a key role here) • Load testing / staging • Ensure SRE / sustainability teams are involved early • Define SLI/SLO/SLA • Identify alerting requirements • Documentation (API, runbook, functional service registry if applicable) • Perform all security checks @mattstratton

• Version the service API • Communicate to consumers • Proactive maintenance • Address tech debt consistently • Testing and deploying/releasing the service (CI/CD, testing in prod, etc) @mattstratton

• Identify consumers • Determine business impact of retiring • Communicate / offboard consumers @mattstratton

Service ownership includes communication, compromise, and commitment. @mattstratton

Acknowledgements • Lilia Gutnik - @superlilia • Julian Dunn - @julian_dunn • Charity Majors - @mipsytipsy • Baron Schwartz - @xaprb Images provided by @mattstratton

If you enjoyed this talk, here’s more about me • arresteddevops.com • devopsdayschi.org • twitter.com/mattstratton • speaking.mattstratton.com @mattstratton

Matty Stratton
@mattstratton

Services are the backbone of our systems. They are the pieces that make up our businesses. Whether they are literal microservices or functional components of a traditional application, we can’t do the computer thing without services.

When it comes to a service in your organization, who’s responsible for it? The cast of characters involved in the lifecycle of a service is more than just software engineers. It can include program managers, product owners, sustainability teams (SREs/operations engineers), and business stakeholders, to name a few. We’ll talk about managing a service in production and understanding how that process affects your organization.

Buzz and feedback

Here’s what was said about this presentation on social media.

Naming things matters (that, and cache invalidation)! @mattstratton #devopsmsp pic.twitter.com/ySRf0AN13U
— Bridget Kromhout (@bridgetkromhout) January 15, 2020
“You actually want both of these things, and these things are not the same.” @mattstratton #devopsmsp pic.twitter.com/0JYJ4s4h5o
— Bridget Kromhout (@bridgetkromhout) January 15, 2020
Services aren't just microservices! @mattstratton

[Gif: Minion saying "WHAT?!!"] pic.twitter.com/mSbfotzbbJ
— Jenn (@geekgalgroks) January 15, 2020
Naming!

Don't use inside jokes. Explaining it over and over gets boring. Also can be problematic. @mattstratton
— Jenn (@geekgalgroks) January 15, 2020
“Address tech debt consistently.” Yes please! @mattstratton on the lifecycle of a service #devopsmsp
— Jenna Pederson (@jennapederson) January 15, 2020
I missed this question but the answer is:

Invent a time machine and document that shit. @mattstratton
— Jenn (@geekgalgroks) January 15, 2020
The "hadness" point

The point where customers go from happy to sad. This is where the SLO should be set.

Alert on SLO not SLA. @mattstratton

[Gif: Man in a hat saying, "Alert! Alert!"] pic.twitter.com/NS3SJTRqzn
— Jenn (@geekgalgroks) January 15, 2020
