The talk
Zero Trust is for Networks, Not Your Teams
Delivered 3 times · 2023

The whole idea of DevOps was about how we could work together better. But while we broke down silos, we built new walls instead. The concept of zero trust has been widely applied to network security; however, it’s not a great way to think about our teams. This talk will explore how to foster a culture of trust in organizations, with a focus on outcomes. Leaders and individuals alike play a critical role in establishing and maintaining trust, which is crucial for the success of any team.
One key aspect of building a culture of trust is facilitating and growing psychological safety within the team. This means creating an environment where individuals feel comfortable expressing their opinions, ideas, and concerns without fear of negative consequences. Moreover, trust is necessary for practicing Site Reliability Engineering (SRE) and DevOps well, but organizations often lack the setup that allows for it.
This talk will feature practices inspired by the field of Resilience Engineering as well as proven DevOps approaches, with a focus on how leaders and individuals can create an environment where trust is valued, encouraged, and fostered. Attendees will take away insights and actionable tips to bring back to their teams to create a more resilient and effective organization.
Transcript · 7,209 words · ~36 min read
Lightly edited for readability from the video’s captions.
So I bring my own walk-on music. We're going to talk a little bit about zero trust and teams, and, by the way, that was my... I'm from Chicago. I'm from the Chicagoland area, I should say. How many people here are from the Chicago area? How many people understood why it was important that I didn't say I was from Chicago but from the Chicagoland area?
I was born in St. Louis — this is not part of my talk — but I was born in St. Louis. I've lived here since I was four. I grew up in a suburb called Villa Park, which spelled backwards is "crap alive." I currently live out in Lisle, known for the Arboretum, but I lived on the North Side for 20-plus years. Anyway, that song — there's going to be themes connected to music. That was Urge Overkill, a fantastic '90s Chicago band. But that's about the extent of the talk that has to do with Urge Overkill.
So let's think about DevOps. What's the hardest thing about doing DevOps? Probably the auto-scaling, you know, whatever. No, it's people. DevOps has been around for a while. If you were in this room before lunch, someone took responsibility for stealing DevOps, and that's fairly true. We've been doing this for a bit of time.
So when we think about what DevOps was for, we talk a lot about breaking down silos. How many people have heard that in the context of DevOps: silos are bad, let's break them down? But we also made some new ones. We created DevOps teams. I'm not here to praise DevOps teams but to bury them, or maybe it's the other way around. That said, I'm not saying that being on a team called the DevOps team, or being a DevOps engineer, is a bad thing. We're here to get work done. But what happened was we said, okay, we're going to work together, we're going to co-create, and then we just shifted things around.
So hey, I'm Matty. A couple of things to know about me: I love punk music and '90s alternative, if that wasn't clear from my walk-on music. I also love improv. I studied improv here in Chicago 20-plus years ago at Players Workshop and Improv Olympic, I've been an improv coach, and a bunch of other stuff. Also, when it comes to improv, I'm a great example of "those who can, do; those who can't, teach." I love dogs. I have two great Australian Shepherds. My kids talk about what they want to do when they grow up and how they're going to have families and everything, and my one son says, "I'm going to become a vet and I'm going to buy a house in the middle of nowhere and adopt like 10 dogs." I said, "I'm going to come and live with you." And I work at a company called Aiven. We're not going to talk about all four of those things; we're going to use some of them as metaphors and a framework. If you don't like punk or improv, it's fine: we're still going to talk about technology and high-performing teams.
So David Woods talks about this idea that resilience is a verb. One fun thing happened during the lockdown in 2020 (not many fun things happened then). I'm a developer advocate and I give a lot of conference talks, and I was giving a lot of those from home at the same time my kids were e-learning in the other room. So my kids got to hear me give talks quite a bit. One day after a talk, my son, who was I think 11 at the time, came up and said, "You know, Daddy, I was listening to your talk and you said 'resilience is a verb.' Actually, it's a noun." And then he was assigned seven different PDFs to read, and he has never been a reply guy to me since.
We're going to be talking about ideas that come from the field of resilience engineering, which is not new. It's a practice that's been around for five or six decades, commonly practiced in areas like nuclear safety, medicine, and air traffic control, and now also in tech, where we decided we invented it six years ago, as we do.
So when we think of resilience as a verb, these are things that we do to express the resilience that exists in our system. A lot of times people, especially vendors, talk about resilience and say, "Hey, if you use our tool, we'll make your network resilient, we'll make your Kubernetes cluster resilient, we'll make your database resilient." That's a bunch of BS, because a purely technical system can't be made resilient. Resilience is expressed by the people. We're going to talk about what resilience means and why that matters, and this is going to become clear.
So I'm going to be a little pedantic, but trust me, we're going somewhere. Resilience is kind of made up of these four factors: rebound, robustness, graceful extensibility, and sustained adaptability. We're going to kind of talk about what these things mean.
Rebound first of all is returning to normal after a surprise. This is usually expressed through work that's done ahead of time. This is how our system goes back to normal after something happened that wasn't supposed to happen. We had an incident, we had a surprise — how do we bring it back? Think about a rubber band. That's the rebound. That's an important way that resilience is measured, or that we can express it or understand it: our ability to rebound.
When we think about robustness, this is usually what people mean when they say they're making a system resilient. Robustness is the ability to withstand and absorb well-modeled disturbances — things that we know can happen, maybe known knowns. The reality is we have a lot of unknown unknowns.
I love that I can make local references and I don't have to explain them. How many people are local-ish? Cool. I used to work for apartments.com. How many people remember apartments.com and Classified Ventures? Anybody work at Classified Ventures? Cars.com? Okay. I carry a little bit of baggage about cars.com, because apartments.com was always the bitter younger sibling of cars.com, but it's okay, we were still awesome.
Anyway, I remember, a nonzero number of times, we would have some type of incident or outage, and my boss the CTO would come to me and say, "Matt, why weren't you monitoring for that?" And I'd say, "Pat, because until last night I didn't know that could happen." That's the idea of robustness: the things we know can happen, and our ability to withstand those. That's really important, and reliability comes out of that.
Now, graceful extensibility, as opposed to brittleness, is our ability to stretch beyond those operational boundaries. If we're only thinking about robustness, we can only withstand the things we know. Handling challenges outside that operational envelope is graceful extensibility. This requires people. I don't care what you think about AI or wherever that goes; this requires creativity and intuition and all the things that humans can do, because we didn't predict it.
And this culminates in what we call sustained adaptability, which is our goal for resilience: the ability to manage these adaptive capacities over long time scales. If you've seen Courtney, who's giving a talk tomorrow, rant about MTTR (and lots of us like to talk about this), the problem with mean time to resolution is that it treats every incident in isolation. We think about long time scales. We're in the business of longevity here. So we're trying to enhance and express our sustained adaptability.
So why are we talking so much about resilience and all these specific pieces? It's because the systems we're building are sociotechnical systems. If you've been sitting in this room for most of the day, this is at least the third time you've heard "sociotechnical" brought up, which is kind of great. Andrew gave a good definition of it, which I'm not going to repeat. But what this boils down to is that all of the systems we create, manage, operate, and use are made up of technology and of people: the people who built them, the people who run them, the people who use them. And they're constantly changing.
Think about exposure to users. This goes back to what Charity was talking about earlier: until you get something in front of users, it's not valuable, because we can think of all the things we want, but our users are going to find ways to do things that never occurred to us. So it really comes down to where the people come in.
In the resilience engineering community, when we think about the people involved in systems that we're working with, we sometimes refer to the blunt end and the sharp end. This is not a value statement. It's not necessarily that blunt is negative — I said it's not a value statement, but if you look at the pictures I chose to put up here, I am making a value statement. That's Malcolm McLaren, who was the instigator of the Sex Pistols, the manager, and people might have opinions. And that's Kim Gordon, who is amazing. But at the blunt end, these are people in the organization that generally tend to be in a higher-level leadership position. The point is they're removed from the day-to-day work; they're upstream decision makers. That's not a bad thing — we have to have that.
But when we look at the sharp end in a system, the people at the sharp end are the people directly engaged in the work. If you've heard the term "chop wood, carry water" — these are, generally speaking, individual contributors. You can still be a manager and be at the sharp end, but we're closer to where the work is happening. So if we want to take this away from talking about tech, you might say the administrator of a hospital is the blunt end, the surgeon is at the sharp end. And the same thing applies to our SREs, our developers — we work at the sharp end. You need both parts of these systems, but they come in at different places.
The thing to keep in mind about the sharp end, especially in technology, is that these are people who are constantly building and destroying systems. Our systems are in a constant state of flux. And at the sharp end there is really strong signaling among the people working there; we talk to each other with very high fidelity. It's not only signaling between us, it's signaling between us and the system. The closer we are to the system, the better we understand it, or at least the better signal we have from it. People at the sharp end know how to improve systems based on strain because we have those signals; we're close to it.
The thing that matters here is the sharp end will do all these things naturally when given ownership. A great example of this happened during the necessity of the shift to remote work that happened very, very quickly in the 17-week-long month of March 2020. It happened so fast that there wasn't time for large organizations to say, "This is how we are going to work remotely as a team, these are all of our good practices." The teams just had to do it, and they did. They figured out the right way to work in their team. And then we screwed it all up, learned no lessons from any of that, and went back to trying to tell people how they have to do work. So that's great.
Side note: I'll explain some of these references. This guy is up on the screen because there's a punk music festival here in Chicago every year called Riot Fest. If you haven't seen the lineup for this year, it's an absolute banger; go check it out. Two years ago I was there, sitting down eating a corn dog with my friend, and someone ran up and asked if he could take a selfie with me. I was like, okay, I don't know what just happened. I get that sometimes in airports from people who like my podcast, which is also weird. He runs off, and my friend Cat, sitting next to me, is laughing. She goes, "You know what just happened?" And I go, "No, what happened?" She goes, "He thought you were Fat Mike from NOFX." Well, that's Fat Mike. We don't really look anything alike. But also, last year at Riot Fest someone else thought I was Fat Mike, and NOFX wasn't even playing. So this year I'm going to get a T-shirt that says "I'm Not Fat Mike."
Anyway, command and control is a fallacy. Command and control is the idea that we pass orders down from the top. I'm not a big fan of using military metaphors, but this one is so common in organizational culture that we're going to extend it. Command and control says: up here at the top, we're going to dictate all the way down how things happen. And if you ask people who are fans of it, they'll say, "Well, this is how very effective militaries work." And I'm like, there's not a military on this planet that's followed command and control in the last 200 years, because it doesn't work. What modern militaries do is something called maneuver warfare, because the sharp end knows what's up. They have the situational awareness; the general sitting back doesn't see all of it. So we give outcomes. And yet within our organizations, at the executive level, we still want to drive the how, as opposed to the why. I'm going to talk about how we can fix that in our organizations.
Conway's Law, if you're not familiar, is the observation that your system architecture, or your company's software architecture, is a reflection of your organization's communication patterns. Sometimes people say it's a reflection of your org chart, which is roughly the same thing. So if you have high degrees of communication, you're going to have systems that trust each other a lot more. If your organization's communication is pathological, that's going to be reflected in how you design software.
When we think about Conway's Law, I like to think about flipping it around: how can we use models of software design to think about how we want an organization to work? Now, as I go into this next analogy, bear in mind: I spent two-plus decades working in operations. I am not a software engineer. I'm going to talk about software design. If you are someone who primarily has a background in software, I'm probably going to get some of it a little bit wrong, but this is creative license.
So I like to think about how we design services in a service-oriented architecture as a metaphor for how our team can work. This is not necessarily a microservices perspective — I know that a lot of SOA is out of date, and I'm not going to get into a SOAP versus REST argument. Just think about service architecture and how that can be echoed into how teams can work together.
In a service-oriented architecture, we have this idea of a service contract. There's a service that has its logic, and I'm a consumer of that service. Say there's a service where I can do an order lookup, or get map geodata or something like that. I don't care how it works; it's abstracted away from me. The reason that can work is that there's a well-defined service contract that tells me as a consumer: if you send a message this way, you will get back this. So we have an agreement that the inputs will look like this and the outputs will look like that. And as long as that contract is maintained, whoever owns the service can do whatever the hell they want on their side, and it doesn't matter. I don't have to test against it, because the endpoints stay the same; you have versioned endpoints. Again, all metaphors fall apart eventually. But the same idea applies: I can screw around on my side all I want as long as I keep honoring the contract.
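To make the service-contract metaphor concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not anything from the talk: the order-lookup service, the `OrderLookupV1` contract, and both implementations are hypothetical names. The consumer depends only on the agreed request and response shapes, so the provider can rewrite its internals freely.

```python
from dataclasses import dataclass
from typing import Protocol


# The contract: the agreed shape of inputs and outputs.
# All names here are hypothetical, chosen only to illustrate the metaphor.
@dataclass(frozen=True)
class OrderRequest:
    order_id: str


@dataclass(frozen=True)
class OrderResponse:
    order_id: str
    status: str


class OrderLookupV1(Protocol):
    """Version 1 of the contract; a breaking change would become a new V2."""

    def lookup_order(self, request: OrderRequest) -> OrderResponse: ...


class InMemoryOrderService:
    """One implementation: orders live in a dict."""

    def __init__(self) -> None:
        self._orders = {"A-100": "shipped"}

    def lookup_order(self, request: OrderRequest) -> OrderResponse:
        status = self._orders.get(request.order_id, "unknown")
        return OrderResponse(order_id=request.order_id, status=status)


class RewrittenOrderService:
    """A completely different implementation honoring the same contract."""

    def lookup_order(self, request: OrderRequest) -> OrderResponse:
        # The internals changed entirely; the consumer never sees this.
        status = "shipped" if request.order_id.startswith("A-") else "unknown"
        return OrderResponse(order_id=request.order_id, status=status)


def consumer(service: OrderLookupV1, order_id: str) -> str:
    """The consumer knows only the contract, not how the service works."""
    return service.lookup_order(OrderRequest(order_id=order_id)).status


if __name__ == "__main__":
    for svc in (InMemoryOrderService(), RewrittenOrderService()):
        print(consumer(svc, "A-100"))  # prints "shipped" both times
```

Both implementations satisfy the same `OrderLookupV1` protocol, so `consumer` runs unchanged against either. In this sketch, a breaking change to the inputs or outputs would be published as a new contract (say, a `V2`) rather than as an edit to this one, which is the versioned-endpoint idea described above.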
So when we think about this from a blunt-end/sharp-end perspective: inside a squad of some kind (whatever size of team holds a given amount of responsibility), people are at the sharp end, and they know the best way to accomplish their outputs. We run into a challenge when we want to create "this is the way we do everything in this organization." Now, there are places where yes, we need to do things consistently, especially when it comes to procurement and such. Sometimes it doesn't make sense for every single development team in the organization to use a completely different source control system. What I've found, because I get to ask this question a lot, is that it makes sense, or is necessary, to have a common implementation of a tool where there's a connection. If a tool is interconnected between different squads, then yes, we have to come together on it. But if it's inside your domain and it doesn't leak out, for the most part, who cares? Which particular React framework you choose to use? Who gives a crap. The continuous delivery pipeline software? Yeah, we probably need to be consistent on that, because squads come together there. So again, this goes back to where that trust can come from. But you don't get it for free. In order to have that trust, you have to have an agreement. I'm not talking about writing up some SLA between each other; it's just saying: these are the inputs, these are the outputs.
Again, this is probably at least the third or fourth time today you're hearing about psychological safety and the DORA State of DevOps report and all that, so I'm not going to dig into it too much. But one finding, and I don't even put a year on this slide because it's been in almost every edition of the report, is that a culture of psychological safety is predictive of software delivery performance, organizational performance, and productivity.
This is reinforced by something that happened before the DORA report: years ago, Google ran an internal research project. They said, "We want to figure out what makes our high-performing teams high-performing." It was called Project Aristotle. They did all this research into what these teams had in common. Was it their level of expertise in the industry? Was it sitting next to each other, being co-located? What they found as the unifying theme was a high level of psychological safety.
And what exactly is psychological safety? According to Amy Edmondson of Harvard Business School, it's a sense of confidence that the team will not embarrass, reject, or punish someone for speaking up. That's it. That's all there is to it. It doesn't even mean arguing; it's just: if I say something, do I feel confident that I'm not going to get in trouble, that I'm not going to be made fun of, that I'm going to be heard? That is the single most important thing for a team to have for high performance.
We did an exercise within my team (my boss and his direct reports) last week, where we sat down with sort of an "all about me" format: how to work with me, blah blah blah. One of the slides asked: as an individual leader in this group, what do you think is the most important thing for a high-performing team? I said psychological safety. I said I could go on about this for an hour, and I have. There are talks.
Okay, so here's the deal. Let's look at four zones of performance, with accountability on one axis and psychological safety on the other. Accountability is really important, and people, especially leaders, want it: you should be accountable for the work you do. But you can't just have accountability. If we have high accountability, where you're held accountable for everything you do, but psychological safety is low, what is that a recipe for? Anxiety. Because I can't do anything. If there's low accountability and low psychological safety, that's apathy: you aren't going to get a lot of stuff done, and you've got people just sort of resting and vesting, I guess. And high psychological safety with no accountability is Comfort, which, as an individual, feels pretty good for a little while. You're like, okay, that's cool, I can do my thing. But I'm just sort of doing my thing.
So the high-performing, generative learning zone is where there is high accountability and high psychological safety. When psychological safety is low, we run into a few things; these are the symptoms. We lose diversity of voice. How many times do we end up going for the least controversial statement? Okay, well, I'm not going to bring this up, so we just go to the lowest common denominator. And when we don't have that diversity of voice and diversity of experience, we miss things: people with all kinds of different backgrounds will bring up things that someone who looks like me wouldn't even think about. There are constant examples of this in smart home technology, where someone with my experience wouldn't even think about how these things can be used in an abusive way. But how are you going to bring that up if you don't have psychological safety? Those teams are also not as well equipped to mitigate failure, because we don't want to rock the boat: we see a risk, and we don't speak about it. It amplifies the normalization of deviance, and the inability to speak up is what brought us the Challenger disaster. Hopefully we don't have those levels of risk in our software projects, but still: we end up with knowledge silos, where we keep information to ourselves as a form of protection. And ultimately there's indifference and disengagement.
When we think about any kind of change, I like to talk about people being committed versus being compliant. You're not going to get everybody to agree with everything all the time, and that's not what matters. But if we're trying to drive innovation and change forward, we need people to be committed, and commitment comes from being heard. You could talk forever about Amazon's leadership principles, but one of them is "disagree and commit." And you can only do disagree and commit when you have high psychological safety, so that "disagree" doesn't just mean "your objection is noted, go to hell, we're doing this." That's not what disagree and commit means. So we have to have trust within our teams.
So there's trust within the boundary of the squad: how do we trust each other, our direct colleagues, and our direct leaders? And there's also trust between teams: how do teams trust each other? We're going to delve into one and then the other.
Trust within a team: my software delivery team, my SRE team, my folks, the people in the weekly team meeting with me. How do we trust each other? This is really key. Dave Shackleford said, "It's hard to make meaningful progress until people feel heard." And, as he says, heard does not mean agreed with. You get a physical relaxation response when you believe your point of view has been understood and acknowledged, not agreed with. Agreement isn't actually the key; the key is that you heard me, you acknowledged it, and you understood it. That's "disagree and commit" as well, by the way. You have to say, "Okay, I understand your point," and not glibly, but actually: "I heard that, and I understand it this way." And then we can continue our conversation. When you know you've been heard, you relax and you're like, okay, we're having a conversation.
A couple of other tips here: approach conflict as a collaborator, not an adversary. This is really hard to do; I struggle with it in my daily life constantly. Last night, the dad of one of my kids' friends called me, and I was all ready: this was going to be a fight. And then I was like, wait a minute. No. I really like this guy. We both care about our kids; we're on the same team here. We are collaborators. It's not that I have to win and be right, because the reality is we were both right, and it was our kids who were wrong.
Also, that's my daughter going to her first concert ever, a year ago. And if you would like to see her give a conference talk, she'll be speaking at DevOpsDays Chicago in August. I'll talk more about that at the end.
We speak human to human; we're all people together. The phrase "just like me" is powerful. We are not all exactly the same, but we are more alike than we are different. We are all still humans with a human experience. And when we're working together, it's one of those little things, like how they say that if you smile enough you'll get in a good mood. Find yourself saying this phrase when you're working with someone. You don't have to say it out loud, just something like, "Oh, Amy is a software engineer, just like me." That reinforces our commonality.
I really like the idea of replacing blame with curiosity. We don't know everything, and we don't have all the facts in a situation, so we're investigating together. We're not trying to figure out whose fault it is; we're saying, I actually want to learn more. You know more about the situation than I do. How do we get that curiosity going? As engineers and technologists, and also just as humans, curiosity is powerful, and we can do a lot more with it.
And then: how do we model vulnerability? This is more incumbent upon leaders, because it's hard. People ask, how can I enhance psychological safety in my team as a leader? Well, you can't just say we're going to do it; you have to display it. I'll give one story as an example. I wasn't a manager at the time; I was on the sales engineering team at Chef. One of my co-workers, a very sharp guy and a great friend of mine (he stood up at my wedding, love this guy), was having a hard time with something. He said, "I don't understand, I'm trying to SSH into this machine and it's not working." Finally it turned out he had capitalized the flag instead of using lowercase. And I jokingly said, "Oh, Sean, you know what kind of dummy wouldn't know how to do blah blah blah," and it was in good jest. He was not offended or upset, because, first of all, he knows he's three times as smart as I am. But my manager came to me later that day and said, "Look, Matt, I know that you and Sean are good friends. I know that Sean probably thought it was funny, or at minimum wasn't offended. But what you just told the whole team is: when you make a mistake, Matty will make fun of you. And that's not okay."
That's the thing to watch for: those little things. Broadly speaking, we'd say, "No, no, no, I don't punish people for speaking up." But we don't think about the little things we say. I mean, that one was actually pretty blatant. But it's little things like that to watch for. And when we create those emotional bonds and show vulnerability, that helps create safety.
So then, when we think about trust between teams (I'm not going to go as deep into this; there's more here, and it's about larger organizational structure), there are a couple of different ways to structure it. Jim Whitehurst from Red Hat has a great book called The Open Organization, which I really love for thinking about how things can work together. Like all things, it's probably more aspirational than practical in many ways. My other cynicism about this is that all business books can be summarized in a blog post. But in a conventional, top-down organization, the "what" is pushed down a lot more. What we have is command and control: central planning for the whole organization, pushed down through our frozen middle all the way to our individual contributors. Down at the sharp end, the "why" at that point is promotion and pay. Why am I doing this? Because that's the only reason.
Now, this is a damn good reason to do work, by the way. I'm not saying anybody here has to do it for free, because I don't think any of us in this room would be doing any of what we do if they didn't pay us. But that's like the only reason. And then what you have in the middle is building that hierarchy, our title and rank. The "how" is pushed down.
In an open organization, you still have to have structure; this is not a flat organization with no structure. But what we're doing at the top is setting direction rather than defining the "what," and making decisions. And then, again, it's bottom-up, because at the sharp end we understand the "how" and the "why," so we have purpose and passion. And in the middle, you're an enabler.
One of the things that gets really hard for folks at the top, especially in a traditional organization, is a lot of this is "need to know" — not everybody needs to know all of this. And the reality is, most of the stuff that's truly need-to-know, it's a much shorter list than we really think it is. But it has to do with siloing for power. This is sort of what has happened in traditional organizations; this is where those silos came from that we have been trying to break in DevOps for 12 to 13 years.
We end up with these functions, and they come from silos. Silos exist for protection — that's why silos exist in the real world, on farms. Silos are not inherently bad. But we sit there and say, "Because I have this jurisdiction, I can control it. I can make sure that everything I'm responsible for is fine, because as soon as those silos get leaky, people can maybe mess up my world." And then I've got that accountability toward it. So it's really this downstream effect of Conway's Law.
In an open organization, a lot of it is still bottom-up. If we think about the concepts we're trying to accomplish in DevOps, it's all about feedback loops. In order to have those feedback loops, we have to be cross-functional, because that's the only way we understand the whole picture. One of the things I've explained in my "how to work with me" document is that even though I run a developer program and I work in marketing, everything is still a DevOps problem. I've been having lots of conversations about cross-functional teams within our marketing department.
So I'm going to take a quick detour here and talk about Chris Farley. When Chris Farley first came on SNL, which was my first real exposure to him, I didn't really like him. I thought he was too broad. There are a lot of things I didn't like, until I started to learn about his history in Chicago improv and I watched these blurry VHS tapes of him at Improv Olympic. The man was a genius. The reason was that he was such a good listener. You would watch him do a scene, and 20 minutes later he would bring back something that had come up earlier, because he was actually focused on being a great partner.
This clip, this GIF, is from a sketch he used to do on SNL called "The Chris Farley Show," where he would interview celebrities on his talk show and be really nervous about meeting Paul McCartney or whomever. It would always be stuff like, "Hey, hey, do you remember... do you remember that band Wings? Yeah, yeah, that was awesome." From people I have met who knew Chris, I'm told that's what it was like to know him. He was like that in real life. He would bring up something that had happened between the two of you 10 years ago, because he was constantly paying attention and being a great partner.
Versus Michael Scott from the American version of The Office, who was a terrible improviser. If you've seen this scene (I don't have time to show the whole thing), the point was that every time Michael did improv, he made everything about himself and tried to be the center of it. He'd pull out fake guns and all sorts of things.
When I was studying improv in my first class, we actually had someone in our class very similar to Michael. I remember in one scene we were doing, about halfway through he got locked in a refrigerator in the scene. And at the end of the scene, the instructor said, "Jason, do you know why they locked you in the refrigerator?" Because we just had to get him shut down — he was off doing his own thing.
This connection comes back to how we collaborate as teams. TJ Jagodowski was my instructor at Improv Olympic. If you know the Sonic ads with the two guys sitting in the car, TJ's one of them. I learned a lot from TJ. One comment he made once: "Improv is like driving while only looking in the rearview mirror. We only know what has already happened." Guess what else is like that? Life. The work we do as a team. We can have all these ideas of things we're going to do, but all we know is what we've already done. We can have predictions but we don't know what's going to happen, because the work we do as a group is not scripted. We're improvising it. We're making it up as we go and we're building it together.
There's an idea we talk about in improv that's true for everything: "bring a brick, not a cathedral." What that means in an improv scene is: I may have an idea of what I want to do, and I may start doing a scene with you, and then you take it somewhere else. I can't shut you down, because then we're negating each other. So if I come in with a fully formed cathedral of what I want to have happen and then something changes, I'm not building it with you. Those cathedrals I build might get a laugh, you might think what I said was funny, but it's not collaborative. What we're doing instead is: I'm bringing the bricks, and we're building the cathedral together. This applies to how we work together as a team, and to the tech that we build as well. We may say, "The way we really need to design this robust, fault-tolerant system is to do it this way, and I've got the whole thing figured out." Okay, come with the bricks.
How do we take these ideas of improv and apply them to DevOps? Trust in your partner. One of the most important things in improv is being a great partner, and that involves trust. You're working with great people. Okay, I know you might have some people you work with who are not great — it's fine, not everybody's great. But let's pretend for a minute. Generally speaking, we're all here to do great work, and we build better together than by ourselves.
There are no mistakes. We could put a picture of Bob Ross up here. The best thing about a mistake is we now know something we didn't know before. This is not "move fast and break things and just screw up all the time." But hey: if something happened that seems like it was a mistake, we now know something we didn't know before — especially about our systems that are not very good at talking to us.
The team is greater than all of us. Your reach should exceed your grasp; the whole is better than the sum of its parts. This happens in improv scenes. You'll sit and say, "How did we do this together?" You watch a great scene and you're like, they had to have planned that out. But having done long-form improv, I know you'll walk off and say, "Where the heck did that come from?" And the same thing happens with us: we're building those bricks. The team is greater than all of us together.
And this one I kind of love: "The fun lies on the other side of yes." You've probably heard "yes and" — you don't say no. Okay, sometimes in the real world we say no. But the fun happens when we agree to do something together and go make something. It doesn't mean you say yes all the time, especially if you're an open source maintainer. As Justin Searls likes to say, "No is temporary, yes is forever." But the fun happens on the other side of yes.
So when we think about it in a scene: the players are at the sharp end. The people who are doing the work are at the sharp end. And I want to kind of leave you with this. This is Del Close, who was one of the founders of Improv Olympic. When he passed away, he willed his skull to the Goodman Theatre for use in productions of Hamlet, which I think is pretty cool. He said: "If we treat each other as if we're all geniuses, poets, and artists, we have a better chance of becoming that." And I think that's really important in the work that we do. If we treat everyone we work with as genius architects, as brilliant programmers, as innovative SREs, we have a better chance of actually becoming that and building it together.
There are four things you can do right now to move forward. First, I believe in establishing what I sometimes call a social contract, or rules of engagement, for a team. I do this with any team that I build: we sit down collaboratively and say, "This is how we work together." Some of it is as mundane as: do we have cameras on or off on Zoom? How do we communicate with each other off-hours? All of those things. The important thing: if you're the manager, you don't write this, and you don't define it. The team comes up with it together.
Create space for open communication. This is even harder in our remote world, and you have to be intentional about it. I'm going a little bit quick because I'm running out of time, but don't worry, I'm going to give you a link to where the slides are.
Measure consistently for long-term improvement. It's not that important what you're measuring, as long as you're consistent and don't change your metrics every three months; otherwise you're not going to see which way things are moving.
And then I really love the idea that guard rails matter. I believe strongly in making the right way the easy way, so that it has a kind of gravity to it. But like they talk about in The DevOps Handbook, it's "buoys over boundaries." What that means is those guard rails aren't brittle walls; you can move past them, but you do it with intentionality, just like with a buoy. The buoy doesn't prevent me from swimming out into that part of the water, but I'm knowingly taking responsibility for doing that.
So this has been really fun. My name is Matty. These are places you can find me. If you want to get links to the slides, that's what that QR code will give you.
I have one more thing before we get into questions. Among all the other things I do that don't make me any money, I'm also the founder of, and I still run, DevOpsDays Chicago, which is a fun conference here in Chicago in August. But other than that, I'm excited and happy to take questions and thoughts, and problems with your life.