
At the Intersection of SRE & DevOps


Site Reliability Engineering (SRE) is a relatively new addition to the DevOps world, and distinguishing SRE from DevOps can be confusing. That’s why Justin McCarthy, CTO and co-founder of StrongDM, recently sat down with Alan Shimel at DevOps.com and a panel of technology experts to discuss SRE, its relationship to DevOps, and why the role is so hard to define.

The panel also included:

  • Tim Yocum—Director, Operations at InfluxData
  • Greg Leffler—Observability Practitioner Director at Splunk

So, what exactly is SRE, and why does it matter? Here’s the recap:

SREs: Fanatics about Uptime

“SRE is […] really just a practice within the current DevOps evolution.”

— Tim Yocum, InfluxData

SRE is more than a buzzword; it’s an integral part of production. While most DevOps engineers want to focus on writing new features, closing tickets, and moving on, site reliability engineers plot out the future of the product in a way that’s sustainable, rather than just doing break/fix work.

At smaller organizations like StrongDM and InfluxData, SRE is often distributed throughout DevOps. Even at larger companies like Splunk, SRE is more of a mentality than a title. But however you define the role, everyone agrees: you need SRE-type people embedded within your teams, and close enough to the product to understand it.

Where Does Observability Fit In?

“If you are [...] architecting reliability into your services, or architecting resilience into your world, you have to have observability as part of that. There’s no way to do it without it.”

— Greg Leffler, Splunk

While old-school monitoring required you to know in advance what you wanted to monitor, observability has you instrument everything and then use that data to figure out what's going on. Whether you’re looking at access observability, application data, or Twitter sentiment, SREs help you prioritize what matters most, like keeping your customers happy or meeting your service-level objectives (SLOs).

Measuring SRE

“If you're in this pretend world where we say everything is five nines all the time, well, either you're incredibly rich, or incredibly brilliant and incredibly lucky.”

— Greg Leffler, Splunk

Using SLOs as a metric for SRE lets you drop the need for perfection. Service-level objectives focus on setting expectations within the team. 

Site reliability engineers can use SLOs as internal currency to keep everyone focused on what really matters, like building more reliable applications, without getting bogged down in business-side metrics like SLAs.
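
To make the "five nines" point concrete, here is a minimal sketch of the arithmetic behind an availability SLO. It was not part of the panel discussion. The function names, the 30-day default window, and the idea of tracking a remaining "budget" are illustrative assumptions, not a StrongDM or Splunk tool.

```python
# Illustrative sketch only: these helpers are hypothetical and were not
# shown or described by the panel.

def downtime_budget_minutes(slo_percent: float, window_days: float = 30.0) -> float:
    """Return the minutes of downtime an availability SLO permits in a window."""
    if not 0.0 < slo_percent <= 100.0:
        raise ValueError("slo_percent must be greater than 0 and at most 100")
    window_minutes = window_days * 24 * 60
    # The allowed downtime is the fraction of the window NOT covered by the SLO.
    return window_minutes * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, window_days: float,
                     downtime_so_far_minutes: float) -> float:
    """Return the unspent downtime budget; a negative value means the SLO was missed."""
    return downtime_budget_minutes(slo_percent, window_days) - downtime_so_far_minutes


if __name__ == "__main__":
    # Compare common availability targets over a 365-day year.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target, window_days=365)
        print(f"{target}% allows {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.999% target leaves a little over five minutes of downtime in an entire year, while 99.9% leaves about 43 minutes in a 30-day month. That gap is why an explicit, less-than-perfect SLO gives a team room to ship changes.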

Final Thoughts On Hiring

“Bust through that 'ten years of SRE experience' or '200 years of Kubernetes experience' job requirement.”

— Justin McCarthy, StrongDM

It’s really difficult to recruit for this position, especially as HR often writes job descriptions with excessive requirements. The problem is compounded when managers search only for candidates who already hold the SRE job title, because people doing SRE work often don’t consider themselves SREs.

So what’s the best way to hire for the role? Drop the bullet points, and focus on the people. Anyone who enjoys troubleshooting and appreciates accumulating knowledge should be a part of the talent pipeline. 

While the role may be fuzzy, the need is clear. In a complex world of cloud-forward microservices and ephemeral infrastructure, SRE thinking is a crucial component of your IT environment.

Did you miss the panel? No worries, you can still check out the replay. And if you have a growing team that needs access to Kubernetes clusters, AWS accounts, databases and all of your infrastructure, come on over to StrongDM for a free demo.

About the Author

Maile, Contributing Writer and Illustrator, has a passion for helping people bring their ideas to life through web and book illustration, writing, and animation. In recent years, her work has focused on researching the context and differentiation of technical products and relaying that understanding through appealing and vibrant language and images. She holds a B.A. in Philosophy from the University of California, Berkeley. To contact Maile, visit her on LinkedIn.
