On paper, Kubernetes role-based access control (RBAC) makes perfect sense. It’s obvious: of course an organization would want to enforce access policies for the users and applications in a cluster.
The official Kubernetes documentation provides a lot of guidance on how the RBAC API objects work, but there’s little on best practices for deploying RBAC in a way that works for an organization. The tried-and-true developer method of Googling “Kubernetes best practices” turns up the same lack of information, wrapped up in listicles of security mantras (separation of duties and all that jazz).
Managing RBAC in a way that suits the size of your company is confusing and overwhelming. Before rushing to implement policy, it’s worth figuring out what problems RBAC is actually trying to solve.
You’ll typically find that implementing RBAC permissions, like all things security, is a game of seesaw between limiting access and operational ease.
There’s the principle of least privilege, which your security team has been bugging you about. But what does it actually mean? Both the applications running in the cluster and the developers accessing it should only get access to the resources they need. Your applications should only be able to read their own secrets and configmaps. Pretty simple. Your developers are a little tougher to figure out, since there are a multitude of developer roles and they’re prone to shift over time. For example, a mobile developer may only need read-only access to your Kubernetes clusters, whereas a lead platform developer will need admin access.
On the other end, you’re trying to preserve operational ease. As a cluster administrator, you want to be able to quickly grant a new user a single role that gives them all the access they need (instead of the insanity of granting individual privileges). This makes auditing easier, since you’ll know exactly who has access to what resources through the role they have. You also want to define roles that don’t become a huge operational headache over time.
I’ve spelled out three realistic (I know because I’ve done them) approaches to Kubernetes RBAC.
Approach #1 - Cluster Admin for Everyone
If you’ve been doing Kubernetes since ye olde days of cluster-admin, this is likely where you started. If you’re a small startup strapped for engineering resources, that’s also likely the way things have stayed. Your developers have access to the entire cluster. Business velocity often beats out security, and that’s okay. It’s just worth knowing what your risks are.
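For concreteness, “cluster admin for everyone” usually boils down to a single binding along the lines of the sketch below. The built-in cluster-admin ClusterRole is real; the binding name and the developers group are hypothetical, and how users end up in that group depends on your authenticator.

```yaml
# A sketch of the "cluster admin for everyone" setup, not a recommendation.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: everyone-is-admin          # hypothetical name
subjects:
- kind: Group
  name: developers                 # hypothetical group supplied by your authenticator
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin              # built-in ClusterRole: every verb on every resource
  apiGroup: rbac.authorization.k8s.io
```

One binding, and every member of the group can read every secret and delete every namespace.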
RBAC implemented: ✅
Principle of least privilege: 🙈 You’ve just built your security engineers’ worst nightmare. Every developer has access to everything (yes, your organization’s state secrets!). Your applications have the ability to run API commands against the Kubernetes cluster. If an application is compromised, then it’s safe to assume under this implementation that everything is compromised.
Operational ease: 🤷🏻‍♀️ This implementation is easy up front, since your developers either have access to everything in the Kubernetes cluster or don’t have access at all. The thing to note is that you’ll incur organizational risks and headaches as you grow. Your developers may unintentionally delete your configmaps and secrets. Oops.
Verdict: It certainly gets the job done. It’s probably a great approach for small startups with a high level of trust, little time, and few resources to build out (or little need for) a stricter RBAC policy. It’s not such a great approach for companies with an engineering headcount of 30 or more.
Approach #2 - RBAC Babysteps
After your organization’s Nth incident (hopefully N is still in the single digits, but I’m not judging) requiring manual intervention due to a stray keystroke that wiped out all cluster resources, configurations, and secrets, you’ve probably outgrown the Cluster Admin for Everyone (CAFA) solution. Between these incidents and aggressive prodding from an exasperated security team, you’ve been forced to carve out RBAC policies for both your applications and users. What do things even look like now that the default cluster-admin ClusterRole is no longer sufficient?
Well... a good start would be making sure that all of your applications (for the record, this applies to DaemonSets, Deployments, StatefulSets, CronJobs, etc.) get a service account with a set of permissions that aren’t just the default.
Then, you’ll specify a Role and a corresponding RoleBinding that grant the ServiceAccount access to only the Kubernetes API resources the app needs. Suppose, for example, that an app only needs to read its own configs in a specific namespace (data-engineering).
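A minimal sketch of that ServiceAccount, Role, and RoleBinding trio might look like the following. Only the data-engineering namespace comes from the text; the names example-app, example-app-config, and example-app-config-reader are hypothetical placeholders.

```yaml
# Identity the app's pods run as ("example-app" is an assumed name).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-app
  namespace: data-engineering
---
# Namespaced Role: read access to one named ConfigMap and nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-app-config-reader
  namespace: data-engineering
rules:
- apiGroups: [""]                        # "" is the core API group, where ConfigMaps live
  resources: ["configmaps"]
  resourceNames: ["example-app-config"]  # assumed ConfigMap name; limits access to the app's own config
  verbs: ["get"]
---
# Bind the Role to the ServiceAccount inside the same namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-app-config-reader
  namespace: data-engineering
subjects:
- kind: ServiceAccount
  name: example-app
  namespace: data-engineering
roleRef:
  kind: Role
  name: example-app-config-reader
  apiGroup: rbac.authorization.k8s.io
```

The workload opts in by setting serviceAccountName: example-app in its pod spec. If the app needs to list or watch configs rather than fetch one by name, dropping resourceNames and widening the verbs is the usual trade-off.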
Enforcing RBAC on the user side is similar in concept. You can create RoleBindings for individual users, but this is not the recommended path as there’s a high risk of operator insanity.
The better approach for sane RBAC is to create roles that your users map to; how this mapping is done is dependent on your cluster’s authenticator (e.g. the aws-iam-authenticator for EKS uses mapRoles to map a role ARN to a set of groups).
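As a rough illustration of the EKS case, a mapRoles entry in the aws-auth ConfigMap could look like the sketch below. The account ID, the IAM role name, and the username prefix are made up; the engineering group is the Kubernetes group that your bindings would then reference.

```yaml
# Sketch of an aws-auth ConfigMap entry for aws-iam-authenticator on EKS.
# The role ARN and username are hypothetical. Anyone who assumes the IAM
# role is placed in the Kubernetes group "engineering".
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/eks-engineering
      username: engineering:{{SessionName}}
      groups:
      - engineering
```

A real aws-auth ConfigMap also carries the mapping for the node instance role, which is omitted here, so treat this as an entry to merge in rather than a file to apply wholesale.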
Groups and the APIs they have access to are ultimately determined by an organization’s needs, but generic reader (for new engineers just getting the hang of things), writer (for your engineers), and admin (for you) roles are a good start. (Hey, it’s better than admin for everyone.)
---
# An example reader ClusterRole.
# This is a ClusterRole, so it applies cluster-wide, to all namespaces.
# Remember, we’re talking generic reader/writer/admin roles.
# If you want things to apply to a single namespace, you can do something
# similar with regular Roles instead.
name: reader
rules:
- apiGroups: ["*"]
# An example reader ClusterRoleBinding that gives read permissions to
# the engineering and operations groups. The roleRef specifies the
# ClusterRole being bound.
- kind: Group
- kind: Group
# An example writer ClusterRole.
- apiGroups: ["*"]
- deployments
# An example writer ClusterRoleBinding that gives write permissions to
# the operations group.
- kind: Group
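Fleshing out the outline above, a full set of manifests might look something like the sketch below. The role names (reader, writer) and group names (engineering, operations) come from the comments above; the binding names, the resource lists, and the verbs are assumptions, so adjust them to what your teams actually need.

```yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: reader
rules:
- apiGroups: ["*"]
  # Assumed resource list. Secrets are left out on purpose so that
  # read-only users cannot read them.
  resources: ["pods", "pods/log", "services", "configmaps", "deployments", "statefulsets", "daemonsets", "jobs", "cronjobs"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: reader-binding             # assumed name
subjects:
- kind: Group
  name: engineering
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: operations
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: writer
rules:
- apiGroups: ["*"]
  resources:                       # assumed list, apart from deployments
  - deployments
  - statefulsets
  - configmaps
  - services
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: writer-binding             # assumed name
subjects:
- kind: Group
  name: operations
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: writer
  apiGroup: rbac.authorization.k8s.io
```

An admin role would follow the same pattern; many clusters simply bind the built-in cluster-admin ClusterRole to a small admin group.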
RBAC implemented: ✅✅ (RBAC’d so hard, double checks)
Principle of least privilege: 🤷🏻‍♀️ For the most part, yes. There are discrete reader, writer, and admin roles. Your applications all get specific access. Time to pat ourselves on the back. Job well done…
Operational ease: Sure, it’s not quite as easy as giving everyone and everything God powers, but the setup laid out here isn’t too bad overall. Except, you notice over time that more and more teams are consulting you on RBAC policies. Nobody outside of your organization’s Platform/DevSecOps/Infrastructure/Tools team can be arsed to figure out what RBAC for Kubernetes even is. You often find yourself updating your policies to recognize new custom resource definitions for the cool Kubernetes integrations your data engineers keep spinning up. Depending on the type of authenticator you’re using, you’re also likely manually provisioning developers into the group(s) they belong in so that they get the correct access. You’re beginning to feel like a glorified YAML dev.
Approach #3 - Automation
This last approach... isn’t really an approach. It’s more a series of guidelines to get you on a path to RBAC success. You’ll naturally adopt a lot of these measures over time after feeling the pain from #1 and #2.
For your applications, you’ll likely want to adopt something similar to the generic reader/writer/admin approach from #2. Most applications are unlikely to make heavy use of the Kubernetes API, other than reading their own configs and secrets. For CI/CD-related applications, you can be a little more lax on API groups. Creating a knowledge base of general RBAC templates and guidelines for the rest of the company to use is a great first step. If they’re easy to find and use, your developers will end up just copy/pasting them (which is pretty much what you want).
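As one example of what such a template could contain, here is a sketch of a slightly more permissive Role for a CI/CD service account. Everything in it is an assumption for illustration: the ci-deployer name, the reuse of the data-engineering namespace, and the exact API groups, resources, and verbs.

```yaml
# Hypothetical template: a namespaced deployer Role for a CI/CD pipeline.
# Broader than an app's own Role, but still scoped to a single namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-deployer                # assumed name
  namespace: data-engineering      # replace per team
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["configmaps", "services", "deployments", "statefulsets", "cronjobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

Teams copy the template, change the namespace, and bind it to their pipeline’s ServiceAccount with a RoleBinding.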
Depending on your authentication method in Kubernetes, user provisioning may be one of the more painful parts of handling RBAC. strongDM has a Kubernetes integration that standardizes and simplifies granting users role-based access to a cluster. Rather than creating direct user mappings, strongDM’s solution relies on generating roles entirely within Kubernetes and populating strongDM with a client certificate and key. Users can then be provisioned access to the cluster in the same standardized way as for all other data sources.
Over time, as your organization grows, the generic reader/writer/admin approach doesn’t scale. (A solution that a random Internet stranger suggested isn’t the salve to all your problems? Surprise.) You’ll need more granularity for each of your roles, meaning you need to create more roles, which becomes harder to mentally juggle. As usual, open source solutions come to the rescue. RBAC Manager makes it easier to manage users, services, and role bindings over time and across namespaces via labels. rakkess and rbac-lookup both provide easy visibility into service account and user roles, which, for reasons unknown, is hard to determine using kubectl alone. (It’s almost like Kubernetes intentionally makes it hard for you to understand RBAC.) Popeye, a general Kubernetes scanner for enforcing best practices, is useful for detecting unused RBAC rules that build up over time from updating and deleting roles.
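To give a feel for the label-based approach, here is a sketch of an RBAC Manager definition. The layout follows the RBACDefinition format as I understand it from the project’s documentation, so check the field names against the version you install. The definition name, the group, and the team label are hypothetical; view and edit are built-in ClusterRoles.

```yaml
# Sketch of an RBAC Manager RBACDefinition (verify against the project docs).
# One definition fans out into the individual RoleBindings for you.
apiVersion: rbacmanager.reactiveops.io/v1beta1
kind: RBACDefinition
metadata:
  name: engineering-access         # hypothetical name
rbacBindings:
- name: engineering-readers
  subjects:
  - kind: Group
    name: engineering              # hypothetical group
  clusterRoleBindings:
  - clusterRole: view              # read-only across the cluster
  roleBindings:
  - clusterRole: edit
    namespaceSelector:
      matchLabels:
        team: engineering          # hypothetical namespace label
```

With a definition like this, labeling a new namespace team: engineering should be enough for the group to get edit access there, with no new RoleBinding written by hand.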
RBAC implemented: ✅✅ You already got these two checks, so hopefully you didn’t regress and lose RBAC implementation.
Principle of least privilege: Yes!
Operational ease: Still 🤷🏻‍♀️. At this point, you’ve likely realized that implementing RBAC isn’t an exact science and is prone to shift over time, depending on the growth trajectory of your org. Hopefully, with a cocktail of off-the-shelf and open-source solutions, you’ll be able to cobble together a solution that works for you and doesn’t paint you into a corner. The engineer’s dream.