Why Fair Eliminated Static Credentials -- A Retrospective

Fair Eliminates Static Credentials with strongDM

Cat Cai is currently the Director of Platform Engineering at Fair. In this talk, alongside Jack Wink and Marshall Brekka, they discuss how Fair eliminated static credentials through automation and tooling decisions. Listen as they walk through how they make sure they enforce least privileged access, and rotate credentials without causing a huge headache in the organization.

Talk Transcript

Jack Wink 0:04
I'm Jack

Marshall Brekka 0:05
I'm Marshall

Cat Cai 0:06
And I'm Cat! And together, the three of us comprise some of the members of the platform team at Fair. Yeah.

Jack Wink 0:15
So, you know, what is Fair? Like, Mason kind of gave you the real answer. But this is our answer. We're a meme. We started as a meme and we've worked backwards from there. You know, how can we get to a place where, like, you download a car? Well, you know, 3D printing is not there yet. So instead, man, an app, all that cool jazz. You don't care about that.

Cat Cai 0:36
You're just here for like the engineering, right? And like, what our team does. So one of life's like, great mysteries is what is a platform team?

Because if you go to every company, like, DevOps, platform, SRE, they're just kind of like the catch-all team. And it's like, what is it that you guys do? Well, we encompass DevSecOps. So that means security, and we handle CI/CD. And because we do those things, the company was like, "Hey, you guys want to do compliance reviews?"

And I was like, sure, here you go, Jack. So poor Jack over here also does compliance reviews. We also manage the infrastructure of the company. So we're kind of like the company's in-house AWS and Kubernetes experts, which means I am just one Google step ahead of the dev who's asking me the questions. It's awesome.

We're also responsible for some core services in the company. So we wrote our internal pub/sub system. We built our own API gateway in our infinite wisdom. And we also built a tokenization service. And most importantly, we are responsible for giving out public scoldings. So when developers put PII in logs or commit a secret into GitHub, we shame them publicly on Slack, because we've got to keep them in line.

Marshall Brekka 1:53
So Cat told you what we do. But a little bit about how we plan for this: we usually pick an extreme position, we search Hacker News for content and comments that validate this extreme position, and then we force the engineers to do what we want them to do.

So this talk is going to be all about how we think static credentials are bad. We found a whole bunch of Hacker News content to validate this opinion. And so I'll hand it off to Marshall. Let him talk about the war on passwords.

So the war on passwords. This is what we're trying to eliminate, right: our static AWS creds, user passwords without two factor, and the DB passwords. But we've got to have a little bit of history. So, early days of Fair. You know, I don't know if any of you have done a startup before, but it can be pretty crazy trying to figure out your product market fit. So it's really important to start with a small, nimble team of developers, you know, maybe four to six or 20, whatever.

And another really great advantage about being a startup is you have no legacy technology. So you can use all the best practices. That obviously means we had to do microservices. So we started with five microservices, because it sounded like a really good number. And we deployed that to Elastic Beanstalk. And then finally, you know, last but not least, security was really important. And we wanted to give our devs the best and most secure ways to access their databases.

So you know, shared credential, shared root, to a bastion box. And things are going pretty great. You know, we're living our hashtag startup life. And everyone's smiling, running on Elastic Beanstalk. People are happy, you know, and life is good, life is good.

And then the dreaded IT had to come along and ruin everything with Okta. Okta and SSO, and that ruined our AWS users. So we had to say goodbye to AWS users. I'd like to say that we, the devs, were really forward thinking, but we really weren't. This kind of got forced on us. But honestly, this is a good thing. We were all kind of coming from a place of being used to having your own static AWS creds for your local dev box. But you know, that's bad for all the reasons I think we all know about, where, you know, those things get committed to GitHub. And then, you know, you read those horror stories, people wake up on Hacker News, like, "Oh no, I have an $80,000 bill, because I've been mining Bitcoin."

So this is actually really great for us. But it was a little painful to get that going. So we built some internal tools to help provision these temporary credentials from Okta roles onto people's local dev devices. So where are we at? We got rid of our access keys. We still have these user passwords, and we still have the database password, so awkward meh face, I guess.
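To make the idea of provisioning temporary credentials onto a dev machine concrete, here is a minimal sketch. It is not Fair's internal tool, whose details the talk does not give. It assumes the Okta login has already been exchanged for short-lived AWS credentials shaped like the Credentials structure that AWS STS returns (AccessKeyId, SecretAccessKey, SessionToken, Expiration), and it only shows the last step: overwriting a profile in the text of an AWS credentials file and deciding when to log in again. The function names and the x_expiration bookkeeping key are hypothetical.

```python
import configparser
import io
from datetime import datetime, timedelta, timezone


def write_temp_profile(credentials_text, profile, creds):
    """Return credentials-file text with `profile` replaced by temporary creds."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    # Overwrite the whole section so no long-lived key lingers in it.
    parser[profile] = {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
        # Hypothetical bookkeeping key (an assumption, not an AWS setting).
        "x_expiration": creds["Expiration"].isoformat(),
    }
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def needs_refresh(credentials_text, profile, now, margin=timedelta(minutes=5)):
    """True when the profile is missing or expires within `margin` of `now`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile) or "x_expiration" not in parser[profile]:
        return True
    expires = datetime.fromisoformat(parser[profile]["x_expiration"])
    return now + margin >= expires


if __name__ == "__main__":
    expiry = datetime.now(timezone.utc) + timedelta(hours=1)
    fake = {"AccessKeyId": "ASIAEXAMPLE", "SecretAccessKey": "example",
            "SessionToken": "example-token", "Expiration": expiry}
    text = write_temp_profile("", "dev", fake)
    print(needs_refresh(text, "dev", datetime.now(timezone.utc)))
```

Because the section is replaced wholesale, a static key that used to live in that profile is gone after the first login, which is the property the talk is after.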

Cat Cai 4:51
Okay, so, uh, yeah, moving on from, like, the early stages of Fair's life as an early startup. You know, 2017 to 2018, we've probably fundraised like 50 million, you know, we're doing pretty good. We've doubled our dev count from about 20ish to about 40. And we're really leaning into the microservices, you know, going from five to about 20ish as well. And the platform team went from one person, Marshall, to me, and then we also brought on Jack. And if you look at the numbers, we did not scale at the same rate as the rest of the company.

Because, you know, we're hiring, like, the best rockstar ninjas. So, like, it's hard to hire platform engineers. But that also means that, you know, all of our problems start growing. You don't know everyone by face anymore, so you can't really publicly shame them. Like, if you don't know that guy's name, you can't scold them for being like, "Hey, your password is, you know, you've been pwned." And so we found, okay, all of our problems with passwords have gotten really bad. We need to bring in two factor, because our key services like GitHub and Okta are still being used by devs, and they're still using passwords. And so we had to find something to kind of secure that, right?

So devs, typically they'll pick really bad passwords. They'll sometimes share passwords right on Slack. And we're like, cool, all right, poop emoji, not cool. We started outgrowing Elastic Beanstalk. And because it was all the dopest rage on Hacker News, we decided, yeah, Kubernetes, that sounds really good. And because of our Okta integration and everything else with the internal CLI tool that Marshall talked about, we were able to very easily add auth into Kubernetes. So no passwords and no certs being passed around for Kubernetes. Awesome, but it doesn't solve the two factor problem. So we were like, great, what does Hacker News say? Well, let's do U2F for, you know, max buzzwords. Right. So basically, we wanted to have, you know, Okta with our internal CLI tool, and U2F. We didn't actually support it at the time, and we were like, great, we need to find a way to just build integration into that.

And finally, we also started using 1Password. So why 1Password when we already have Okta? Well, for a lot of our vendors, you know, Stripe, we would have shared dev accounts. And so devs were like, cool, here's the password in a public Slack channel. And you're like, no, don't do that. And 1Password was a really nice answer to that, to say, like, hey, use a shared vault, share the password there. So basically, where we landed was just two factor all the things. What I wanted to just show you up here was kind of what our internal tool looks like. So a dev will just come in and type in 'fair login all' and we just present, you know, what the two factor methods are.

Really, all we did was just provide, like, a CLI UI for devs to use. Okta is actually doing all the heavy lifting, because we actually ask their API, like, hey, what two factor methods are they actually enrolled in? It was very, very easy for us to build in two factor support. So where does that leave us? Well, very cool, in the sense that we don't have access keys, we have auth into Kubernetes, and we've kind of secured user passwords with two factor. And we're now, you know, using 1Password so that devs can share passwords more securely. Except we still have this problem with database passwords. So we're still kind of in a poop emoji kind of status, because 40 devs are now sharing this private key and tunneling into a box to access the databases.
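The "ask the API which factors the user is enrolled in, then present them" step can be sketched roughly as follows. This is an illustration, not the 'fair login' implementation. It assumes the list of factors has already been fetched and parsed from JSON, with entries modeled on Okta's factor objects (id, factorType, provider, status). The function names, the label table, and the menu format are hypothetical.

```python
# Friendly labels for a few factor types; unknown types fall back to the raw value.
LABELS = {
    "push": "Okta Verify push",
    "token:software:totp": "Authenticator app code",
    "sms": "SMS code",
    "u2f": "U2F security key",
}


def enrolled_factor_menu(factors):
    """Return (menu_lines, factor_ids) for factors whose status is ACTIVE."""
    active = [f for f in factors if f.get("status") == "ACTIVE"]
    lines = []
    for index, factor in enumerate(active, start=1):
        label = LABELS.get(factor["factorType"], factor["factorType"])
        lines.append(f"{index}) {label} [{factor['provider']}]")
    return lines, [f["id"] for f in active]


def choose_factor(factors, choice):
    """Map a 1-based menu choice to a factor id; raise ValueError if out of range."""
    _, ids = enrolled_factor_menu(factors)
    if not 1 <= choice <= len(ids):
        raise ValueError(f"choice must be between 1 and {len(ids)}")
    return ids[choice - 1]


if __name__ == "__main__":
    sample = [
        {"id": "f1", "factorType": "push", "provider": "OKTA", "status": "ACTIVE"},
        {"id": "f2", "factorType": "u2f", "provider": "FIDO", "status": "ACTIVE"},
    ]
    for line in enrolled_factor_menu(sample)[0]:
        print(line)
```

Keeping the CLI this thin matches the point made in the talk: the identity provider decides which factors exist, and the tool only renders the choice and passes the selection back.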

Jack Wink 8:43
So 2019, it feels like we have a million engineers. It's insane how much we've grown. We're currently around 80. So I've got this classic more money, more problems slide.

You can see as money's gone up, so has the number of devs, and our morale has kind of plummeted. The reason for that is, what's the cause? Our daily work: we come in, we log into GitHub, we look at some code reviews, and somebody checked in the database password to GitHub. Honestly, I'm not even mad about this. The password's Fair 2016. Nobody's rotated that in three years.

So yeah, this is an issue. What we really want to do is eliminate that SSH tunnel. We want to make sure that we get the developers off that shared PEM and stop using the RDS super user to log in. If we can, we want to enforce least privilege access. And we want to make sure that we can rotate these credentials and not cause a huge headache in the organization. And so the way that we do that is strongDM. Love the product. We have the agent deployed in cluster, so it's in the VPC and can connect over to the databases. The devs sign in through Okta, so we can get rid of the whole shared RDS super user thing. And we have a script to provision the databases automatically. And we create a management user, so we kind of have backdoor access ourselves into the database.

And then we create read, write, and admin roles for the teams that manage these instances themselves. And then we automatically kind of log those to strongDM. And then we let our team managers assign access to the data sources that their jobs need. It could be better, but it's kind of where we've landed. What this lets us do is eliminate the SSH tunnel. We've gotten rid of the shared credentials. No one's logging in as a super user anymore.
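As a rough illustration of what a provisioning script for read, write, and admin roles might emit, here is a minimal sketch that generates PostgreSQL statements for one database. It is an assumption-laden example, not Fair's script: the role naming scheme (database name plus _read, _write, _admin), the layering of write on top of read, and the exact grants are all hypothetical choices. It only builds SQL text and does not connect to anything.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def role_statements(database, schema="public"):
    """Return SQL that creates layered read, write, and admin roles."""
    db = quote_ident(database)
    sch = quote_ident(schema)
    # Hypothetical naming scheme: <database>_read, <database>_write, <database>_admin.
    read = quote_ident(f"{database}_read")
    write = quote_ident(f"{database}_write")
    admin = quote_ident(f"{database}_admin")
    return [
        f"CREATE ROLE {read} NOLOGIN;",
        f"CREATE ROLE {write} NOLOGIN;",
        f"CREATE ROLE {admin} NOLOGIN;",
        # Read: connect, see the schema, select from existing tables.
        f"GRANT CONNECT ON DATABASE {db} TO {read};",
        f"GRANT USAGE ON SCHEMA {sch} TO {read};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {sch} TO {read};",
        # Write: everything read has, plus data changes and sequence use.
        f"GRANT {read} TO {write};",
        f"GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {sch} TO {write};",
        f"GRANT USAGE ON ALL SEQUENCES IN SCHEMA {sch} TO {write};",
        # Admin: everything write has, plus full table rights and object creation.
        f"GRANT {write} TO {admin};",
        f"GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA {sch} TO {admin};",
        f"GRANT CREATE ON SCHEMA {sch} TO {admin};",
    ]


if __name__ == "__main__":
    print("\n".join(role_statements("orders")))
```

Layering the roles means a person who only does analysis can be given the read role alone, which is the least privilege outcome described in the talk.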

We're able to enforce least privilege access. Not everybody needs admin rights on prod. You know, we can give out the read roles for people who are doing analysis, etc. Then the tricky part really comes with supporting rotation of credentials. We can do that pretty easily through strongDM for the developer stuff, but we're still kind of stuck with the application auth itself. And there's a couple ideas we have: moving to IAM for RDS, maybe we can use Vault to issue temporary credentials, maybe we do a sidecar proxy ourselves. We haven't spent a ton of time on this, but it's kind of our plan for 2019.

Another thing: automation, a huge part of it. Right now, we still have to run the scripts ourselves. But there are events that fire when RDS databases come online, so we could enroll these databases automatically. Another big issue we found with this is just Postgres roles. The permission system's pretty great, and then it gets weird when it comes to ownership and doing migrations. We have a couple thoughts around this. We need to make sure that the tables are owned by the same user, so that when you run a migration as a migration user, you're able to edit the tables, and you don't have tables created by an admin that the migration user can't touch. We're thinking about doing that with triggers to change the ownership. Then if teams create their own users in Postgres, we have a similar issue of making sure that the read user can read from that new user's tables, if they're able to create them. We could probably also do triggers and alter the default privileges there.
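The ownership and default privileges problem can be illustrated with a small sketch. It is not the approach Fair settled on (the talk describes these as open ideas). Instead of triggers, it shows the plain statements involved: a one-off ALTER TABLE ... OWNER TO to bring existing tables under a single migration user, and ALTER DEFAULT PRIVILEGES so that tables a given role creates later are readable by the read role. The role and table names are hypothetical, and the code only builds SQL text.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ownership_statements(schema, tables, owner):
    """SQL that hands every listed table to one owner, e.g. the migration user."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"OWNER TO {quote_ident(owner)};"
        for table in tables
    ]


def default_read_statements(schema, creators, read_role):
    """SQL so tables created later by each creator role are readable by read_role."""
    return [
        f"ALTER DEFAULT PRIVILEGES FOR ROLE {quote_ident(creator)} "
        f"IN SCHEMA {quote_ident(schema)} "
        f"GRANT SELECT ON TABLES TO {quote_ident(read_role)};"
        for creator in creators
    ]


if __name__ == "__main__":
    for stmt in ownership_statements("public", ["orders", "payments"], "orders_migrate"):
        print(stmt)
    for stmt in default_read_statements("public", ["team_a"], "orders_read"):
        print(stmt)
```

One caveat that shapes this design: ALTER DEFAULT PRIVILEGES only affects objects created after it runs and only for the creator role it names, so tables that already exist still need an explicit GRANT.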

Marshall Brekka 12:26
The other thing we've considered is hiring a DBA, because we are not DBAs. So if you're interested, I'd love to hear from you. And, like, maybe talk to HN and see if anybody has some recommendations, or maybe some combination of all of these things. We'll see. And then the last thing that we want to do in 2019 is to kind of research the permission model for other data stores that we use, and see if strongDM can come in handy there.