Identity Federation on AWS and Azure Instances

Why?

That’s a good question to start with: what’s the goal? Here we’re talking about managing access to instances on AWS and Azure in a unified way, and there are several possibilities, including (not exhaustively):

  • Local users from a CSV file with a script
  • Local users using a configuration management tool
  • Using a central directory (NIS, AD, LDAP)
  • Using strongDM

While the first two options are legitimate for local service accounts, they really don’t scale well for human accounts: headcount, turnover, and the normal life of a company quickly make them unmanageable.

The directory option is the sensible one when you have more than a handful of users. Nowadays the tool of choice for this case is Active Directory, mainly because its administration tools are easy to use and it works well.

I’ll cover how strongDM compares to the Active Directory setup detailed below at the end of this post.

Setting up Cross Cloud Active Directory

I’ve taken for granted that any organization will already have Active Directory to manage users, and I kept the architecture simple: one Domain Controller on each cloud provider, with a public DNS sub-zone delegated to those servers to simplify the setup.

On each cloud provider, I used a Windows instance and promoted it to a domain controller; to limit the cost of this sandbox, those instances are also the VPN endpoints between the two networks. I used IKEv2 with a shared passphrase (they were not yet part of a domain, so Microsoft EAP wasn’t an option), which is secure enough for this need. In a production environment, you’ll probably have something dedicated to handle those VPN connections, and you’ll probably take advantage of the AWS and Azure VPN services for a more robust setup.

Extending a domain into a cloud provider is debatable: some may argue it takes on a large risk if the DC is compromised, and that the cloud provider’s network should have its own subdomain. In practice, it’s easier and less error-prone to keep the users and computers in the same domain.

Using Azure’s native Active Directory service could have been an option too, but it ties you to the version and forest functional level Azure supports, so I avoided it in favor of the IaaS approach (linking the AWS and Azure managed Active Directory services may be worth another post, for another day).

Promoting two machines to domain controllers is well covered by Microsoft’s documentation, and I doubt anyone would want to replicate this exact setup, so I’ll skip to the interesting part: using this Active Directory to connect to Linux instances.

Joining instances to Active Directory

Prepare the Active Directory delegation rights

The first thing you MUST do is create an account that can register machines in Active Directory; you must NOT use an account that is part of the Domain Admins group. If a machine is compromised, that would put your directory, and with it your company, at high risk.

Those rights are quite easy to delegate, and you may already have this kind of delegation in place. To further limit the impact of this account being stolen, I recommend creating an Organizational Unit (OU) to scope it (or even one per cloud).

I organized things into a pattern I find useful and created groups for delegation, as shown in the captures below with the parameters for one group on one OU. Avoid going deeper than four levels of nesting; when that kind of need arises, it’s a sign the OU architecture has to be reviewed. Here I use a top-level OU for the enterprise, a second level per common object type, and a third level per location to scope future searches.

In the spirit of scoping as much as possible, I’ll use a different account for each cloud provider.

Here is where the fun begins: configuring Linux

I’m going to use pbis-open: I’ve had a hard time with sssd and adcli, I’m reluctant to install a Python interpreter on every machine just to join it to a domain, and pbis-open allows registering the machine in the proper OU directly. Those are my main reasons for using it.

As a matter of taste, I’m going to use Ubuntu; it’s the distribution I find strikes the best balance between Debian-like stability and version lag for middleware packages.

wget -O - http://repo.pbis.beyondtrust.com/apt/RPM-GPG-KEY-pbis | sudo apt-key add -
sudo wget -O /etc/apt/sources.list.d/pbiso.list http://repo.pbis.beyondtrust.com/apt/pbiso.list
sudo apt update
sudo apt install pbis-open

And using a user added to the group created above:

sudo domainjoin-cli join --advanced --ou Enterprise/Servers/AWS aws.acme.net DJUser_AWS
# Joining to AD Domain: aws.acme.net
# With Computer DNS Name: ip-172-31-43-43.aws.acme.net
#
# DJUser_AWS@AWS.ACME.NET's password:
# SUCCESS

Woot, yes, it really is that easy: four commands and your machine is joined to your Active Directory domain. Mind the proper DNS configuration and the firewall rules allowing TCP and UDP toward your domain controller; AWS has good documentation for that here. The most common trap is forgetting to allow the ephemeral client ports (1024-65535) in UDP.
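
If you script those rules with the AWS CLI, here is a minimal sketch; the security group ID and the client CIDR are placeholders, and the port list only covers the basics (DNS, Kerberos, LDAP, kpasswd, SMB):

# Placeholders: sg-0abc1234def567890 is the domain controller's security group,
# 172.31.0.0/16 is the network the Linux instances connect from.
for port in 53 88 389 464; do
  for proto in tcp udp; do
    aws ec2 authorize-security-group-ingress \
      --group-id sg-0abc1234def567890 --protocol "$proto" --port "$port" --cidr 172.31.0.0/16
  done
done
# SMB, needed by the join itself
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc1234def567890 --protocol tcp --port 445 --cidr 172.31.0.0/16
# the UDP range mentioned above
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc1234def567890 --protocol udp --port 1024-65535 --cidr 172.31.0.0/16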

Ensuring Everything Works

OK, it works. The first obvious drawback is the need to use a fully qualified name in the form of <user>@<domain.fqdn> or <SMBDOMAIN>\\<user>.
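
As an illustration, both forms below should reach the same account; jdoe is a hypothetical domain user and AWS the assumed NetBIOS name of aws.acme.net:

# user@domain.fqdn form; ssh splits on the last @, so the whole user part is quoted
ssh 'jdoe@aws.acme.net'@ip-172-31-43-43.aws.acme.net
# SMBDOMAIN\user form
ssh -l 'AWS\jdoe' ip-172-31-43-43.aws.acme.net

Getting rid of that prefix is what brings me to the next section.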

Going Further and Alternatives

You may encounter a few glitches when installing pbis-open. One of them is that the domainjoin-cli command doesn’t always succeed at configuring PAM and nsswitch; fortunately, the command has some internal features to fix that:

$ domainjoin-cli --help-internal
[...]
Internal debug commands:
fixfqdn
configure { --enable | --disable } [--testprefix <dir>] pam
configure { --enable | --disable } [--testprefix <dir>] nsswitch
configure { --enable | --disable } [--testprefix <dir>] ssh

Now let’s fix the annoying parts: using the domain as the default domain, configuring the shell, and a few more settings. First, dump the current configuration:

/opt/pbis/bin/config --dump

This gives you the current values of a lot of parameters (for more arcane needs you’ll have to tweak the registry, but in 99% of cases these parameters will do).

Create a file (I like to reuse the /etc/pbis directory created by the package and name it pbis.conf, but any location will do since pbis stores the values in its registry hives) and add these parameters:

AssumeDefaultDomain true
HomeDirForceLowercase true
HomeDirPrefix "/home"
HomeDirTemplate "%H/%D/%U"
HomeDirUmask "022"
LoginShellTemplate "/bin/bash"
SkeletonDirs "/etc/skel"
Local_AcceptNTLMv1 false

The first parameter allows logging in with just the username.

The second part is the local home directory configuration; I find the HomeDirForceLowercase parameter especially useful.

And last, disabling NTLMv1 is always a good idea unless you know exactly why you need it.

The last action is to load this configuration, with:

/opt/pbis/bin/config --file /etc/pbis/pbis.conf
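
A quick way to make sure the new values were taken into account, and that the machine now resolves directory accounts by their short name (jdoe being a hypothetical domain user):

/opt/pbis/bin/config --dump | grep -E 'AssumeDefaultDomain|HomeDirTemplate|LoginShellTemplate'
# lookups go through the pbis nss module, so the short name should now work
getent passwd jdoe
id jdoe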

Logs go to /var/log/auth.log like all PAM logs, but detailed errors from the pbis services go to /var/log/syslog and give more detail in case of login errors.
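
When a login fails, the quickest way to see what the pbis services complain about is to watch both files while reproducing the attempt from another terminal:

sudo tail -f /var/log/auth.log /var/log/syslog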

If you want to dig deeper into pbis-open, the whole documentation is available on GitHub (as PDF). A nice bonus: when you connect from another machine in the same domain, you can take advantage of GSSAPI for single sign-on over ssh.
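
On the client side, that single sign-on boils down to a couple of ssh options; a minimal sketch, assuming your instances live under aws.acme.net and that GSSAPIAuthentication is also enabled in the servers' sshd_config:

# ~/.ssh/config on a domain-joined client holding a Kerberos ticket
Host *.aws.acme.net
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes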

Conclusion

Joining a Linux machine to Active Directory with this method is really useful: it lets you handle all the other administrative tasks, sudoers permissions, Kerberos authentication on Apache, and so on, with well-known PAM methods, using your directory’s users and groups.
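
For instance, granting sudo rights to an AD group becomes a one-line sudoers drop-in; a minimal sketch, assuming a hypothetical group named LinuxAdmins and AssumeDefaultDomain enabled (otherwise prefix the group with the NetBIOS domain):

# /etc/sudoers.d/ad-admins -- validate with: visudo -cf /etc/sudoers.d/ad-admins
%LinuxAdmins ALL=(ALL:ALL) ALL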

The main drawback is the need to run and maintain a domain controller in each zone; even a small amount of latency can make password validation and ticket issuance noticeably slow. If you go as far as storing users’ public keys in AD and using a helper script to validate key-based ssh logins, it may end up taking several seconds to log in.

Automating all this in a cloud environment with many machines brings the problem of removing machines from AD when they’re terminated, either with a periodic cleanup task or at machine shutdown.
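
On the Linux side, that cleanup boils down to one command, run from your deprovisioning automation or a shutdown hook; a minimal sketch, reusing the delegated join account from above and the leave sub-command’s --deleteAccount option (check domainjoin-cli leave --help for your version):

# run before the instance is terminated; DJUser_AWS is the delegated account
sudo domainjoin-cli leave --deleteAccount DJUser_AWS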

The strongDM Alternative

strongDM brings a slew of benefits in terms of maintenance:

  • Only a few local users per server, one per role (mostly sudoer or not), are needed; that’s easy to achieve and can be baked into your base image.
  • The gateway is easier to keep up to date and scales easily for load or failover.
  • No need for a cross-cloud VPN or link between your on-premise site and cloud providers.
  • No more worries about defining the subnets so AD sites work properly.
  • No risk of KCC inconsistencies in your Active Directory Forest.
  • Easier to rotate secrets to register a node.
  • No need for a domain join delegation.

Of course, using commercial software has a cost, and it has to be compared to the total cost of managing a full integration of your instances; the point usually forgotten is the human cost. Keeping a multi-site Active Directory architecture healthy and its content clean can be very time-consuming and needs at least one FTE managing those tasks. Comparing only software costs is the usual error when weighing a free open source solution against a commercial product; it’s a normal human bias, since estimating the value of human time tends to feel like wizardry.

If you'd like to see for yourself how strongDM can help manage cross-cloud IDs, schedule time with one of our founders today.

14 day free trial - try strongDM