Why is AWS IAM so @!#^$!# hard?

One of my favorite Directors of Cloud Platform

AWS Identity and Access Management (IAM) is a security tool that controls which AWS API actions Principals (roles, users) are allowed to perform on which AWS resources: an S3 bucket, EC2 instance, DynamoDB table, etc.

IAM is really important and it’s really hard for a lot of people, even on good teams with robust processes.

The AWS IAM Principal-Action-Resource Dance

A principal’s permissions are controlled primarily by the policies attached to its identity.

If the following IAM policy were attached to a role, e.g. ecomm-fulfillment, the ecommerce fulfillment application could read orders from the bigco-widget-orders S3 bucket:

{
   "Version": "2012-10-17",
   "Statement": [{
         "Effect": "Allow",
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::bigco-widget-orders/*"
   }]
}

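To make the Principal-Action-Resource idea concrete, here is a minimal Python sketch of how a single Allow statement like the one above might be matched against a request. It is not how AWS implements evaluation: the helper names (statement_allows, _wildcard_match) are hypothetical, it only handles the Effect, Action, and Resource elements, and it ignores conditions, NotAction, NotResource, and every other policy type.

```python
import re

# The statement from the example policy, as a Python dict.
STATEMENT = {
    "Effect": "Allow",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::bigco-widget-orders/*",
}


def _wildcard_match(pattern: str, value: str, ignore_case: bool = False) -> bool:
    """Match value against a pattern where * is any run of characters and ? is one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


def statement_allows(statement: dict, action: str, resource: str) -> bool:
    """Return True if one Allow statement covers the given action and resource."""
    if statement.get("Effect") != "Allow":
        return False
    actions = statement["Action"]
    resources = statement["Resource"]
    # Both elements may be a single string or a list of strings.
    actions = [actions] if isinstance(actions, str) else actions
    resources = [resources] if isinstance(resources, str) else resources
    # Action names are matched case-insensitively; resource ARNs are not.
    action_ok = any(_wildcard_match(a, action, ignore_case=True) for a in actions)
    resource_ok = any(_wildcard_match(r, resource) for r in resources)
    return action_ok and resource_ok
```

Under these assumptions, an s3:GetObject request for an object in bigco-widget-orders matches the statement, while the same action against a different bucket, or an s3:PutObject on the same bucket, does not.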
This starts simple, but the complications pile up quickly.

The Soon-to-be-Obvious Complications

Say you are trying to apply the Principle of Least Privilege, a common security best practice.

You provision a role for each application component, with an IAM policy that allows it to perform only the AWS API actions it needs. All the application components are slightly different, and there are 150 AWS services with more than 3000 API actions in total. So there’s a heck of a lot to analyze and understand when you go down that path.
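As a rough illustration of the one-role-per-component approach, the sketch below generates a separate policy document for each component from a small table of needs. Everything except ecomm-fulfillment and its bucket is made up for the example: the ecomm-invoicing component, the bigco-widget-invoices bucket, and the build_policy helper are all hypothetical.

```python
import json

# What each component needs, and nothing more.
# ecomm-invoicing and bigco-widget-invoices are hypothetical, for illustration only.
COMPONENT_NEEDS = {
    "ecomm-fulfillment": {
        "actions": ["s3:GetObject"],
        "resources": ["arn:aws:s3:::bigco-widget-orders/*"],
    },
    "ecomm-invoicing": {
        "actions": ["s3:PutObject"],
        "resources": ["arn:aws:s3:::bigco-widget-invoices/*"],
    },
}


def build_policy(actions, resources):
    """Build an identity policy document allowing only the listed actions and resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": sorted(resources),
            }
        ],
    }


# One policy per component, keyed by the role name it would be attached to.
policies = {name: build_policy(**needs) for name, needs in COMPONENT_NEEDS.items()}

if __name__ == "__main__":
    for role_name, policy in policies.items():
        print(role_name)
        print(json.dumps(policy, indent=2))
```

Two components is easy. The table is where the real work hides: someone has to fill it in correctly for every component, across every service it touches.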

The good news: AWS starts by denying principals the ability to do anything by default, so you can allow API actions as you need them.

Some bad news: Once you allow certain API actions, e.g. s3:GetObject, that principal may be able to perform that action against any resource for that service in the account. That is, the ecomm-fulfillment role may be able to read from the bigco-customer-credit-details bucket unless:

  1. its S3 access is limited by a Resource element like the one in the example above
  2. the S3 bucket has a resource policy that denies access to everyone except certain principals
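One cheap way to catch the first problem is to scan policy documents for Allow statements with no resource restriction. The sketch below is a simplified check under stated assumptions: find_unscoped_allows and the two sample policies are hypothetical, and it only flags a bare "*" Resource, so broad patterns such as arn:aws:s3:::* would slip through.

```python
def find_unscoped_allows(policy: dict) -> list:
    """Return the Allow statements whose Resource is, or includes, a bare "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may appear without a list
        statements = [statements]
    flagged = []
    for stmt in statements:
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        if stmt.get("Effect") == "Allow" and "*" in resources:
            flagged.append(stmt)
    return flagged


# Allows reading objects from every bucket the account can reach.
BROAD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}],
}

# Allows reading objects from the orders bucket only.
SCOPED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::bigco-widget-orders/*",
        }
    ],
}
```

Here find_unscoped_allows(BROAD_POLICY) returns the single over-broad statement, and find_unscoped_allows(SCOPED_POLICY) returns an empty list.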

If this is starting to sound complicated…here’s a diagram that may ‘clear’ it up. This is the IAM policy evaluation logic, from the docs:

There are actually five kinds of policies involved in the evaluation:

  • Identity-based policies
  • Resource-based policies
  • IAM permissions boundaries
  • AWS Organizations service control policies (SCPs)
  • Session policies

We won’t dig into this right now, just know that even a single IAM user or role can have a very complicated security story.
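For readers who want a rough feel for how those five policy types combine, here is a deliberately simplified Python sketch of the decision order for a request within a single account. It is my approximation, not AWS's implementation: the evaluate function and the verdict constants are hypothetical, each policy type is collapsed to one precomputed verdict, and it skips cross-account access and the finer rules about how resource-based policies interact with permissions boundaries and session policies.

```python
from typing import Optional

ALLOW = "allow"        # a statement in that policy type allows the request
DENY = "deny"          # a statement in that policy type explicitly denies it
NO_MATCH = "no_match"  # no statement in that policy type applies


def evaluate(
    identity: str,
    resource: Optional[str] = None,
    boundary: Optional[str] = None,
    scp: Optional[str] = None,
    session: Optional[str] = None,
) -> str:
    """Combine per-policy-type verdicts into a final ALLOW or DENY.

    None means no policy of that type applies to the request at all.
    """
    # 1. An explicit deny in any policy always wins.
    if DENY in (identity, resource, boundary, scp, session):
        return DENY
    # 2. If SCPs apply, they must allow the request.
    if scp is not None and scp != ALLOW:
        return DENY
    # 3. A resource-based policy Allow can grant access by itself
    #    (a same-account simplification).
    if resource == ALLOW:
        return ALLOW
    # 4. Otherwise the identity-based policy must allow the request.
    if identity != ALLOW:
        return DENY
    # 5. Permissions boundaries and session policies, when present, must also allow.
    for limit in (boundary, session):
        if limit is not None and limit != ALLOW:
            return DENY
    return ALLOW
```

Even in this toy version, an identity policy Allow is not enough on its own: evaluate(ALLOW, scp=NO_MATCH) and evaluate(ALLOW, boundary=NO_MATCH) both come back as a deny.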

The Non-Obvious Complexity

I think the non-obvious complexity comes from a few industry-wide phenomena:

  • Growth of Information Technology Overall
  • Decomposition of Application Architectures
  • Continuous Delivery, especially with Infrastructure as Code

The number of identities used to manage and operate technology services is growing, or even exploding.

First, successful organizations are growing. Existing applications may be migrating to the Cloud for cost-efficient scaling. Organizations are also creating and integrating many new applications to meet emerging needs and evolve within the market.

Application Architectures are Decomposing

Second, organizations are decomposing application architectures. Monoliths are being decomposed into Services, and Services into Microservices, and then on to Functions. This transformation can easily yield one hundred application identities or more, especially when trying to apply the Principle of Least Privilege. And that’s in a single business unit or department in an organization that may have several, each of which has its own AWS accounts. Naturally, many of the new and migrating applications are trying to leverage new managed service offerings from Cloud providers to avoid undifferentiated heavy lifting.

Third, organizations have discovered that the ability to deliver applications quickly and safely to customers is a competitive advantage. Continuous Delivery and complementary practices like infrastructure as code are here to stay because of the value they provide to the business and its customers. The only changes that are really likely are wider adoption and faster change delivery rates.

These three trends combine to put a lot of pressure on security, platform, and ‘devops’ people to:

Support ever greater rates of change…

For ever increasing numbers of applications…

With increasingly critical data…

If you’ve been having trouble describing why this hurts, I hope this helps you.

If I’ve missed some aspect of the pain you’re feeling (I’m sure I have), please hit reply and tell me! I want to understand how this affects you (really).