This post on service meshes and SPIFFE wraps up the series on how application identity and access control are migrating from implicit identification by IP address to application-level authorization.
Let’s start with an identity management technology that you may or may not have heard about: Secure Production Identity Framework for Everyone (SPIFFE).
The SPIFFE specification offers an API that lets application workloads ask the question “Who Am I?”
Here’s Andrew Jessup of Scytale explaining how it works in about 3 minutes at KubeCon Europe 2018:
SPIFFE answers the ‘who am I?’ question with a name like ecommerce-frontend, as well as cryptographic key material the application workload can use to prove that identity to others via attestation. SPIFFE is designed to answer this for a workload modeled as a virtual machine, container, or process.
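To make the naming concrete: SPIFFE IDs are formatted as URIs of the form `spiffe://<trust domain>/<workload path>`, so a name like ecommerce-frontend would in practice look something like `spiffe://example.org/ecommerce-frontend`. The sketch below uses only Python's standard library to split such an ID into its parts. The trust domain `example.org` and the helper name `parse_spiffe_id` are made up for this illustration, and the validation is deliberately looser than what the SPIFFE specification requires.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SpiffeId(NamedTuple):
    """The two parts of a SPIFFE ID (hypothetical helper type)."""
    trust_domain: str
    path: str


def parse_spiffe_id(value: str) -> SpiffeId:
    """Split a SPIFFE ID URI into trust domain and workload path.

    Simplified checks only; the SPIFFE specification is stricter.
    """
    parts = urlsplit(value)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {value!r}")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    if parts.query or parts.fragment:
        raise ValueError("a SPIFFE ID carries no query or fragment")
    return SpiffeId(trust_domain=parts.netloc, path=parts.path)


if __name__ == "__main__":
    # example.org is an assumed trust domain, not one from the post.
    print(parse_spiffe_id("spiffe://example.org/ecommerce-frontend"))
```

The name alone proves nothing, of course. It only becomes useful once it arrives inside a verifiable document signed by the identity issuer.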
This is very much the same identity model as the one described in Cloud native application identity and access control. So if you understand how applications interact with an EC2 metadata endpoint (or the GCE Instance Metadata API) to retrieve credentials that identify the application as the EC2 instance’s IAM role, you can draw all sorts of equivalences. If not, I suggest examining the diagrams in the Cloud native app identity post to dig into the sequence of identity bootstrapping operations.
One of the primary differentiators between SPIFFE-based identity and a cloud platform’s native identity is that SPIFFE is designed to work across technology platforms.
The idea is that the ‘Subscriptions’ service in the diagram above should be able to authenticate with the ‘Billing’ service, and both of them back to the ‘Members DB’, even if they are running on AWS, VMware, and a bare-metal Linux machine respectively.
One application identity system to rule them all.
Of course, the SPIFFE identity-issuing components have a very serious job to do: bootstrapping the chain of trust from the underlying compute infrastructure and provisioning identities to applications. The SPIRE project does that with:
a production-ready implementation of the SPIFFE APIs that performs node and workload attestation in order to securely issue SVIDs to workloads, and verify the SVIDs of other workloads, based on a predefined set of conditions.
Sounds very cool, though I haven’t deployed and operated these projects myself.
So what does this have to do with service meshes?
Service mesh control plane components like Istio Citadel and HashiCorp Consul can issue applications SPIFFE-compatible identities. This means when a service instance joins the mesh, it will have a proper application identity, vended by the platform’s trusted compute, identity and networking control plane.
This is exactly what we want.
The SPIFFE homepage goes so far as to say:
There are two claims here:
First, every workload gets a cryptographically secure identity. Agree.
Second, SPIFFE removes the need for application-level auth and complex network ACL configuration. “Removes the need” seems like poor phrasing, at best.
SPIFFE enables SPIRE and service mesh control planes to provide workloads with identities in X.509 and JWT formats that other workloads can use to authenticate peers. Great! If you’ve been authenticating applications using their network identity, this is a stronger foundation to build upon than IP address or CIDR block, especially when it comes to containerized workloads.
But there’s a really hard problem this bit of marketing glossed right over:
Modeling who has access to what services and data.
The collaborating application (Subscriptions or Members DB) should use the peer’s identity to understand who is on the other side of the request and perform the most important part: authorize the caller’s requested action.
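As a minimal sketch of that last step, assume the service has already authenticated its peer (for example over mutual TLS) and holds the peer's SPIFFE ID as a string. Everything named below is hypothetical: the policy table, the `example.org` trust domain, and the action names are invented to mirror the Subscriptions, Billing, and Members DB example, and a real deployment would load policy from wherever the organization manages it.

```python
from typing import FrozenSet, Mapping

# Hypothetical policy for the Members DB: which authenticated peer
# (identified by SPIFFE ID) may perform which actions.
MEMBERS_DB_POLICY: Mapping[str, FrozenSet[str]] = {
    "spiffe://example.org/subscriptions": frozenset({"members:read"}),
    "spiffe://example.org/billing": frozenset(
        {"members:read", "members:update-payment"}
    ),
}


def is_authorized(
    peer_id: str,
    action: str,
    policy: Mapping[str, FrozenSet[str]] = MEMBERS_DB_POLICY,
) -> bool:
    """Return True only if the peer is explicitly allowed the action.

    Unknown peers and unlisted actions are denied by default.
    """
    return action in policy.get(peer_id, frozenset())


if __name__ == "__main__":
    caller = "spiffe://example.org/subscriptions"
    for action in ("members:read", "members:update-payment"):
        verdict = "allow" if is_authorized(caller, action) else "deny"
        print(f"{caller} -> {action}: {verdict}")
```

The point of the sketch is the separation of concerns: the identity system answers who is calling, while the table, however it is stored and managed, answers what that caller may do.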
It seems like the responsibility for modeling and solving the access control and authorization problem is largely being pushed into the service mesh control plane and whatever tooling you have to manage those abstractions.
That is, we have network ACLs by another name, implemented by the service mesh. Or maybe it’s the responsibility of a ‘service mesh orchestrator,’ which is a thing now.
I understand the organizational dynamics here, but there’s a sign blinking red that many organizations might overlook a fantastic opportunity to materially improve security when adopting service meshes.
The SPIFFE + service mesh approach may be better than the largely static network firewall rules with which so many people have struggled to secure their applications in classic datacenter deployments. When evaluating service meshes and orchestrators, evaluate how well the tool:
- lets you model authorization between services
- helps you scale changing that model with the teams in your organization
Of course, the next big trick is to get applications to use this strong identity to authorize peers’ actions. I want it all 🙂
SPIFFE and the service mesh ecosystem are the technology I wish the virtualization, networking, and security vendors had built 5-10 years ago. This is not a far-fetched wish – the Google LOAS system that inspired SPIFFE launched in 2005.
I hope these efforts succeed in helping customers improve the security of their workloads and driving innovation. I also hope that people develop and maintain a technology strategy that accounts for the 3-10 year commitment these fundamental technology and organizational choices represent. I have added a Strategy section to the Knowledge Base to help you with that.
As always, feel free to reply with questions and comments. I love hearing from you.