Hey – I’m back!

I’ve been heads-down learning what Cloud teams need to deliver to AWS securely and make IAM usable, and building that into a business delivered via SaaS & infrastructure code libraries: k9 Security.

I’d like to share my experience in how continuous delivery and serverless supported that endeavor. I hope this experience report can help you do the same.

k9 Security helps Cloud teams deliver to AWS securely with usable IAM automation and access audits.

So you and all your colleagues can actually secure AWS IAM, accelerate continuous delivery, and sleep well.

Early customers are identifying and fixing important issues, and we’ve learned a lot about where we need to go.

Let’s start with the key takeaways so you know where this is headed.

Key Takeaways

k9 learned what needed to be built from interviews, demos, and early adopters of the MVP. We used Continuous Delivery and AWS Serverless to:

  1. integrate customer feedback and scale to meet challenges of large, diverse customer environments quickly
  2. measure and improve operational costs so k9 is profitable as it grows

Result: k9 made significant progress as a business, and I am confident about our next steps to learn & grow.

Here’s why…


I’m working full-time on k9 Security’s marketing, sales, product development, operations, and customer success with help from a part-time engineer.

The k9 Security SaaS backend analyzes customers’ AWS accounts and reports the information needed to analyze and audit IAM access in a way everyone understands with tools they already use (Excel, Splunk).

People identify access control risks clearly and quickly shift effort from gathering data to analyzing and improving security. The k9 Security infrastructure code libraries help people actually configure the access they intend (Terraform & CDK).

k9 launched its MVP late last year and has been incorporating customer & prospect feedback ever since.

We’re running the classic Lean Build-Measure-Learn loop with Measure and Learn gathered from direct customer interactions wherever we can.

Here’s what we delivered in the first quarter of 2021 and some of what we learned.

Continuous Delivery

In the last 90 days, k9 Security deployed the backend to production 81 times:

Iterating towards problem-solution fit with many small deployments

We averaged more than one production deployment per work day during 2021q1 while iterating towards problem-solution fit. I knew we were productive, but this was higher than I expected for less than one full-time engineer.

This does not include improvements to the complementary infrastructure code libraries which consumed significant engineering effort, nor “the rest” of the business.

The 268 backend service changes:

  • increased coverage of AWS services and API actions by 3x
  • scaled the analysis engine’s limits by 25x to support ‘very large’ AWS accounts with thousands of IAM principals (1)
  • delivered several customer-requested AWS IAM principal & credential audit features
  • added csv exports to support customer SIEM integration, particularly Splunk
  • refined the k9 access capability model
  • improved onboarding experience with automated collection of configuration
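
For a sense of scale, the delivery figures above can be worked through in a few lines. This is only a sketch: the 81 deployments and 268 changes come from this report, while lining the "last 90 days" up with calendar 2021q1 and ignoring public holidays are assumptions made for illustration.

```python
from datetime import date, timedelta

DEPLOYS = 81      # production deployments reported above
CHANGES = 268     # backend service changes reported above

# Assumption: the "last 90 days" line up with calendar 2021q1.
start, end = date(2021, 1, 1), date(2021, 3, 31)

calendar_days = (end - start).days + 1
# Count Monday-Friday only; public holidays are ignored for simplicity.
work_days = sum(
    1
    for offset in range(calendar_days)
    if (start + timedelta(days=offset)).weekday() < 5
)

deploys_per_calendar_day = DEPLOYS / calendar_days
deploys_per_work_day = DEPLOYS / work_days
changes_per_deploy = CHANGES / DEPLOYS

print(f"{calendar_days} calendar days, {work_days} work days")
print(f"{deploys_per_calendar_day:.2f} deploys per calendar day")
print(f"{deploys_per_work_day:.2f} deploys per work day")
print(f"{changes_per_deploy:.1f} changes per deploy")
```

Under those assumptions the average is about 1.27 deployments per work day (0.90 per calendar day), with roughly three changes in each deployment.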

I never really worried about quality. The unit, integration, and functional tests have provided the safety net we need and we continue to build them out as we go.

k9 deploys on Fridays, and on any other work day when a change is ready to go. We hold changes until the next work day if we feel we need to monitor late-afternoon operations and don’t have the capacity to do that. No big deal. Customers will capture value soon enough.

Note the build and functional-tests job success rates are dragged down by AWS API throttling errors. Eighty percent of those failures are transient. Yes, we should fix them, and indeed a number of issues have been addressed. But:

  1. it’s less of a problem inside a tight development loop delivering a change per day when the caches are hot
  2. “wait 20 minutes, then retry” is surprisingly effective when working alone (deadly once it blocks another stream of work or engineer frequently)
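
For the transient failures, the manual "wait, then retry" habit can also be automated. The following is a minimal sketch of retrying with exponential backoff and jitter, not k9's build code. `ThrottlingError`, `call_with_backoff`, and `flaky_describe` are hypothetical names; a real job would catch whatever throttling exception its AWS SDK raises.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for an AWS API throttling error."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call operation(), retrying throttling errors with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the job
            # Exponential backoff capped at max_delay, with full jitter so
            # parallel jobs do not all retry at the same moment.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Usage: a fake API call that is throttled twice, then succeeds.
calls = {"count": 0}


def flaky_describe():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ThrottlingError("Rate exceeded")
    return "ok"


# The sleep function is injected so the example runs without waiting.
result = call_with_backoff(flaky_describe, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

Bounding the attempts keeps a persistent failure visible instead of hiding it behind endless retries.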

We felt very productive, with significant scaling enhancements and medium-sized features taking a week or less (the changes above). We are able to promise incremental AWS service coverage additions in 2 business days or less.

By now you’re probably convinced k9 was able to ship changes that delivered new features and scaled up existing ones.

What did it do for the business?

Objectives, Key Results, and Learnings

These objectives have been at the top of the k9 weekly Sprint doc in some form for the last 9 months:

k9 Security Objectives: MVP & Phase 1

k9’s main objectives since launch have been to gather and incorporate feedback from customers, prospects, and Cloud Security problem interviews. (Aside: Can I interview you? I promise that 30min will be useful to you.)

Those 81 deploys delivered many capabilities that were:

  • requested directly as a solution
  • observed in Cloud Engineers’ homegrown solutions and OSS tools
  • aimed at problems without a solution, or gaps that others can’t/won’t close

k9 is up to 3 paid customers now, and we’ve learned a ton about what the product needs. We’ve integrated that feedback into our roadmap, shared what we plan to do (and not do) with customers, and already delivered a lot of it.

I’ve also learned a lot about the really hard part for myself and many technical founders: Marketing & Sales.

Marketing & Sales

Learning Marketing & Sales for k9 Security has been challenging for me and it’s a lot to learn at once. k9 has to address:

  • a highly technical & nuanced problem that is often misunderstood or unappreciated
  • a multi-part SaaS+infra code solution & emerging #DevSecOps practices
  • a (traditionally) complicated buying process crossing Cloud & Security org silos

We’re learning and applying basic marketing & sales practices, and we’re making progress. We’re connecting with more of the right people and reducing “friction” with better positioning, messaging, targeting, and purchase via AWS Marketplace.

There were many direct tests and much market analysis to answer “who” we should talk to:

Q1. Who understands AWS security well enough to understand what we’ve built and has the pain?

Teams who directly change and review AWS security policies on a daily or weekly basis.

Q2. Who cares enough about information security to integrate it into their delivery pipeline?

Organizations who manage a lot of infrastructure with code and care about information security as much as compliance. Secondarily, teams who know/suspect they have security policy problems and want to clean them up before rolling out a $BIG_CHANGE.

Those answers might seem obvious, but they’re really not (3).

It took nearly six months of 2020 to figure this out. We’re trying to improve on that by using proven “Customer Development” practices (Talking to Humans, Constable) to conduct effective lightweight experiments.

More than half of 2021q1’s effort went into Marketing & Sales activities.

We experimented with and changed marketing & sales artifacts and process twice as much as product. And it’s not done. We’re kicking off an experiment today. We’ll need to keep experimenting because the product, customers, and market will change over time.

I’m sharing the marketing & sales experience because continuous delivery and serverless enabled k9 Security to:

  • learn more about who our customers are and will be and what their problems are
  • determine what those customers value, as evidenced by an exchange of money or attention
  • integrate customer feedback and scale to meet challenges of large, diverse customer environments quickly
  • measure and improve operational costs so k9 is profitable as it grows (2)

Result: k9 Security made significant progress as a business, and I am confident in what we need to do next to learn about the market & grow within our customer segment.

The same may be true for your organization’s own risky projects. We’re not out of the woods, but I think I see the sun 🙂

More to share

Over the next few weeks I’ll share more about how a small team improved k9 customer onboarding, analysis engine, and operations to analyze hundreds of thousands of IAM access controls per week with an AWS serverless application. All while iterating on problem-solution fit with very little drama.

The topics I have in mind:

  • Sharing what’s on the Operations KPI dashboard
  • Decomposing monolithic Lambda functions after learning how the application needed to scale
  • Using AWS Step Functions to orchestrate Lambda functions and nested Step Functions, particularly scaling out with Map states, retries, and failure handling
  • Caching expensive per-analysis objects in S3 and sharing across analysis functions orchestrated by Step Functions
  • Building a cache backed by DynamoDB with a Python Mapping interface
  • Smoothing cache expiration and refresh load
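
To make the last two topics concrete, here is a minimal sketch of a cache exposed through Python's Mapping interface with jittered expiration. It is not k9's implementation: a plain dict stands in for the DynamoDB table, and `TtlCache`, its parameters, and the example key and value are hypothetical.

```python
import random
import time
from collections.abc import Mapping


class TtlCache(Mapping):
    """Read-only Mapping view over a key-value store with expiring entries."""

    def __init__(self, store, ttl_seconds=3600, jitter=0.2, clock=time.time):
        self._store = store      # a dict here; assumed to wrap a table in practice
        self._ttl = ttl_seconds
        self._jitter = jitter
        self._clock = clock

    def put(self, key, value):
        # Spread expirations over +/- jitter so entries written together
        # do not all expire (and need refreshing) at the same moment.
        spread = random.uniform(1 - self._jitter, 1 + self._jitter)
        self._store[key] = (value, self._clock() + self._ttl * spread)

    def __getitem__(self, key):
        value, expires_at = self._store[key]   # KeyError if never cached
        if self._clock() >= expires_at:
            raise KeyError(key)                # expired entries count as missing
        return value

    def __iter__(self):
        now = self._clock()
        return iter([k for k, (_, exp) in self._store.items() if exp > now])

    def __len__(self):
        return sum(1 for _ in self)


# Usage with a controllable clock so expiry can be shown without waiting.
now = [1000.0]
cache = TtlCache({}, ttl_seconds=100, jitter=0.2, clock=lambda: now[0])
cache.put("role/admin", {"can_admin_iam": True})
hit = cache.get("role/admin")
now[0] += 200    # past the longest possible jittered TTL (120 seconds)
miss = cache.get("role/admin")
```

Because the class implements `Mapping`, callers get `get`, `in`, `keys`, and `items` for free and do not need to know what backs the cache.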

Serverless application development is still an emerging practice, so I’m hoping you’ll benefit from k9’s experience reports.

Feel free to check out how k9 works and send me questions – I’d love to answer them!



(1) k9’s access analysis limits went from 300 principals and 200 buckets to 3,000 principals & individual resources such as 500 buckets + KMS keys; this is a big deal because this problem scales closer to O(num_principals*num_resources) than O(num_principals)
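
A small worked example of that scaling claim, assuming for illustration that the new resource limit is counted as 500 (the footnote gives "500 buckets + KMS keys", so the real figure may differ). `access_pairs` is a hypothetical helper.

```python
def access_pairs(num_principals, num_resources):
    """Upper bound on principal-resource pairs an analysis must consider."""
    return num_principals * num_resources


before = access_pairs(300, 200)     # old limits: 300 principals, 200 buckets
after = access_pairs(3_000, 500)    # new limits, counting 500 resources
growth = after / before             # growth in pairs to analyze
principal_growth = 3_000 / 300      # growth in principals alone

print(f"{before:,} -> {after:,} pairs ({growth:.0f}x)")
print(f"principals alone: {principal_growth:.0f}x")
```

Under that assumption the pairs to analyze grow 25x (60,000 to 1,500,000) while the principal count grows only 10x. That lines up with the 25x figure cited for the analysis engine, though the report does not say how that figure was derived.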

(2) “pay just for the data & compute resources you need” works really well for k9’s architecture & delivery model, and there’s still plenty of cost we can take out of CloudWatch Logs 🙂

(3) particularly when building a bootstrapped, sustainable SaaS targeting SMBs using DevOps instead of a VC-backed one targeting Enterprise Security