Overengineering Security for Fun and no Profit

Confronted with a lack of budget for proper security tooling, I decided to create my own. Until then, we occasionally ran Prowler manually, and the results were mostly ignored. There was no ownership tracking, no history, no breakdown by account or team, and no good way to correlate findings over time.

Basically the standard “we technically do security scans” setup.

So I built something myself.

The application itself is honestly not that interesting. It’s mostly a CRUD app written in Go and HTMX. It handles the usual things:

Launch and schedule scans
Store and display findings
Track ownership
Send Slack notifications
Group findings by team/account/repository
Show some basic graphs

The interesting part was the security model around the scanners themselves. The more I worked on it, the more I realized something uncomfortable: the scanners were some of the highest-risk components in the entire environment.

Think about what these tools normally get:

Unrestricted read access to AWS accounts
Unrestricted read access to repositories
Full internet access
CI/CD integration
Permissions to pull arbitrary images and dependencies

That is an absurd amount of trust.

You don’t even have to imagine the risk here. This is not just theoretical paranoia. Trivy, a well-known open source scanner we actually use, was the victim of a supply chain attack just two months ago. Running a compromised Trivy in our setup could have given attackers access to all of our AWS accounts and all of our repos, not to mention direct information about the security gaps that Trivy itself had just discovered.

Supply chain attacks are all over the news these days. But they are not even the only potential issue with giving so much access to an external tool. Anything from a simple logging bug to outright malicious behavior could expose sensitive data. The entire source code could be sent to an external server, or someone might decide my AWS configuration is very useful to train their LLM.

So I started designing the entire system assuming the scanners themselves were hostile.

Separate account

The entire scanning platform runs inside its own AWS account with basically nothing else in it.

I did not want random applications, CI jobs, unrelated infrastructure, or even engineers experimenting with permissions accidentally gaining the ability to assume scanning roles in production accounts.

The account exists mostly for isolation and blast radius reduction.

The orchestrator has almost no permissions

One of the core design decisions was that the main Go application itself should not have direct access to anything sensitive. No broad AWS access. No GitHub access. No unrestricted internet access. No ability to directly assume audit roles.

I really did not want a normal web application becoming the gateway to:

Every AWS account
Every repository
Every scanner
Every secret

Instead I used a credential broker pattern. The Go app only has permission to invoke a Lambda. And that Lambda:

Assumes a narrowly scoped audit role
Launches a Fargate task
Forwards temporary credentials into the container
Generates tightly scoped presigned S3 URLs for result uploads

The scanners themselves run with zero AWS permissions attached to the task role. If a container gets compromised, there are no standing credentials available to steal. Whatever can be stolen is short-lived and as narrowly scoped as possible: the forwarded credentials are read-only, have a short TTL, only grant access to one account at a time, and are restricted to our IP with an aws:SourceIp condition. Even if the credentials are maliciously extracted, any request made from outside our network will be denied. An exploit would need to actively use those credentials from inside the scanner itself.
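One way to picture that scoping is an inline session policy that the broker attaches when it assumes the audit role. This is only a minimal sketch under assumptions: the post does not say whether the IP restriction lives in a session policy or in the role itself, and the function names, the role name, and the 203.0.113.10/32 address are hypothetical. The dictionary keys are the standard STS AssumeRole parameters.

```python
import json


def build_session_policy(allowed_cidr: str) -> str:
    """Return an inline session policy (as JSON) that rejects every call
    made from outside allowed_cidr.

    A session policy can only narrow what the assumed role already allows,
    so the Allow statement below grants nothing by itself: the effective
    permissions remain capped by the read-only audit role.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PassThroughRolePermissions",
                "Effect": "Allow",
                "Action": "*",
                "Resource": "*",
            },
            {
                "Sid": "DenyOutsideKnownIp",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": [allowed_cidr]}},
            },
        ],
    }
    return json.dumps(policy)


def assume_role_params(account_id: str, role_name: str, allowed_cidr: str) -> dict:
    """Build keyword arguments for an STS AssumeRole call (hypothetical helper)."""
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        "RoleSessionName": f"scan-{account_id}",
        "DurationSeconds": 900,  # the shortest session STS accepts
        "Policy": build_session_policy(allowed_cidr),
    }
```

The returned dictionary could be passed to an STS client as keyword arguments. Because each call names a single role ARN, the resulting credentials only ever cover one account.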

The scanners do not get GitHub access either

The code scanners were handled similarly. I did not want TruffleHog or Trivy containers holding GitHub tokens. So the scanners themselves never received repository access directly. Instead:

The broker downloads the latest commit snapshot
Uploads it to S3
Generates one presigned URL to download the code and one to upload results
Hands those URLs to the scanner container

The scanner only receives a temporary archive of the exact repository snapshot being scanned. The scanner has zero access to unrelated repositories or organization metadata.
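To make the handoff concrete, here is a small sketch of what the broker might pass to the scanner task. How the URLs actually reach the container is not stated above, so the use of ECS container environment overrides, the helper name, and the variable names SOURCE_ARCHIVE_URL and RESULT_UPLOAD_URL are all assumptions. The nested shape follows the `overrides` parameter of the ECS RunTask API.

```python
def build_scanner_overrides(container_name: str, source_url: str, result_url: str) -> dict:
    """Build the `overrides` argument for an ECS RunTask call.

    The scanner receives exactly two presigned URLs: one to download the
    repository snapshot and one to upload its results. No GitHub token or
    AWS credential is part of the payload.
    """
    return {
        "containerOverrides": [
            {
                "name": container_name,
                "environment": [
                    {"name": "SOURCE_ARCHIVE_URL", "value": source_url},
                    {"name": "RESULT_UPLOAD_URL", "value": result_url},
                ],
            }
        ]
    }
```

The point of the shape is what is missing from it: the container has nothing it could use to reach another repository.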

Again, probably excessive.

But I wanted the scanners to have access only to the exact data required for a single scan and nothing else.

Credentials are encrypted in-flight too

The temporary AWS credentials forwarded into Prowler containers are encrypted before being passed to the scanner tasks. The credentials are already short lived, but I still did not want them:

Accidentally logged
Visible in task metadata
Exposed through debugging output
Or floating around in plaintext anywhere unnecessarily

Could I probably have gotten away without this? Honestly yes. But at that point I was having too much fun going overboard.

Upload-only result handling

Even the upload path is heavily restricted. I already mentioned the broker Lambda generates presigned S3 URLs for scan uploads. Those URLs are further constrained by:

Content type
File size, as a layer of DoS prevention
Short expiration time
Upload-only permissions

The scanner containers cannot:

List buckets
Read previous results
Overwrite unrelated files

They can only upload their own scan output to a single location for a limited period of time.
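Constraints like these map naturally onto an S3 POST policy, which supports exact-match conditions and a `content-length-range`. The sketch below assumes the uploads use presigned POST, which the post does not confirm. The bucket name, key, size limit, and the `application/json` content type are hypothetical. It only builds the policy document; a real signer would add the credential and signature fields.

```python
import json
from datetime import datetime, timedelta, timezone


def build_upload_policy(bucket, key, max_bytes, ttl_seconds, now=None):
    """Build an S3 POST policy document for a single scan result upload."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=ttl_seconds)
    return {
        # After this instant S3 rejects the form, which gives the short expiry.
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            # Exact key match: the scanner cannot choose where it writes.
            {"key": key},
            {"Content-Type": "application/json"},
            # Reject empty uploads and anything above max_bytes.
            ["content-length-range", 1, max_bytes],
        ],
    }


policy = build_upload_policy("scan-results", "scans/example/output.json", 50_000_000, 600)
print(json.dumps(policy, indent=2))
```

A POST policy only authorizes an upload, so listing the bucket or reading earlier results is not possible with it.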

Network isolation

Prowler was especially difficult to fully contain because it legitimately needs internet access to communicate with basically all AWS APIs. Some APIs support VPC endpoints, but not all. And the cost of endpoints adds up real quick too.

The easy solution would have been running Prowler in a public subnet and calling it a day. I really did not want to do that. Instead the scanners run inside isolated subnets with outbound traffic forced through a Squid proxy. The proxy only allows connections to AWS API domains. That immediately blocked a huge amount of potential abuse if a scanner image or dependency ever became malicious.

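But there was another problem. AWS EC2 public DNS names like ec2-X-X-X-X.compute-1.amazonaws.com can effectively be used as arbitrary internet relays while still technically matching AWS-owned domains. The same goes for other hostnames that anyone can deploy under amazonaws.com, such as API Gateway, AppSync, load balancer, and S3 website endpoints. So those got explicitly blocked too.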

			
# Allowed: any AWS API endpoint under amazonaws.com.
# `dstdomain` with a leading `.` matches the domain itself and all of its
# subdomains, so this matches sts.us-east-1.amazonaws.com and
# ec2.eu-west-2.amazonaws.com as well as the bare amazonaws.com apex.
acl aws_apis dstdomain .amazonaws.com

# Denied: attacker-deployable hostnames that are technically *.amazonaws.com.
# These all sit under amazonaws.com but resolve to user-controllable
# infrastructure (someone's EC2 instance, someone's API Gateway, etc.).
#
# EC2 public DNS: ec2-X-X-X-X.compute-1.amazonaws.com (us-east-1) and
# ec2-X-X-X-X.<region>.compute.amazonaws.com (other regions).
acl deny_compute   dstdom_regex \.compute(-[0-9]+)?\.amazonaws\.com$
# API Gateway default hostnames (anyone can deploy free-tier API GW):
# <api-id>.execute-api.<region>.amazonaws.com.
acl deny_apigw     dstdom_regex \.execute-api\..+\.amazonaws\.com$
# AppSync default hostnames: <api-id>.appsync-api.<region>.amazonaws.com.
acl deny_appsync   dstdom_regex \.appsync-api\..+\.amazonaws\.com$
# Public load balancer hostnames: <name>-<id>.<region>.elb.amazonaws.com
# (ALB / Classic) and <name>-<id>.elb.<region>.amazonaws.com (NLB).
acl deny_elb       dstdom_regex \.elb\.([a-z0-9-]+\.)?amazonaws\.com$
# S3 static-website endpoints: <bucket>.s3-website-<region>.amazonaws.com
# and <bucket>.s3-website.<region>.amazonaws.com.
acl deny_s3website dstdom_regex \.s3-website[.\-].+\.amazonaws\.com$

# Order matters: http_access rules are checked top to bottom and the first
# match wins, so the known escape routes are denied BEFORE the broad allow.
http_access deny deny_compute
http_access deny deny_apigw
http_access deny deny_appsync
http_access deny deny_elb
http_access deny deny_s3website
# Allow CONNECT to the remaining AWS API endpoints.
http_access allow aws_apis
# Everything else is rejected.
http_access deny all

		

At that point the scanners could talk to AWS APIs, upload results, pull allowed artifacts, and not much else.
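Rules like the ones in the proxy config are easy to get subtly wrong, so it can help to test them against sample hostnames before deploying. The sketch below is an approximation, not a Squid emulator: it uses Python's `re` module rather than Squid's regex engine, and the helper name and sample hostnames are made up. The patterns mirror the deny ACLs, with the load balancer pattern written to accept both the `<region>.elb` and `elb.<region>` orderings.

```python
import re

# Deny patterns, checked before the broad allow (first match wins).
DENY_PATTERNS = [
    r"\.compute(-[0-9]+)?\.amazonaws\.com$",
    r"\.execute-api\..+\.amazonaws\.com$",
    r"\.appsync-api\..+\.amazonaws\.com$",
    r"\.elb\.([a-z0-9-]+\.)?amazonaws\.com$",
    r"\.s3-website[.\-].+\.amazonaws\.com$",
]
ALLOWED_DOMAIN = "amazonaws.com"


def is_allowed(host: str) -> bool:
    """Approximate the proxy decision for a destination hostname."""
    host = host.lower().rstrip(".")
    if any(re.search(pattern, host) for pattern in DENY_PATTERNS):
        return False
    # Like `dstdomain .amazonaws.com`: the domain itself or any subdomain.
    return host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)


for sample in [
    "sts.us-east-1.amazonaws.com",
    "ec2-1-2-3-4.compute-1.amazonaws.com",
    "abc123.execute-api.us-east-1.amazonaws.com",
    "example.com",
]:
    print(sample, "->", "allow" if is_allowed(sample) else "deny")
```

Only the first sample is allowed. A check like this fits well in CI next to the proxy configuration, so a new deny rule cannot accidentally block a legitimate API endpoint.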

VPC endpoints as a second containment layer

The outbound proxy was only one part of the containment model. I also treated VPC endpoint policies as a completely separate anti-exfiltration control layer. S3 and ECR access went through tightly scoped VPC endpoint policies restricted to our own principals and to resources owned by the scanning account, using conditions like:

aws:ResourceAccount
aws:PrincipalOrgID
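As an illustration of how those two condition keys combine, here is a sketch of an endpoint policy that only lets principals from one organization reach resources owned by one account. The account ID, organization ID, and function name are placeholders, and the real policies are likely more granular than a single statement.

```python
import json


def build_endpoint_policy(account_id: str, org_id: str) -> dict:
    """Build a VPC endpoint policy with a single conditional allow.

    Requests through the endpoint pass only when BOTH conditions hold:
    the caller belongs to org_id and the target resource is owned by
    account_id. Anything else is implicitly denied.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OwnPrincipalsOwnResources",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalOrgID": org_id,
                        "aws:ResourceAccount": account_id,
                    }
                },
            }
        ],
    }


print(json.dumps(build_endpoint_policy("123456789012", "o-exampleorgid"), indent=2))
```

With this in place, an upload to a bucket in an attacker's account fails at the endpoint even if the scanner somehow holds valid credentials for it. One caveat: ECR serves image layers from an AWS-owned S3 bucket, so an S3 endpoint policy like this would need an extra statement for image pulls to keep working.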

Prowler was the main exception because it legitimately needed broader read-only metadata access across the organization. So the endpoint policies had a separate allow path for Prowler audit roles while still explicitly denying direct object access outside of the allowed paths.

The important part was that the controls were layered. Even if a scanner became malicious, the proxy restrictions failed, outbound filtering was bypassed, or credentials leaked somehow, the VPC endpoint policies still heavily constrained what the containers could actually access or exfiltrate.

Scanner update paranoia

Security scanners are awkward because keeping them updated is important, but automatically trusting fresh images from the internet is also terrifying these days. So scanner images were updated on a cooldown schedule instead of continuously. I think I settled on roughly five days. Long enough to avoid immediately ingesting a compromised release, but short enough to still get updates reasonably quickly.
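The cooldown rule itself is simple enough to sketch. The five-day value comes from the paragraph above; the function name, the release list, and the tags are hypothetical, and where the publish dates come from (registry metadata, release feeds) is left out.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(days=5)


def pick_release(releases, now, cooldown=COOLDOWN):
    """Return the tag of the newest release that is at least `cooldown` old.

    `releases` is a list of (tag, published_at) tuples with timezone-aware
    datetimes. Returns None when every release is still inside the cooldown.
    """
    eligible = [
        (published_at, tag)
        for tag, published_at in releases
        if now - published_at >= cooldown
    ]
    if not eligible:
        return None
    # Tuples sort by publish time first, so max() picks the newest eligible one.
    return max(eligible)[1]


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
releases = [
    ("v1.0", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    ("v1.1", datetime(2024, 6, 3, tzinfo=timezone.utc)),
    ("v1.2", datetime(2024, 6, 8, tzinfo=timezone.utc)),  # only two days old
]
print(pick_release(releases, now))  # v1.1
```

The trade-off is explicit in the single constant: a longer cooldown gives the community more time to notice a bad release, at the cost of running older scanner versions.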

Trivy presented another challenge because its vulnerability database needs to stay current. So instead of allowing live database downloads from inside runtime containers, the database was periodically prebaked into the image during scheduled builds.

The runtime scanners themselves stayed heavily isolated.

This was absolutely overkill

None of this was strictly necessary. The company probably would have been fine with:

A cron job
Unrestricted scanners
A few IAM roles
Dumping results into S3

And honestly that would have covered 95% of the practical security value. But I had fun building it. It also ended up becoming one of the most tightly locked-down systems I’ve ever worked on:

Isolated account
Brokered credentials
No standing scanner permissions
No GitHub access from scanners
Encrypted temporary credentials
Isolated subnets
Restricted outbound traffic
VPC endpoint policy restrictions
Upload-only result paths
Controlled image update cadence

Final thought

One thing I realized during this project is that security tooling often quietly becomes some of the most privileged infrastructure in a company. Scanners tend to:

See everything
Read everything
Connect everywhere
Pull code from the internet
Process untrusted input constantly

Yet they are often deployed with almost no containment because “it’s the security tool”.

That felt backwards to me.

So I designed the system assuming the scanners would eventually become hostile. Any exploit would need to be specifically tailored to our environment instead of working generically against every deployment. That greatly reduces the blast radius.

Thankfully they haven’t turned hostile yet.

kichik's blog

