The story of my (possibly) denial-of-wallet attack on AWS Serverless🕵🏻♂️
...if it was an attack at all. might have been an innocent bot getting stuck in the muddy waters of IP-land
That's how my conversation in the Discord group ended on the topic. Long story short, today I visited my personal website, heyitschris.com, which I built during the Cloud Resume Challenge by Forrest Brazeal. One of the requirements of the challenge was a visitor counter for the site. As I stumbled onto my beautiful baby today, however, I was presented with a rather nifty number at the bottom of the page, where the visitor count is displayed. For a split-second, I thought I was famous.
When that one-second rush of "what is happening" settled down, I knew it was either a giant glitch in the system, or somebody had just done a denial-of-wallet attack on me. When I remembered that I have billing alerts set on my main account and that I had not received any emails about charges, I became even more itchy. That is not a number of invocations you'd like to see on your Lambda function, however cheap Lambda is. I went straight to work, first checking my billing console of course. I was presented with a very nice nada, zero, nothing, nichts in charges. At this point the craze went waaaaay down and I could investigate with curiosity instead of panic, a very important distinction.
Yeah, about that $0: it technically cost me $0.74 because of all the DynamoDB operations. The visitorCount value is stored in a DynamoDB table that uses the PAY_PER_REQUEST (on-demand) billing mode. If anything, this little hiccup proved that on-demand DynamoDB tables can scale from one to 20K requests a second in a matter of minutes. You'll see why below...
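As a rough sanity check on that $0.74, here is a small back-of-the-envelope calculation. It rests on a few assumptions that are mine, not billing data: each of the 594037 invocations mentioned later in this post made exactly one write, each write consumed one write request unit (the item is tiny), and on-demand writes were priced at $1.25 per million write request units (check the current price for your region).

```python
from decimal import Decimal

# Assumption: on-demand list price per one million write request units (WRUs).
# This varies by region and over time, so treat it as a placeholder.
PRICE_PER_MILLION_WRITES = Decimal("1.25")


def on_demand_write_cost(writes: int, wru_per_write: int = 1) -> Decimal:
    """Estimate the on-demand cost (in USD) of a number of DynamoDB writes."""
    total_wrus = Decimal(writes) * wru_per_write
    return total_wrus * PRICE_PER_MILLION_WRITES / Decimal(1_000_000)


# 594037 is the invocation count from this post; one write per invocation
# is an assumption.
cost = on_demand_write_cost(594_037)
print(f"Estimated write cost: ${cost:.2f}")
```

Under those assumptions the estimate lands on the same 74 cents that showed up on the bill, which suggests the writes alone account for the charge.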
Next step in our investigation 🕵🏻♂️
The second place to look was API Gateway, since that was the connector between the internet and my Lambda function. Unfortunately I hadn't set up logging well enough, and my gateway resource only stored logs for 15 days, which was just shy of the date I needed 😭 Therefore I could not see whether my API Gateway was involved in the incident or whether it was completely bypassed.
The next step was to figure out whether this whole thing involved the whole website or just the visitor counter API. The answer: the website itself was not involved. In my CloudFront metrics console I could luckily backtrack to the day (14.07.2020), and it showed me a very graceful 20-25 requests per 5-minute period. This - I now realize - was still unusual, but nothing a couple of crawler bots couldn't explain. Next step, Lambda🥵
Between 3AM and 6AM on the 14th of July 2020, there were hundreds of thousands of invocations of my visitor counter Lambda function. This function is very simple: all it does is call update_item on a single item in that DynamoDB table, nothing rocket-sciency. Again, I was very much amazed that my Lambda function could scale from zero to 20K invocations without a problem as well. Since the first 1 million Lambda invocations per month are (basically) free, this did not cost me a single dime, which I was very happy about of course.
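For anyone curious what such a function can look like, here is a minimal sketch of a counter handler. It is not my exact code: the key name ("id": "visitors"), the TABLE_NAME environment variable and the optional table argument (there so the function can be exercised without AWS) are all made up for illustration. The only parts taken from this post are the visitorCount attribute and the single update_item call.

```python
import json
import os


def increment_visitor_count(table) -> int:
    """Atomically add 1 to the counter item and return the new value."""
    response = table.update_item(
        Key={"id": "visitors"},  # hypothetical key schema
        UpdateExpression="ADD visitorCount :inc",
        ExpressionAttributeValues={":inc": 1},
        ReturnValues="UPDATED_NEW",
    )
    # DynamoDB returns numbers as Decimal, so convert before serializing.
    return int(response["Attributes"]["visitorCount"])


def lambda_handler(event, context, table=None):
    """Entry point; `table` is only injected in tests (an assumption)."""
    if table is None:
        import boto3  # imported lazily so the sketch runs without AWS access

        table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    count = increment_visitor_count(table)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"visitorCount": count}),
    }
```

Because every invocation is one write and nothing else, the function has no state of its own, which is probably why it scaled without any drama.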
Nobody needs Pagerduty for Serverless, it lets you sleep.
So what happened? 🤯
Honestly, I have zero idea. Since I can't go back to my API Gateway logs, I can't say for sure if this was something from the outside or the inside. I did check my CloudTrail logs thoroughly and there has been no intrusion into my accounts. Besides, if an attacker got their hands on a pair of access keys, their first thought wouldn't be "let's run this guy's Lambda function 594037 times!". Nothing else has been affected in that or any other account either.
My best bet here would be a poor bot that got stuck on my API endpoint somehow. I know I can't prove this, but this is my hypothesis. It wouldn't make sense for somebody to cook up a script to do a denial-of-wallet attack on me, because if somebody is that smart, they would have known that it won't cost me a dime (yeah okay, 74 cents). Maybe they were trying to exhaust my DynamoDB table and make it autoscale? That's also a possibility, though a pretty slim one.
This was a very cheap lesson 🎓
There are some things I've learned here however:
- In an on-demand DynamoDB table, 590K operations with around 10 WCUs consumed will cost ~$0.74
- DynamoDB on-demand tables scale very well
- Serverless is cheap even when hit with lots of traffic at once
- Serverless is perfect for unpredictable load spikes and it scales without anybody noticing
- I have to work on my logging skills, like for real 🤦🏻♂️
- Rate limiting on your resources is important. As I never expected this much traffic, I should have limited both my Lambda function and my API Gateway to only a small number of invocations at once. If this had been a big website with production resources and an actual attack, rate limiting would have saved my butt (and probably earned me a salary increase lol).
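On the Lambda side of that last point, one way to cap invocations is reserved concurrency. The snippet below is only a sketch: the function name "visitor-counter" and the limit of 5 are placeholders I picked for illustration, and the client is passed in as a parameter so the helper can be tried without touching a real account.

```python
def cap_lambda_concurrency(lambda_client, function_name: str, max_concurrent: int = 5):
    """Reserve (and thereby cap) concurrent executions for one function.

    Invocations beyond the cap are throttled instead of executed.
    """
    if max_concurrent < 0:
        raise ValueError("max_concurrent must be zero or greater")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrent,
    )


# Example usage against a real account (names are hypothetical):
#   import boto3
#   cap_lambda_concurrency(boto3.client("lambda"), "visitor-counter", 5)
```

Keep in mind that a cap of 0 blocks the function entirely, and that throttled requests surface as errors to whoever is calling the API.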
As always, if you have any thoughts about this or just wanna hit me up, you can reach me on Twitter. I would honestly like to know what happened here, so if you have any idea, please do hit me up😅
Cover photo by Külli Kittus on Unsplash
🚨 UPDATE 03.08.2020:
After this incident, I needed to implement (and learn about) rate limiting on the API Gateway. By default AWS sets it at 10K requests per second, but that's a bit too much for me and this project. I set mine to 50 requests per second, which still might be a bit too much. Here's how to do it via the SAM/CF template.
    ServerlessRestApiUsagePlan:
      Type: AWS::ApiGateway::UsagePlan
      Properties:
        ApiStages:
          - ApiId: !Ref ServerlessRestApi
            Stage: Prod
        Description: Usage plan of visitor counter API
        Throttle:
          BurstLimit: 100
          RateLimit: 50
        UsagePlanName: VisitorCounterLimiter
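To get a feel for how RateLimit and BurstLimit play together, here is a toy token-bucket simulation. API Gateway documents its throttling as a token bucket, but this class is my own simplified model, not AWS code: the bucket holds up to BurstLimit tokens, refills at RateLimit tokens per second, and each request spends one token.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float   # tokens added per second (RateLimit)
    burst: float  # bucket capacity (BurstLimit)
    tokens: float = field(init=False)
    last: float = field(default=0.0, init=False)

    def __post_init__(self):
        # Start with a full bucket.
        self.tokens = self.burst

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=50, burst=100)
# A bot fires 1000 requests at t=0s and another 1000 at t=1s.
first_wave = sum(bucket.allow(now=0.0) for _ in range(1000))
second_wave = sum(bucket.allow(now=1.0) for _ in range(1000))
print(first_wave, second_wave)  # prints "100 50"
```

In this model a stuck bot gets at most the burst through at once and then roughly 50 requests per second after that, with everything else rejected.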
A better way to do this - as Andrew Brown mentioned in the comments - would be setting up anomaly detection in CloudWatch and actively monitoring / updating the usage plans. This is a bit too much for this project, however, but I'm going to keep it in mind for future reference.
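I haven't built this yet, but as a rough sketch, an anomaly detection alarm on the API's request count could be described like this. The API name, alarm name and band width of 2 are placeholder assumptions; the helper only builds the arguments for CloudWatch's put_metric_alarm, so nothing is created until you pass them to a real client.

```python
def build_anomaly_alarm(api_name: str, band_width: int = 2) -> dict:
    """Build put_metric_alarm arguments for an anomaly detection alarm
    on an API Gateway REST API's request count."""
    return {
        "AlarmName": f"{api_name}-request-anomaly",  # hypothetical naming
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 1,
        "ThresholdMetricId": "band",
        "Metrics": [
            {
                "Id": "requests",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ApiGateway",
                        "MetricName": "Count",
                        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
                    },
                    "Period": 300,  # 5-minute buckets
                    "Stat": "Sum",
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(requests, {band_width})",
                "Label": "Expected request count",
                "ReturnData": True,
            },
        ],
    }


# Example usage against a real account (API name is hypothetical):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**build_anomaly_alarm("visitor-counter-api"))
```

The alarm fires when the 5-minute request sum rises above the upper edge of the expected band, rather than above a fixed number.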
So this could be a good use case for CloudWatch Alarms with Anomaly Detection. Anomaly Detection uses a band instead of a set value, so if something is really out of the ordinary you would get alerted and could automate remediation, like a honeypot.
Yep, that's definitely an option here. I'm gonna dive into logging, as right now that area of mine is a bit slippery😅
But yes, anomaly detection would work perfectly here; this is what it was made for, AFAIK. How could you remediate an API Gateway attack?🤔 Rate limiting would work, but that cuts off normal users too for a while. Better yet, update the WAF rules🕵🏻♂️