What?
One of the first projects I took on at my new workplace was improving release confidence, and one of the levers was increasing our automated testing. I hold a strong opinion that integration tests offer the most bang for the buck.
A good candidate to start with was the outbound call campaign manager - the system that takes a file of leads, processes them through a queue, and dials calls. It had no integration tests. The AWS services it depended on were never tested together end to end.
The team had already been in conversations with TestVagrant, a testing company. They had suggested taking on three SDETs for a six-month engagement at a cost of INR 12 lakhs per month. The quote was already on the table when I was introduced to the project.
The scope was very vague. Having worked with vendors, and as a vendor, I have learned that vague scopes lead to a lot of pain during engagements. So I wanted to understand what we actually needed.
I ended up building what was required across 5 sessions of roughly two hours each, saving the company around INR 72 lakhs on an otherwise unnecessary engagement.
The System
The Outbound Campaign Manager (OCM) is a Django application that manages outbound voice, SMS, and email campaigns. The flow for a voice campaign looks like this:
Upload CSV -> File lands in S3 -> Lambda processes file and creates leads -> Batch scheduled via EventBridge -> At schedule time, leads queued in SQS -> Lambda picks leads off queue and triggers calls
Four AWS services all talking to each other. Testing this against real AWS on every commit is expensive, slow, and flaky. The alternative is mocking everything, which tests nothing real.
Why LocalStack
The flow I needed to test required three services to interact:
- Create SQS queues
- Create EventBridge rules with targets pointing at Lambda
- Register Lambda event-source mappings on those queues
Three options on the table: pure boto3 mocking, Moto, and LocalStack.
Pure boto3 mocking
The simplest option. You patch the boto3 calls with unittest.mock and assert your code called the right methods with the right arguments. The problem: you’re testing that your code made the right API calls, not that anything actually happened. When your code calls sqs.create_queue(), the mock returns success - but there’s no queue. When the next step tries to put a message on that queue, the mock returns success again. You’ve tested the calls in isolation. You haven’t tested that the flow works.
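To make the false-confidence problem concrete, here is a minimal sketch. The `MagicMock` stands in for a real `boto3.client("sqs")`, and the queue name is illustrative, not from the actual codebase:

```python
from unittest import mock

def provision_queue(client, name):
    """Application code under test: create a queue, return its URL."""
    resp = client.create_queue(QueueName=name)
    return resp["QueueUrl"]

# A MagicMock stands in for boto3.client("sqs"); "ocm-leads" is a made-up name.
sqs = mock.MagicMock()
url = provision_queue(sqs, "ocm-leads")

# This passes: the code made the right call with the right arguments...
sqs.create_queue.assert_called_once_with(QueueName="ocm-leads")
# ...but `url` is just another mock. No queue exists, and send_message
# on the same client would "succeed" against nothing.
```

The test goes green while proving nothing about the flow, which is exactly the trap.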
For a single service this is sometimes fine. For a flow where SQS, EventBridge, and Lambda need to actually be wired together - queue exists, rule exists, target points at the right Lambda - mocking is useless. The interaction between services is exactly what I needed to verify.
Moto
A step up. Moto intercepts boto3 calls and maintains in-memory state per service. So sqs.create_queue() actually creates a queue you can later send_message to. For single-service tests this is solid.
The problem is cross-service interactions. EventBridge rules with Lambda targets, SQS event source mappings - Moto either doesn’t support these or supports them partially. The state exists per service but the services don’t talk to each other. You can assert the rule was created, but you can’t assert the Lambda gets invoked when a message lands on the queue.
LocalStack
Runs actual service implementations in Docker. The services talk to each other the same way they do in AWS. Real state, real cross-service wiring, real failure modes.
So I selected LocalStack.
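Against LocalStack, the three pieces of wiring listed above become ordinary boto3 calls pointed at the local endpoint. A sketch under assumptions: LocalStack on `localhost:4566`, a Lambda already deployed, and illustrative resource names (boto3 is imported lazily so the sketch stays importable without it):

```python
# Hypothetical wiring for LocalStack; endpoint and names are illustrative.
LOCALSTACK = {
    "endpoint_url": "http://localhost:4566",
    "region_name": "us-east-1",
    "aws_access_key_id": "test",
    "aws_secret_access_key": "test",
}

def client(service):
    import boto3  # lazy import: only needed when actually talking to LocalStack
    return boto3.client(service, **LOCALSTACK)

def wire_flow(queue_name, rule_name, lambda_arn):
    sqs, events, lam = client("sqs"), client("events"), client("lambda")
    # 1. Create the SQS queue.
    queue_url = sqs.create_queue(QueueName=queue_name)["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    # 2. EventBridge rule with a target pointing at the Lambda.
    events.put_rule(Name=rule_name, ScheduleExpression="rate(5 minutes)")
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": lambda_arn}])
    # 3. Event source mapping so the Lambda drains the queue.
    lam.create_event_source_mapping(EventSourceArn=queue_arn, FunctionName=lambda_arn)
    return queue_url
```

Because LocalStack holds real state, a test can then drop a message on the queue and assert the Lambda's side effects, which is the part neither mocks nor Moto could verify.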
Setup
Because I was working with a new repo, neither I nor Claude had much context to begin with. This is what I had in mind before I began:
Architecture context -> Flow doc -> Test plan -> Implementation
Before Claude could write anything useful, it needed to know the data flow, the DB schema, how SQS queues are named, how EventBridge rules are structured, which services exist and where they live in the codebase. With some basic prompting, we created an architecture doc that I had reviewed by the service owner. An alternative is to get your repo indexed by DeepWiki.
Second, a flow document which mapped the entire user journey step by step - frontend action on the left, backend API call on the right, DB records created at each step, AWS resources provisioned at approval. Claude read the frontend hooks, the backend views, and the models and produced a single document that showed exactly how the steps fit together.
Third, a test plan. Before writing any test code, I asked Claude to produce a plan for the batch creation test - what to assert at each step, which fixtures were needed. Only after reviewing and approving the plan did I ask it to write the actual test.
Because I sequenced it this way, each step made the next one more accurate. Or at least I felt I was in control :D
Writing the test
This was all Claude. Based on the documents generated earlier, it wrote tests, ran them, fixed failures, and reran. This cycle happened around 8 times. I didn't let it run autonomously because I didn't want Claude touching any files outside the test directory.
Next time, I will configure write permissions in Claude Code's settings.json to restrict which files it can edit.
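I haven't pinned down the exact schema, but Claude Code's settings.json supports permission rules roughly along these lines (paths are illustrative; check the current docs before relying on the field names):

```json
{
  "permissions": {
    "allow": ["Edit(tests/**)", "Read(**)"],
    "deny": ["Edit(src/**)"]
  }
}
```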
The Numbers
- 4 AWS services integrated: S3, SQS, EventBridge, Lambda
- 2 integration test flows: batch creation and batch execution
- ~10 hours of active development time with Claude across 5 sessions
- 72 lakhs saved by dropping the TestVagrant engagement
Leadership Takeaway
It’s always important to decide on the right scope. AI hasn’t changed this.
We are using AI to make our teams more efficient. If that's true for us, it's true for everyone, and the same expectation applies to vendors. I also hold a strong opinion that developers should write their own tests; this shouldn't be offloaded to another individual, whether within the team or at an external vendor. I think testing-only vendors are going to have a hard time.
As a Director managing a 27-person org, I built these tests. That's not a detail I usually lead with, but it matters here. The people best positioned to leverage AI are the ones who understand the problem deeply, have the context, and can judge the output. Orgs are getting flatter; that is already happening. The obvious move for managers and leaders is to pick up some of the work themselves. Not to replace engineers, but because the cost of doing it yourself has dropped dramatically.