While drafting a strategy to work around the AWS Lambda timeout limitations, I felt I was getting closer to solving an issue that sometimes keeps me from sleeping well. True, there are known patterns for reaching higher scalability using SQS, but the problems at hand were not easy to solve with any of them when AWS Lambda is used as the core compute service.
While I was focusing on the timeout limitations, AWS raised the limit to 15 minutes. Although this is surely a useful change, I still preferred to polish my draft strategy of using a non-Lambda compute service which has no limit at all. When the safety-net compute is also based on an on-demand pricing model, this forms a real solution to the problem. In reality, it is sometimes impossible to predict load up front, and maxing out limits is not the ultimate solution.
Here’s a short overview of the most important improvements I’ve made between the draft implementation and the current one:
- event and context from the primary handler have been moved to environment variables passed through the ContainerOverride API, because information from the original JSON objects was getting dropped during the process of executing runTask. This is also an improvement in terms of consistency of managing environment variables.
- Handling of different event structures has been added. It appears that an SNS event changes from bucket to bucket, depending on the way files are managed within the bucket.
The skeleton of the end implementation can be seen in this repository.
So what does the current implementation look like after the aforementioned improvements?
From a bird’s-eye view, the structure and main ideas are the same:
immortal-aws-lambda
├── container
│   ├── Dockerfile
│   ├── package.json
│   ├── README.md
│   └── runner.js
└── serverless
    ├── package.json
    ├── README.md
    ├── serverless.yml
    ├── src
    │   ├── events
    │   │   └── onFailure.js
    │   └── lib
    │       ├── extractors.js
    │       ├── getHandlerData.js
    │       └── snsTopicToHandlerMap.js
    └── webpack.config.js

5 directories, 12 files
You’ll probably want to create the ECS task in the AWS console first and note the settings you’ll need for the serverless service.
The sole goal of this service is to put a runner.js script in a container and run it remotely from the dead letter queue service.
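The script is actually ultra-thin and consists of 3 main steps:
- Fetch the configuration of the failed Lambda function, including a signed URL to its source code.
- Download the source and extract it into the container.
- Load the handler and call it with the original event and context.
All settings are passed in as dynamic variables.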
And the source is ultra-simple:
#!/usr/bin/env node
const path = require("path");
const https = require("https");
const AWS = require("aws-sdk");
const promisePipe = require("promisepipe");
const unzip = require("unzipper");

const runner = async () => {
  const {
    REGION,
    AWS_LAMBDA_HANDLER_EVENT,
    AWS_LAMBDA_HANDLER_CONTEXT,
    AWS_LAMBDA_HANDLER_NAME,
    AWS_LAMBDA_HANDLER_PATH,
  } = process.env;

  try {
    const event = JSON.parse(AWS_LAMBDA_HANDLER_EVENT);
    const context = JSON.parse(AWS_LAMBDA_HANDLER_CONTEXT);

    // Look up the failed function to get its configuration and a signed URL to its code.
    const lambda = new AWS.Lambda({ region: REGION });
    const lambdaInfo = await lambda
      .getFunction({ FunctionName: AWS_LAMBDA_HANDLER_NAME })
      .promise();
    const sourceCodeSignedUrl = lambdaInfo.Code.Location;

    return https
      .get(sourceCodeSignedUrl, async (res) => {
        // The outer try/catch does not cover this callback, so it needs its own.
        try {
          // Download source from cloud and extract it at the current directory at the same time.
          await promisePipe(res, unzip.Extract({ path: __dirname }));

          // Merge environment variables before loading the handler, so that
          // module-level reads of process.env see them. Environment is absent
          // when the function defines no variables.
          const { Environment = {} } = lambdaInfo.Configuration;
          process.env = Object.assign({}, process.env, Environment.Variables);

          const pathToHandler = path.resolve(
            `${__dirname}/${AWS_LAMBDA_HANDLER_PATH}`,
          );
          // eslint-disable-next-line
          const handler = require(pathToHandler);

          const result = await handler.handler(event, context);
          return console.log(result);
        } catch (err) {
          return console.error(err.message);
        }
      })
      // Network-level failures of the download are reported here.
      .on("error", (err) => console.error(err.message));
  } catch (err) {
    return console.error(err.message);
  }
};

runner();
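One thing to keep in mind with this approach: because context reaches the container as a JSON string, function-valued properties such as getRemainingTimeInMillis do not survive serialization. Here is a minimal sketch, not part of the runner itself, of how the parsed context could be patched before calling the handler. The helper name reviveContext, the sample values and the choice to report an effectively unlimited remaining time are assumptions made for illustration.

```javascript
// Hypothetical helper: rebuild a usable context from the JSON passed to the container.
const reviveContext = (contextJson) => {
  const parsed = JSON.parse(contextJson);
  return Object.assign({}, parsed, {
    // The container has no Lambda timeout, so report a very large remaining time (assumption).
    getRemainingTimeInMillis: () => Number.MAX_SAFE_INTEGER,
  });
};

// Simulate what the failing Lambda would have serialized (sample values are made up).
const original = {
  functionName: "foo",
  awsRequestId: "hypothetical-request-id",
  getRemainingTimeInMillis: () => 1234,
};
const serialized = JSON.stringify(original); // the function property is dropped here
const revived = reviveContext(serialized);

console.log(Object.keys(JSON.parse(serialized)));
console.log(typeof revived.getRemainingTimeInMillis);
```

Whether a handler actually relies on these methods depends on your code, so treat this as an optional safety net rather than a required step.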
This is the serverless service to deploy. It’s as simple and independent as it could be:
- It provides a LambdaFailureQueue to which other services can push messages when failing.
- iamRoleStatements, events subscriptions and settings remain the same as before.
- Helpers in lib are your responsibility to implement, as event structures in your case will most probably be different.
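As a rough idea of what such helpers might look like, here is a minimal sketch that maps an SNS-shaped event to the handler which originally consumed it. The topic name, function name, handler path and the helper bodies are all hypothetical. Only Records[0].Sns.TopicArn comes from the standard SNS event format, and the name and path fields mirror how handlerData is used by this service.

```javascript
// Hypothetical map from SNS topic name to the Lambda handler that consumes it.
const snsTopicToHandlerMap = {
  "files-uploaded": {
    name: "my-service-dev-processFile",
    path: "src/events/processFile.js",
  },
};

// Pull the topic ARN out of an SNS-shaped Lambda event, tolerating missing fields.
const extractTopicArn = (event) => {
  const record =
    event && Array.isArray(event.Records) ? event.Records[0] : undefined;
  return record && record.Sns ? record.Sns.TopicArn : undefined;
};

// Resolve the handler data for an event, or throw a descriptive error.
const getHandlerData = (event, map = snsTopicToHandlerMap) => {
  const topicArn = extractTopicArn(event);
  if (!topicArn) {
    throw new Error("Cannot find an SNS topic ARN in the event");
  }
  // An SNS ARN ends with the topic name: arn:aws:sns:<region>:<account>:<topic>
  const topicName = topicArn.split(":").pop();
  const handlerData = map[topicName];
  if (!handlerData) {
    throw new Error(`No handler registered for topic "${topicName}"`);
  }
  return handlerData;
};

// Usage with a made-up event:
const sampleEvent = {
  Records: [
    { Sns: { TopicArn: "arn:aws:sns:eu-west-1:123456789012:files-uploaded" } },
  ],
};
console.log(getHandlerData(sampleEvent));
```

Throwing on an unknown topic is a design choice here: it surfaces the failure in the logs of the service instead of silently dropping the message.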
Still, the main point of having this service is to run an ECS task starting a container:
const runParams = {
  taskDefinition: RUNNER,
  launchType: "FARGATE",
  networkConfiguration: {
    awsvpcConfiguration: {
      assignPublicIp: "ENABLED",
      subnets: [SUBNET],
    },
  },
  overrides: {
    containerOverrides: [
      {
        environment: [
          {
            name: "AWS_LAMBDA_HANDLER_EVENT",
            value: JSON.stringify(initialMessage),
          },
          {
            name: "AWS_LAMBDA_HANDLER_CONTEXT",
            value: JSON.stringify(context),
          },
          {
            name: "AWS_LAMBDA_HANDLER_NAME",
            value: handlerData.name,
          },
          {
            name: "AWS_LAMBDA_HANDLER_PATH",
            value: handlerData.path,
          },
        ],
        name: RUNNER,
      },
    ],
  },
};

await ecs.runTask(runParams).promise();
Don’t forget to take these settings (RUNNER and SUBNET in the snippet above) from the AWS Console and set them in your serverless.yml configuration file.
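Since the event and context travel as container overrides, their size matters: the ECS RunTask documentation states that a total of 8192 characters are allowed for overrides, including the JSON formatting characters. Here is a sketch, under stated assumptions, of building the environment list from a plain object and failing early when the payload is too large. The helper names, the sample values and the decision to throw are illustrative and not part of the service, and the length check is only an approximation of how ECS counts characters.

```javascript
// Limit documented for the overrides parameter of ECS RunTask (includes JSON formatting).
const MAX_OVERRIDES_LENGTH = 8192;

// Hypothetical helper: turn { KEY: value } into the [{ name, value }] list ECS expects.
// Non-string values are serialized to JSON, strings are passed through unchanged.
const toEnvironment = (variables) =>
  Object.keys(variables).map((name) => ({
    name,
    value:
      typeof variables[name] === "string"
        ? variables[name]
        : JSON.stringify(variables[name]),
  }));

// Hypothetical helper: build the overrides object and fail early if it looks too large.
const buildOverrides = (containerName, variables) => {
  const overrides = {
    containerOverrides: [
      { name: containerName, environment: toEnvironment(variables) },
    ],
  };
  // Approximate check: measures the serialized structure in characters.
  const length = JSON.stringify(overrides).length;
  if (length > MAX_OVERRIDES_LENGTH) {
    throw new Error(
      `Overrides are ${length} characters, limit is ${MAX_OVERRIDES_LENGTH}`,
    );
  }
  return overrides;
};

// Usage with made-up values:
const overrides = buildOverrides("runner", {
  AWS_LAMBDA_HANDLER_EVENT: { Records: [] },
  AWS_LAMBDA_HANDLER_CONTEXT: { functionName: "foo" },
  AWS_LAMBDA_HANDLER_NAME: "foo",
  AWS_LAMBDA_HANDLER_PATH: "src/events/foo.js",
});
console.log(overrides.containerOverrides[0].environment.length);
```

If your events can exceed this limit, one option is to store the payload elsewhere and pass only a reference, but that is beyond what this service does today.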
“Attaching” other serverless services and handlers to this workflow boils down to the following:
iamRoleStatements:
  # Allow queueing messages to the DLQ https://docs.aws.amazon.com/lambda/latest/dg/dlq.html
  - Effect: "Allow"
    Action:
      - sqs:SendMessage
    Resource: "*"
And in the Resources section:
resources:
  Resources:
    fooFunction:
      Type: "AWS::Lambda::Function"
      Properties:
        DeadLetterConfig:
          TargetArn:
            Fn::ImportValue: immortal-aws-lambda:LambdaFailureQueue
This is because the serverless framework does not yet support onError properly. Thanks to Siva Kommuri for suggesting this workaround.
Now, when your service fails, the error will be queued to the dead letter queue provided by the immortal-aws-lambda service. The immortal service will take this message, find the right handler and call it via the container service.
My path to finding this solution was not easy.
The tools involved have rough edges.
Also, the process of triggering and reproducing failures caused by a timeout, rebuilding the container, etc. is a lengthy procedure on each iteration. For instance, every time something failed because of a missing character or a spelling mistake, I needed to redeploy the non-bundled and non-optimized code of the Lambda function to the cloud in order to get a merely adequate error message for debugging in the ECS logs. (Crazy!)
Working with streams and promises in Node is still very painful and hard to debug by the way …
So I hope that having these very thin layers of variables which communicate with each other will be a feasible solution to the timeout limitations in AWS Lambda for the months ahead.