Stateful AWS lambdas part3: bootstrapping like an EC2, reusing resources and node_modules in /tmp

This one falls under the category of “you should probably never do this”, but it’s also an interesting case study for understanding how lambda works, and how much of the infrastructure powering our lambdas actually gets reused between invocations.

We already know from many clues (e.g. role permissions for certain things require the “ec2:” prefix) that our lambda hosts are EC2s, though functions probably run inside ECS containers atop EC2 clusters. We have no control over any of this: Amazon guarantees only that our code will execute when it is triggered, and everything else fades away between invocations.

The application

Lambda has a pretty tight limit on how much code can be uploaded as a zip/jar file (50 MB), so let’s suppose we have a very large npm dependency (e.g. phantomjs) that we don’t want to bundle and upload every time we update our function.

Could we download the file to disk, say to /tmp (of which we have 512 MB), and only re-download it when our container gets switched over / cleared / disappears for whatever reason? How often would that happen?

Control flow

  • Check if our node_modules in /tmp already exists
  • If it doesn’t, download the .zip package, unzip it on disk and keep going
  • If it does, just keep going

The actual code

For this example, we’ll use a zip package of lodash, but this could be anything; just make sure to change the download URL. To keep things fast, put your file on S3 in the same region; odds are you’ll be hitting a server not too far from your lambda.

var http = require('http');
var fs = require('fs');
var _;
var spawn = require('child_process').spawn;
process.env['NODE_PATH'] = process.env['NODE_PATH'] + ':/tmp'; // note: NODE_PATH is only read at startup, which is why we also require() lodash by its explicit /tmp path below

var lodashUrl = 'http://s3.amazonaws.com/your-bucket-name/lodash.zip'; //change this!
var lodashLocation = '/tmp/lodash.zip';
var nodeModulesLocation = '/tmp';

function doNormalStuff(callback) {
    _ = require(nodeModulesLocation+'/lodash');
    console.log(_.filter([]));
    return callback(null, 'done');
}

exports.handler = function (event, context, callback) {

    if (!fs.existsSync(nodeModulesLocation)){
        fs.mkdirSync(nodeModulesLocation); // if our tmp folder does not exist, create it
    }

    function unzip() { // unzip the downloaded file
        return new Promise(function (resolve) {
            console.log('Unzipping file');
            const unzip = spawn('unzip', ['-q', '-o', lodashLocation, '-d', nodeModulesLocation]);
            unzip.on('close', (code) => {
                console.log(`child process exited with code ${code}`);
                resolve();
            });
        })
    }

    function download(url, dest) { // download the file
        return new Promise(function (resolve, reject) {
            var file = fs.createWriteStream(dest);
            http.get(url, function (response) {
                response.pipe(file);
                file.on('finish', function () {
                    console.log('Download done.');
                    file.close(resolve);
                });
            }).on('error', reject);
        }); // let errors propagate so the main chain's catch can report them
    }

    // our main control function
    if (!fs.existsSync(lodashLocation)) {
        console.log('File does not exist, downloading...');
        download(lodashUrl, lodashLocation)
            .then(() => console.log('Download has finished, unzipping.'))
            .then(() => unzip())
            .then(() => doNormalStuff(callback))
            .catch(err => callback(err)); // surface download/unzip failures instead of timing out
    } else {
        return doNormalStuff(callback);
    }
};
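
A side note on the download itself: if your bundle sits in a private bucket, you could stream it through the AWS SDK instead of hitting a public URL with http. Here’s a minimal sketch, assuming your lambda role is allowed to getObject on the bucket (the bucket and key below are placeholders):

var AWS = require('aws-sdk');
var fs = require('fs');
var s3 = new AWS.S3();

function downloadFromS3(bucket, key, dest) { // hypothetical drop-in for download() above
    return new Promise(function (resolve, reject) {
        var file = fs.createWriteStream(dest);
        s3.getObject({Bucket: bucket, Key: key}).createReadStream()
            .on('error', reject) // covers missing objects and permission errors
            .pipe(file)
            .on('finish', function () {
                file.close(resolve);
            });
    });
}

// usage, in place of download(lodashUrl, lodashLocation):
// downloadFromS3('your-bucket-name', 'lodash.zip', lodashLocation)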

The results

Now the interesting part…how often did we need to re-download our bundle? Would this actually be a usable strategy to bootstrap a lambda?

I set up a CloudWatch Events trigger to run this function every minute and let it go for a few hours.

  • Invocations: 133
  • File re-downloaded, according to the logs: 5

So this ain’t bad – we only had to re-fetch the file 3-4% of the time, meaning that our container was stable for long periods of time, keeping its disk state the same.

Obviously, there are zero guarantees as to whether this would be the same for any function, but it gives interesting insight into how often state changes behind the scenes.
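
A cheap way to watch this yourself is to keep a counter in module scope: it survives for as long as the container does, and resets on a cold start. A minimal sketch:

var invocationCount = 0; // lives outside the handler, so it survives warm invocations

exports.handler = function (event, context, callback) {
    invocationCount++;
    // 1 means a fresh container (cold start); anything higher means the container was reused
    console.log('Invocation #' + invocationCount + ' on this container');
    return callback(null, invocationCount);
};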

Stateful AWS lambdas part1: polling SQS

Lambdas are presented as a stateless product, but there are a few good reasons (and many more bad ones) to get around this and do more than a single operation on each invocation.

Suppose you want to poll an SQS queue, but only for 5 minutes a day. You could launch an EC2 instance on a schedule, but bootstrapping will be slow and you’ll still pay for a full hour, which is not great. Lambda is meant to be event-driven (i.e. something triggers your lambda), so polling can be a bit tricky.

Here’s a pretty easy way to get around this limitation; the gist of it is:

On the AWS side

  • set your lambda timeout to the maximum time you’ll need to run your code
  • create a cron-like expression to run your job as (not) often as you like
  • make sure your lambda role has permission to poll from SQS (see the sketch after this list)
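
For the schedule, a CloudWatch Events rule with an expression like rate(5 minutes) or cron(0 12 * * ? *) does the trick. For the permissions, a minimal policy sketch for the lambda role could look like this (the queue ARN and account id are placeholders):

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
        "Resource": "arn:aws:sqs:us-east-1:123456789012:your-queue-name"
    }]
}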

In your code

  • keep a timer running and fire off your callback before your lambda timeout
  • during that time, run your code as usual

var exitTimeout = false;
var lambdaTimeout = 300000;
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({'region': 'us-east-1'});
var queueUrl = 'https://sqs.us-east-1.amazonaws.com/account-id-probably/your-queue-name';

function main(callback) {
    exitTimeout = false;

    setTimeout(function () {
        exitTimeout = true;
    }, lambdaTimeout - (1000 * 25)); // exit 25 seconds before the 5-minute lambda timeout (to be safe, and leave time to finish whatever work the loop was doing)

    function poll() {
        if (exitTimeout === true) {
            console.log('Out of time, exiting');
            return callback(null);
        } else {
            console.log('Still got time...polling the queue');
        }

        return sqs.receiveMessage({'QueueUrl': queueUrl}).promise()
            .then(result => {
                if (!result.Messages) return []; // no messages were found in the queue, return an empty array of messages to delete
                var deleteOperations = []; // this is where we store the messages to delete
                result.Messages.forEach(message => {
                    // do some interesting processing here!
                    console.log(`MessageId: ${message.MessageId}`);

                    // line up all messages to be deleted
                    deleteOperations.push(sqs.deleteMessage({
                        QueueUrl: queueUrl,
                        ReceiptHandle: message.ReceiptHandle
                    }).promise())
                });
                return deleteOperations;

            })
            .then((data) => Promise.all(data)) // delete all messages at once
            .then(() => {
                console.log('Waiting a bit before polling again');
                setTimeout(function () {
                    return poll(); // call ourselves again to keep polling the queue
                }, 500);
            })
            .catch(console.log);
    }

    poll();
}

exports.handler = function (event, context, callback) {
    return main(callback);
};

A few notes on the code above:

  • lambdaTimeout should match (in milliseconds) the timeout value on your lambda (or skip the constant and use the context object, as in the sketch after these notes)
  • I’ve decided in this example to exit at lambdaTimeout – 25 seconds, since long polling on SQS was set to 20 seconds on my queue (giving my lambda 5 seconds to process the data and delete the message). This is completely arbitrary though, and depends on the “Receive Message Wait Time” you have set on your queue, and the work you intend to do on each message.
  • Notice how we recall poll() at the end of the promise chain – this function is recursive
  • There is a timeout of 500ms before calling poll(). It is not strictly necessary, but for some types of work it is a good idea to avoid throttling.
  • Don’t forget that if your lambda invocation fails, it may retry. Depending on what you’re doing in the poll() loop, you may want to ignore/silence those errors and keep going (notice how nothing calls reject() in this code; errors are only caught and logged).
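
One more note on the lambdaTimeout constant: if you’d rather not hard-code it, the context object exposes context.getRemainingTimeInMillis(), so the exit check can be derived from it directly. A minimal sketch of the same loop built on that, with the queue work elided and the same arbitrary 25-second margin:

exports.handler = function (event, context, callback) {
    var safetyMargin = 25 * 1000; // leave time for the last long poll + processing

    function poll() {
        if (context.getRemainingTimeInMillis() < safetyMargin) {
            console.log('Out of time, exiting');
            return callback(null);
        }
        // ...receiveMessage / process / deleteMessage, exactly as in the full example above...
        setTimeout(poll, 500); // keep polling until we run out of time
    }

    poll();
};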

Listing and getting S3 files from an AWS node 4.3 lambda

Here’s how you can efficiently list all the files in a bucket and then download them all at once. If you use the node 4.3 runtime on lambda, you can use promises natively. Beware: Promise.all will attempt to download all the files at once, and if you have many files in your bucket this could cause problems (e.g. throttling); see the sequential sketch after the code below.

Don’t forget to give your lambda a role that has permission to call listObjects and getObject on your bucket.

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

var bucketName = 'myBucketName';

function main(callback) {
    s3.listObjects({
        Bucket: bucketName,
        Delimiter: '/'
    }).promise()
        .then(data => {
            return data.Contents.map((object) => {
                return s3.getObject({
                    Bucket: bucketName,
                    Key: object.Key
                }).promise();
            });
        })
        .then((data) => Promise.all(data))
        .then(console.log)
        .then(callback)
        .catch(err => { console.log(err); callback(err); }); // log and report failures instead of letting the lambda time out
}

exports.handler = function(event, context, callback){
    return main(callback);
};
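
If the all-at-once download is a concern, the same listing can be processed one object at a time by chaining the promises instead of handing them all to Promise.all. A minimal sketch, using a hypothetical downloadSequentially helper and the bucketName and s3 client defined above:

function downloadSequentially(keys) {
    return keys.reduce(function (chain, key) { // chain one getObject after another
        return chain.then(function (results) {
            return s3.getObject({Bucket: bucketName, Key: key}).promise()
                .then(function (object) {
                    results.push(object);
                    return results;
                });
        });
    }, Promise.resolve([]));
}

// usage, replacing the map + Promise.all steps:
// downloadSequentially(data.Contents.map(object => object.Key)).then(console.log);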