Deleting Failed MultiPart Uploads on S3

Not long ago, I wrote about “Creating MultiPart Uploads on S3”. That post focused on the happy path and did not cover failed or aborted uploads. It was already long, so I decided to write a separate entry on how to clean up your buckets so you don’t incur unnecessary storage costs.

What’s this all about?

Let’s review the basics: S3 lets you store objects in exchange for a storage fee. Simple enough. However, when most people think of the objects in a bucket, they picture the output of a list-objects (or ls) operation, or what they see when browsing the bucket in the console (which performs the same API call). Parts that belong to a multipart upload that was never completed won’t show up in either view, but the service is still storing them for you, which means you are paying for that storage.

If none of this surprises you, then this post might not be for you. However, if you’ve been doing multipart uploads for a while, or you’re new to them, I’d recommend you keep reading, as you might find you can reduce your storage costs.

Let’s pick up where we left off

I’ll continue with the setup from our previous post, a bucket with a single 100MB file.

This is what list-objects has to say about it.

{
    "Contents": [
        {
            "Key": "large_file",
            "LastModified": "",
            "ETag": "",
            "Size": 104857600,
            "StorageClass": "STANDARD",
            "Owner": {
                "DisplayName": "",
                "ID": ""
            }
        }
    ]
}

So now, I’ll create a new multipart upload (I’ll be reusing the same file) but to simulate failure or an aborted operation, only the first part will be uploaded.
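For readers who prefer to reproduce this from code, here is a minimal sketch of the same simulation. It assumes a boto3-style S3 client is passed in; the function name and the 25MB part size are my own choices for illustration, not something taken from the previous post.

```python
PART_SIZE = 25 * 1024 * 1024  # 25MB parts (assumed, to match the numbers in this post)


def start_incomplete_upload(s3_client, bucket, key, file_path, part_size=PART_SIZE):
    """Start a multipart upload, send only part 1, and return the UploadId."""
    upload = s3_client.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = upload["UploadId"]

    # Read just the first chunk of the file; the rest is never sent.
    with open(file_path, "rb") as source:
        first_chunk = source.read(part_size)

    s3_client.upload_part(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        PartNumber=1,
        Body=first_chunk,
    )
    # Deliberately no complete_multipart_upload call: the part is left orphaned.
    return upload_id
```

With boto3 installed, something like start_incomplete_upload(boto3.client("s3"), "your-bucket-name", "your_large_file", "large_file") should leave one orphaned part behind. Keep the returned UploadId, since you need it to inspect or abort the upload later.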

Let’s have a look at what list-objects has to say about it now.

{
    "Contents": [
        {
            "Key": "large_file",
            "LastModified": "",
            "ETag": "",
            "Size": 104857600,
            "StorageClass": "STANDARD",
            "Owner": {
                "DisplayName": "",
                "ID": ""
            }
        }
    ]
}

The output is the same as before. However, if we run list-parts for this particular upload, we can see that the first part is taking up an extra 25MB.

> aws s3api list-parts --bucket your-bucket-name --key your_large_file --upload-id UploadId

{
    "Parts": [
        {
            "PartNumber": 1,
            "LastModified": "",
            "ETag": "",
            "Size": 26214400
        }
    ],
    "Initiator": {
        "ID": "",
        "DisplayName": ""
    },
    "Owner": {
        "DisplayName": "",
        "ID": ""
    },
    "StorageClass": "STANDARD"
}
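If an upload has many parts, adding up the Size fields by hand gets tedious. The sketch below totals them for a single upload. It assumes a boto3-style S3 client and uses its list_parts paginator; the function name is hypothetical.

```python
def incomplete_upload_bytes(s3_client, bucket, key, upload_id):
    """Return the total size in bytes of the parts stored for one upload."""
    paginator = s3_client.get_paginator("list_parts")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Key=key, UploadId=upload_id):
        # "Parts" is missing from the response when no parts were uploaded.
        for part in page.get("Parts", []):
            total += part["Size"]
    return total
```

For the upload above, this should return 26214400, the 25MB taken by the single part.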

As far as I’m aware, the only native way (as in not wrangling scripts or 3rd party tools) to get the entire size of the bucket is through CloudWatch metrics. You can see how the total size of my bucket is correctly represented at 125MB.
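The same number can be read programmatically. This is a rough sketch, assuming a boto3-style CloudWatch client and the BucketSizeBytes metric in the AWS/S3 namespace with the StandardStorage storage type; the function name and the three-day lookback window are my own assumptions. S3 storage metrics are reported roughly once a day, so the latest datapoint may lag behind what you just uploaded.

```python
from datetime import datetime, timedelta, timezone


def latest_bucket_size_bytes(cloudwatch_client, bucket, storage_type="StandardStorage"):
    """Return the most recent daily BucketSizeBytes value, or None if absent."""
    now = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": storage_type},
        ],
        StartTime=now - timedelta(days=3),  # assumed lookback window
        EndTime=now,
        Period=86400,  # one datapoint per day
        Statistics=["Average"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    # Datapoints are not guaranteed to be ordered, so pick the newest one.
    latest = max(datapoints, key=lambda point: point["Timestamp"])
    return latest["Average"]
```

Comparing this value with the sum of the sizes reported by list-objects gives a quick estimate of how much space incomplete uploads are taking.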

So where do we go from here? Deleting unneeded parts sounds like the path forward.

S3 provides you with an API to abort multipart uploads and this is probably the go-to approach when you know an upload failed and have access to the required information to abort it.

The command to execute in this situation looks something like this

> aws s3api abort-multipart-upload --bucket your-bucket-name --key your_large_file --upload-id UploadId

However, this is not a very scalable way of controlling orphaned parts across multiple uploads and buckets. You could craft a couple of scripts (using the list-multipart-uploads command) that run on a schedule to check for those uploads, or you can set up a lifecycle policy on your buckets to clean up failed uploads.
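To give an idea of what the scripted option could look like, here is a minimal sketch for one bucket. It assumes a boto3-style S3 client; the function name, the seven-day threshold, and the now parameter (there to make testing easier) are my own choices. Bear in mind that it cannot tell a failed upload from one that is still in progress, so pick a threshold longer than your slowest legitimate upload.

```python
from datetime import datetime, timedelta, timezone


def abort_stale_uploads(s3_client, bucket, max_age_days=7, now=None):
    """Abort multipart uploads initiated more than max_age_days ago.

    Returns a list of (key, upload_id) tuples for the uploads aborted.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    aborted = []
    paginator = s3_client.get_paginator("list_multipart_uploads")
    for page in paginator.paginate(Bucket=bucket):
        # "Uploads" is missing when the bucket has no uploads in progress.
        for upload in page.get("Uploads", []):
            if upload["Initiated"] < cutoff:
                s3_client.abort_multipart_upload(
                    Bucket=bucket,
                    Key=upload["Key"],
                    UploadId=upload["UploadId"],
                )
                aborted.append((upload["Key"], upload["UploadId"]))
    return aborted
```

You would still need to schedule this yourself and run it for every bucket, which is the main reason the lifecycle policy is usually the easier option.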

Luckily for us, S3 makes this easy to set up. Head over to the management settings for your bucket and create a new Lifecycle Rule.

First of all, give it a name and then define the scope of the policy. Your options are to apply it to the entire bucket or to a specific prefix (for example “/uploads”). In my case, I’ll set it up across the entire bucket, and the service rightfully lets me know about it.

Next up is defining what we want this rule to do. As you can see, there’s already a predefined option for incomplete multipart uploads.

 

And finally, configure the parameters for this action. Remember, S3 doesn’t know if your upload failed, which is why the wording (and behavior!) is around incomplete uploads. As such, it is entirely up to you how soon after they were created you want to delete parts.
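If you would rather apply the same rule from code than click through the console, the sketch below shows roughly what it could look like. It assumes a boto3-style S3 client; the function names, the rule ID, and the seven-day default are hypothetical. An empty prefix applies the rule to the whole bucket. Note that put_bucket_lifecycle_configuration replaces the bucket's existing lifecycle configuration, so merge in any rules you already have before calling it.

```python
def abort_incomplete_uploads_rule(days_after_initiation=7, prefix=""):
    """Build a lifecycle configuration that aborts incomplete multipart uploads."""
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # "" means the entire bucket
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": days_after_initiation
                },
            }
        ]
    }


def apply_rule(s3_client, bucket, days_after_initiation=7, prefix=""):
    """Replace the bucket's lifecycle configuration with the rule above."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=abort_incomplete_uploads_rule(
            days_after_initiation, prefix
        ),
    )
```

Once the rule is in place, S3 takes care of aborting uploads that stay incomplete for longer than the configured number of days and removing their parts, with no script to schedule or maintain.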
