This post will be a continuation of our post introducing the AWS Fault Injection Simulator.
The idea was to run an experiment and remediate our findings but as it turned out, the post was already too long with a simple setup so I split it in two parts.
I’d recommend you to check the first part to better understand the context of this entry, but the “tl;dr” is that we set up an experiment with FIS that would target for termination all EC2 instances of an application managed by Elastic Beanstalk. The beanstalk configuration has an autoscaling group with a minimum of 1 instance, which meant terminating it incurred on an outage.
On the application side, we need to make sure our environment runs on a minimum of 2 instances.
This is a good reminded that even though you make use of managed services, you’re still in charge of the behavior of it. Managed services (regardless of being compute, databases, containers, etc) will do all the heavy lifting but it will only operate in whichever way you tell it to. Our first FIS experiment showed that the application setup wasn’t resilient enough to failure. Whilst, beanstalk made sure to spin up a new instance to replace the terminated one, there was still a minute or two of downtime.
The New Experiment
Now that we’re running more instances, I’m also going to update the experiment template. On its current form, it would still target all instances because it was just based on tags which are shared by all ec2 resources managed by beanstalk.
The action can remain as is, that is a terminate-ec2 type. The change will be at the target level. Here, we need to update it in such a way that it targets a subset of the instances and FIS provides you with two options to do so.
Count: Fixed number of resources that will be targeted by the matching criteria
Percentage: Percentage of affected instances. NB: FIS will round down the resulting number of targets in cases where you have an odd number of resources.
I want to test how my application behaves if I lose half my fleet, so I’ll set it up with a Percent mode at 50%. In this particular case, this is the equivalent to choosing Countwith a value of 1.
After running this new experiment, we can test our application and see that there are no perceived changes to it. However, upon closer inspection to our resources, we’ll learn a few things
Our EC2 fleet downside to 1 (which means our action ran as intended)
Beanstalk is showing a Degraded state because 1 of the instances stopped sending data. If you remember, our application state was Unknown when the entire fleet disappeared.
We now have a new configuration to withstand certain types of failure and an experiment we can run on a regular basis to make sure our application configuration is up to it.
There are many more types of actions you can perform with FIS that we can explore in future entries.
Chaos Engineering has been around for a while, after being popularized by Netflix during their migration to the cloud. However, despite their best efforts to open source their tooling, a proper secure and reliable set up was complicated enough most people.
Fast forward to the AWS announcement of a limited preview new managed chaos engineering service called AWS Fault Injection Simulator at re:Invent 2020. After a couple of months of limited access, the service is now GA (us-east-1 only at the time of this post) and today’s post is about getting started with it.
There are a number of actions the service can perform (stop/terminate instances, throttle APIs, etc) against a number of different targets (EC2, ECS, RDS with more to come). For this entry, we’ll keep it simple and just focus on terminating a production EC2 instance experiment. In this particular case, I’ll be using the sample NodeJS application managed by Elastic Beanstalk.
As mentioned before, I’m just using the sample NodeJS application that Elastic Beanstalk offers you to quickly get started. However, I wanted to some of the configuration choices that I made to my environment.
The first bit of configuration (and one to pay attention to) is around the high availability for my environment. You’ll notice that while it is load balanced and scale up to 4 instances, the minimum has actually been set at 1.
You can also see the resources the service created for us, which in this case is one EC2 instance to which I’ve applied a resource tag at the application level. This tag is of the form chaos:ready, which it is descriptive enough for me to understand what instances I want FIS to target during its experiments. You could choose whatever value of the key value pair tag or just not have one altogether.
Finally, here’s what the sample application looks like and it also serves as a one to see how our environment is running.
From the FIS homepage, you’ll see your option is to create a new experiment template so go ahead and hit that button.
Disclaimer: FIS will execute whatever actions you define against your resources. The service doesn’t produce fake metrics or wizardy to simulate how a potential disruption affects your system. The service will indeed, terminate your instances if that’s the action you have chosen. You will be provided with a number of warning signs along the way but it’s better to be safe than sorry.
Think of the template as the definition for your experiments, the place in which you can specify actions, targets and alarms on top of the usual name, role (the role requires a trust relationship on ‘fis.amazonaws.com’) and tags that we’re used to from other AWS services. As previously mentioned, today’s experiment will only perform a terminate instance action.
When creating our action, we’re asked to provide a name for it as well as an action type from a predefined list. Once you’ve selected your action type, the Target dropdown will appear with an already prepopulated value created for you. The last option is something called “Start after“, what this means is in cases were a template has multiple actions, you might choose to run them in parallel or in sequence. Right now, it can be ignored given we’re going for the one action.
Now, let’s edit the target FIS created for us. I’ll start by updating the name for something a bit more descriptive, the Resource Type can stay as is because we’re indeed targeting EC2 instances. Now comes the fun part and arguably the area in which you need to focus the most which is how are we going to target these resources.
We see the selected method by default is using a resource ID. For our particular example, it might look like it’s enough and it indeed could be for a one off execution. It is true we’re only running one EC2 instance but we need to save the template with a fixed ID, so that means we’re not really in a position to reuse the template given that if we succeed and actually terminate the instance that particular ID will be lost.
So let’s use tags and filters and as soon as we select that method, a couple of “resource” options will appear. The first one is tags, and as you can imagine it will only run against resources with the specified tags. This will be the place in which I’ll use that chaos:ready tag from before.
The second option is called filters and I highly recommend you to follow the documentation link as this is the area where targets become truly powerful. For the sake of simplicity (this post is already too long) but not to leave you hanging, I’ll create one that targets only EC2 instances that are in a running state.
The Stop Condition section will provide you with the necessary safe guard to stop the experiment if a certain criteria is met. It is an optional value and I won’t be using it now but I’d suggest to always have one for serious experiments.
Go ahead finish the creation of the template. The service will make sure you’re sure about it with with a nice warning sign.
I’m now ready to start the template, which will in return create an experiment instance. The start process comes with the same warning as the creation one and it should run successfully.
Now that the experiment has finished, let’s have a look at the chaos it caused.
My beanstalk URL now returns an error, which means the underlying EC2 instance has been successfully terminated.
We can confirm our suspicions by looking at the health of our environment as well at the specific time in which it happened by looking at the metrics.
Beanstalk will automatically spin up a new instance and your environment will be back to healthy in a minute or two but it is a good reminded that even if you’re using a managed service, the service can only do what you tell it to do. In our case, because our minimum configuration was one instance, terminating it meant a complete disruption of our application.
Not long ago, I wrote about “Creating MultiPart Uploads on S3” and the focus of the post was on the happy path without covering failed or aborted uploads. It was already long as it was so I decided to write a separate entry to discuss in detail how to clean up your buckets so you don’t incur in unnecessary storage costs.
What’s this all about?
Let’s review the basics: S3 allows you to store objects in exchange for a storage fee. Simple enough, however when we think of objects in the context of S3, most people assume the output of running a list-objects (or ls) operation or just looking at their buckets through the console (which performs the same API call). In said situations, parts of an object created through a multipart upload won’t show up but the service is still storing them for you which means you are paying for that storage.
If none of this surprises you, then this post might not be for you. However, if you’ve been doing multipart uploads for a while or you’re just new to it, I’d recommend to keep reading as you might find you could optimize your storage costs.
Let’s pick up where we left off
I’ll continue with the setup from our previous post, a bucket with a single 100MB file.
As far as I’m aware, the only native way (as in not wrangling scripts or 3rd party tools) to get the entire size of the bucket is through CloudWatch metrics. You can see how the total size of my bucket is correctly represented at 125MB.
So where do we go from here? Deleting unneeded parts sounds like the path forward.
S3 provides you with an API to abort multipart uploads and this is probably the go-to approach when you know an upload failed and have access to the required information to abort it.
The command to execute in this situation looks something like this
However, this is not a very scalable way of controlling orphan parts, across multiple uploads and buckets. You could craft a couple of scripts (using the list-multipart-uploads command) that run on a schedule to check for those file or you can setup a lifecycle policy on your buckets to clean failed uploads.
Luckily for us, S3 makes this easy to set up. Head onto the management settings for your bucket and create a new LifecycleRule.
First of all give it a name and then define what the scope of the policy will be. Your options are to apply to the entire bucket or a specific prefix (for example “/uploads”). In my case, I’ll set it up across the entire bucket and the service will rightfully lets me know about it.
Next up is defining what do we want this rule to do. As you can see, there’s already a predefined option for incomplete multipart uploads.
And finally, configure the parameters for this action. Remember, S3 doesn’t know if you upload failed which is why the wording (and behavior!) is around incomplete uploads. As such, it is entirely up to you how soon after they were created you want to delete parts.
When you’re using S3, an object store that has unlimited volume of data and a maximum object size of up to 5TB (the maximum for a single PUT request is 5GB) you might be tempted to start uploaded some pretty big files.
So today’s focus is about making use of the multipart upload capabilities of S3 to speed up the amount of time that it takes for a large object to land on your buckets.
The “managed” way
The AWS CLI has a number of commands that will help you upload those large files by automatically making use of multipart and so chances are that if you have used the CLI to upload documents into your buckets you have come across them. Those commands are cp, mv and sync and they can be used as followed.
> aws s3 cp your_large_file s3://your-bucket/
> aws s3 mv your_large_file s3://your-bucket/
> aws s3 sync your_large_file s3://your-bucket/
The differences between the three is out of scope for this post, however I’ll finish by saying that you can still change their configuration in order to make better use of your bandwidth. You can set the new configuration values through the CLI or directly into your AWS profile. A list of all possible configuration values can be found here.
The “unmanaged” way
AWS will recommend you to use those commands when possible (and with good reason!) but there are cases in which they don’t fit the bill and you have to do a bit of plumbing yourself. Luckily you are not left alone and the AWS CLI still provide you with the necessary commands to achieve the same result.
So let’s go ahead and upload a large file in parts into our bucket. In my case, I’ll create a 100Mb test file from the command line like this
> truncate -s 100M large_file
Now, I’ll use the split command to get four 25M parts. Split is available on both Linux and OSX (however, the OSX version might out of date and you might need to install the GNU core utilities).
> split -b 25M large_file
If you list the files in your directory, it should look something like this
We are not ready to start interacting with S3!
The first step in the process is to actually create a multipart upload
The response from the API only contains three values, two of which have been provided by you. The last value is the UploadId and as you can imagine, this will be our reference to this multipart upload operation so go ahead and save it.
It is time to start uploading our part. The following is the command on how to upload a single part of which you’ll have to repeat N number of times depending on how many parts you’ve split your file into (In my case, N=4 and the command is for the first part), the values for part-number and body will need to be updated accordingly for every part you upload.
The ETag value that each upload-part returns will be used to complete the upload.
Once all parts are uploaded, you need to instruct S3 that the upload is completed. Remember S3 has no knowledge on how many parts there should be and what the references are so, passing that information back to it will complete the process. In order to do so, we need to compile a json array of all our parts and their respective ETag values.
You can use the ETag values that you have been collecting or retrieve them again by listing all parts in the upload
Save the output of the “Parts” array into a new file (I’ll call mine parts.json) and make sure to not include the LastModified and Size keys into the final file. Once you’re done the file should like something like this and remember that in my case, I was only dealing with four parts.
Following the theme of our previous post, “Static Website Hosting With CloudFormation“, I thought that before moving onto a radically different topic we could keep exploring alternative ways of reaching similar outcomes.
To no one’s surprise, there’s always multiple ways of doing the same thing on AWS. Options typically vary on pricing, functionality, ease of use, management overheard and more, so today we will be having a look at Amplify Console. The Amplify brand is all geared towards making web and mobile development as easy and streamlined as possible (the “umbrella” contains client side libraries, backend provisioning, hosting, CI/CD, CLI). So with Amplify Console we see their approach to hosting and CI/CD, in an easier and significantly less “hands-on” approach compared to having to roll out your deployment pipelines, buckets, SSL, etc.
Feel free to follow this step by step or use your own applications. In my case, I’ll be using the calculator app from the community built React applications that you can find on their site. The majority of the instructions would apply to any application with some exceptions needed for apps built using create-react-app.
Head over to the Amplify homepage and create a new app and choose “Host web app”
We’re now presented with a choice for our source code provider. In my case, I’ll be using CodeCommit but you can choose from the most common ones as well as manually uploading your application (which unless being a quick prototype, it kind of defeats the purpose). Depending on your provider of choice, you’ll need to accept and enable the various different permissions for the service to be able to read your repository as that’ll be needed in order to detect changes and pull the code.
The first bit of magic will come in after hitting the Next button. You might find there’s a slight delay and that’s because Amplify is looking into your repository in order to find build settings for your application or auto create one for you. Amplify will be able to detect and automatically create settings for most common setups. It’ll be able to see if you use NPM or YARN, React or Vue and even if you’re using a static site generator such as Hugo or Gatsby. You could be in luck and the auto generated settings don’t need any manual intervention but even if you need to change a value here or there, what’s presented to you will get you at least 80% of the way there.
After that, you’re presented with everything up to that point and you should be able to save your project. The service will bootstrap your application and it’ll start a new deployment, providing you with an unique URL (with SSL enabled) to access your site. There are also a number of settings on the left hand side menu such as domain management, notifications, access control (i.e setting a password for your site), monitoring, redirects and more. Let me know if you would like to dive deeper on those on future posts.
And that’s it! Your site should be up and running, a deployment pipeline is set up for you behind the scenes so any future changes that you make to the connected branch will trigger a new deployment.
A note on create-react-app
If you followed along with the same application or are using create-react-app for yours, you are going to see that things won’t work out of the box. I’ll list below the changes I made in order to have it running, however you could also make some adjustments within the actual build process to move files across.
The index.html is located under the “public” folder as opposed to the root of the repository so a redirect rule was needed there.
Not much needed to change here except where our base directory is located. The create-react-app build process will save all its files under the “build” folder so I just needed to adjust that to reflect that difference.
Finally, create-react-app uses the “homepage” value in your package.json to assume where your app lives. All I had to do there was setting it up as an empty value (“”).
Static site generators such as Hugo, Gatsby and Jekyll as well as front-end development practices of decoupling presentation and APIs have pushed static web hosting to primetime. One of the many benefits of doing this, is the ability to very easily deploy these sites on simple object stores without the need of provisioning virtual machines.
Enabling Amazon S3 Static Web Hosting
Amazon S3 is one of AWS’ oldest services and it has been able to host static website at the click of a button for as long as it has been around.
I’ll show you how to enable static web hosting on an existing bucket, the complexity of doing this at creation is the same but the UI might look slightly different. Head over to the bucket you want to enable static web hosting, select the Properties tab and scroll to the bottom of the screen. By default, this functionality is disabled so you should see something like this.
Go ahead and click the Edit button. Here you will be presented with a number of very straight forward options such as whether you want to enable the functionality and what type of hosting you want. By default, S3 will look for an index.html file at the root of your bucket but you can specify a different file in the Index document section. Pay attention to the information banner telling you that for this to work, not only you need to enable the functionality you also need make the content in your bucket public.
You can also configure redirection rules which we wont cover now, so scroll to the bottom and save your changes.
If everything went according to plan, you should now see it enabled and ready to be used.
You can also achieve this programmatically with CloudFormation (or any other IaC provider). Here’s what the YAML version of a CloudFormation template would look to achieve similar results.
S3 doesn’t let you make HTTPS requests to your site (you will be greeted by a nice 403). If this is a pre production environment that might be enough, however, the moment you start serving content to your customers you want to secure the traffic over the wire. You can get that functionality by pairing your newly hosted website on S3 with AWS’ CDN offering, CloudFront.
Two different endpoint formats are supported to access your content, let see if you can spot the difference.
Don’t feel bad if you haven’t. The only difference is the choice of “.” and “–” in between s3-website and the region in which your bucket lives.
CNAME and Custom Domains
Both CNAME and custom domains are supported. For CNAME, if you have a registered domain let’s say www.your-bucket-name.com you can create of bucket with that as its name and then create a DNS CNAME entry that points to www.your-bucket-name.com.s3-website.[region-placeholder].amazonaws.com. On the other hand, if you want a completely different name for your site that doesn’t match your bucket’s name, you can use your domain registered on Route53 by creating an Alias with your bucket’s information.