Back in the day, the Microsoft SQL Server Tuning Wizard along with the SQL Server Profiler were the best way to track the performance of SQL queries. In production, you might even add custom perfmon metrics to the mix. But these days, Azure SQL has you covered with an extremely powerful query performance insights tool that does all of the heavy lifting for you.

Accessing Query Performance Insights

On an Azure SQL Database, simply access the Query Performance Insight tool under the Intelligent Performance sub-heading. Note that this is at the database level, not the server level. While some metrics (Such as DTU/CPU) can be tracked at the server level, when looking at individual queries, we have to look at each database individually.

From here, we can access :

  • Resource Consuming Queries – These are queries that consume the most resources (CPU, Data) as a *sum* of all executions. That means even if a query is performant, but is executed often, it may appear in this list.
  • Long Running Queries – These are queries that take the most time to execute, but again as the *sum* of all executions. So even if a query returns fast, if it’s called often, it will appear in this list.
  • Custom – This is where we can create custom reports to better drill down into poorly performing SQL queries. This is generally our best bet at finding bad queries.

Selecting any query allows you to view the actual query text :

As well as the average CPU, Data, Duration and execution count over the time period :

Importantly, there is also a chart below which allows you to track the same metrics at hourly intervals. This can help you pinpoint certain times of day that may be more problematic for certain SQL queries :

Overall, utilizing this data can go a long way to giving you very simple metrics to act upon, all with very digestible queries, charts, and graphs.

The thing to note with all of these graphs, is that there isn’t one single metric that will be able to tell you the exact performance issues with your application. For example, a SQL query may run 100 times across 100 different users in your application, but is only non-performant on a single user (Maybe they have far more data than all the others). If you look at the average of all of these queries, it may look perfectly fine, whereas sorting by “max” may pinpoint that at times, this query is non performant.
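
To make the average versus max point concrete, here is a minimal Python sketch. The durations are made-up numbers for illustration only, not real Query Performance Insight output.

```python
from statistics import mean

# Hypothetical durations (ms) for one query, executed once by each of 100 users.
# 99 users have little data; one user has far more data and is much slower.
durations_ms = [20] * 99 + [4000]

average_ms = mean(durations_ms)  # the outlier barely moves the average
max_ms = max(durations_ms)       # the max exposes the outlier directly

print(f"avg={average_ms:.1f}ms max={max_ms}ms")
```

The average stays under 60ms and looks healthy, while the max shows that one execution took four seconds.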

Custom Queries To Utilize

Earlier, we talked a little bit about how using Custom queries was the best way to diagnose performance issues. Here are some of the queries that I utilize to better understand the performance of my Azure SQL Databases, and what I’m looking for when running them.

Execution Count Metrics

I utilize the Execution Count metric to understand if there are additional caching needs for my application. A good example is if every page load requires you to return how many “unread notifications” a user has in your system. Or maybe every page load, we check the current logged in user in the database.

For the former (notifications), maybe we can cache this value so we don’t hit the database so often for something that isn’t *too* important. For example, if a user gets a notification, does their notification count really need to increase in real time, or is it OK to be cached every 30 seconds?
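
As a rough idea of what “cached every 30 seconds” could look like in application code, here is a small Python sketch. The class name, the key format and the loader callback are all hypothetical; a real application would more likely use its framework’s cache or something like Redis.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Caches one value per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}

    def get(self, key: str, load: Callable[[], int]) -> int:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, skip the database
        value = load()  # missing or expired, hit the database once
        self._entries[key] = (now, value)
        return value
```

Calling `cache.get("notifications:42", load_from_db)` on every page load would then only run the query once per 30 second window for that user.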

For the latter, sometimes there isn’t anything you can do. Checking whether someone’s JWT/Authentication Cookie corresponds with a valid user in the database is probably unavoidable.

But what I try to look for is outliers and things that really don’t need to be happening in real time.

Duration/CPU Average

I utilize both CPU and Duration average to find queries that have the slowest average execution time. But we need to be careful here, because sometimes the queries in these reports truly are slow, but are unavoidable. A good example might be generating an admin report that happens once per week. Sure, we could offload this to something better at number crunching, but if it’s getting run once a week, it’s probably not a big issue.

The real gold finds are when we can take a query that appears on the slowest average duration and on the execution count report. This means not only is it one of the slowest queries overall, but it’s also getting executed often. Sometimes the “sum” query aggregation can help you here, but not always, so I often run the two independently.
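
If you copy the numbers out of the two reports, finding the overlap is a simple set intersection. A minimal Python sketch, assuming hypothetical query ids and figures typed in by hand (the portal does not hand you data in this shape):

```python
def top_n(stats, key, n):
    """Return the ids of the n queries with the largest value for `key`."""
    ranked = sorted(stats, key=lambda row: row[key], reverse=True)
    return {row["query_id"] for row in ranked[:n]}


# Hypothetical rows copied from the average duration and execution count reports.
stats = [
    {"query_id": 101, "avg_duration_ms": 900, "executions": 12},
    {"query_id": 102, "avg_duration_ms": 850, "executions": 40000},
    {"query_id": 103, "avg_duration_ms": 15, "executions": 90000},
    {"query_id": 104, "avg_duration_ms": 5, "executions": 300},
]

slow = top_n(stats, "avg_duration_ms", 2)
busy = top_n(stats, "executions", 2)
worth_fixing_first = slow & busy  # both slow on average and executed often

print(worth_fixing_first)
```

Here query 101 is slow but rare (the weekly report case) and query 103 is frequent but fast, so only query 102 lands in both lists.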

Duration/CPU Max

Finally, I utilize the Duration and CPU max to find outliers in queries that may not on average be slow, but are slow under certain conditions. Often these can be a bit of a guess. When looking at a query within the Azure Portal, you won’t be able to see the query parameters. Therefore you can’t always know the exact conditions that caused the query to slow down, but often you can start making educated guesses, and from there do test scenarios locally.

Really, what you look for in this panel are queries you wouldn’t expect to be slow, but that could under certain conditions be loading a lot of data. A good example might be a user on an ecommerce site who buys things regularly. They may have hundreds or even thousands of “orders” attached to their user, but the average user may only have a couple. We may see the query show up here due to the max duration being extremely long for that one customer, but not show up on the average report.

Azure SQL Performance Recommendations

Spend any time using Azure SQL and you’re going to run across its own “Performance recommendations” engine. These are performance recommendations (generally indexes) that Azure recommends periodically to improve your application’s performance. Personally, I don’t utilize them that much, and here’s why :

  • Generally speaking, Azure Performance Recommendations mostly end up recommending you create indexes. While this can be helpful, for the most part if you are watching your slow running queries using the Query Performance Insights tool, you’re going to find them yourself anyway and probably have a better understanding of the actual issue.
  • The recommendation engine also can update your database behind the scenes without you having to lift a finger. This is bad. In most scenarios, you’re going to want to add that missing index in your own source control. It’s very rare that I accept a change via this performance recommendation engine, and let Azure implement it for me.
  • The performance recommendations don’t take business logic or domain knowledge into account. There may be specific reasons why queries are acceptably slow, and/or a query may only be slow in some use cases which you are happy with.

In general, I think that the performance recommendations are a helpful tool for any developer, but maybe not as automated as they appear on the surface. Generally, I’ve had to go away and validate the findings and then implement the changes myself rather than use the one click tool.

I recently ran into an issue where I wanted to test out a couple of the new pieces of functionality that Microsoft Teams apps can do (Notably, things around creating custom tabs within Teams). To test this out, I figured the easiest way would be to create a free teams account under my personal Microsoft account (So, not using Office 365), so I could play around with various test applications. What I found was that it is extremely hard to follow any guide to upload custom sideloaded apps to a free teams account, but it is possible!

If you want to skip right to the end to “What does work”, then I will forgive you, however first I want to lay out what exactly doesn’t work, and why this took me so long to figure out!

What Doesn’t Work

When guides out there (including Microsoft’s own documentation) describe uploading custom apps to Microsoft Teams, they talk about using the custom app “App Studio”. This is essentially an app, within Teams, that allows you to upload your own custom apps. That’s maybe a bit confusing, but in simple terms, it allows you to build a manifest file, upload logos, and set privacy page URLs, all within a WYSIWYG editor, instead of editing JSON manually.

Once you’ve filled out all options, you’ll hit this step to start distributing your application.

The first option you are going to try is “Install”. Makes sense to try and install it for testing right? Then you’re likely to get this :

Or in text form :

Permissions needed. Ask your IT admin to add XYZ to this team.

Interestingly.. I am the IT admin since I created this teams account. This will lead you on a wild goose chase, notably to find either the “Teams Admin Center” or the “Office 365 Enterprise Admin Portal”. The problem is.. You aren’t an Office 365 customer. If you follow any of these links you find on the web to enable side loading applications, you’ll pretty often get the following.

You can’t sign in here with a personal account. Use your work or school account instead.

Very. Very. Frustrating.

Knowing that I couldn’t get around this limitation, I instead decided to select the option to “Publish” from this same screen within App Studio. It looked promising until I got to a screen that said my “IT Admin would review my application and approve it”. Well.. I’m the IT admin so I guess I should receive an email soon with a nice link to approve everything? Nope! Nothing.

Doing this seems to just send it out into the ether. I never saw any link, option, or email to approve this app. Another dead end.

What Did Work

Finally, I saw another poor soul with the same issue and the usual unhelpful advice of logging into your non-existent Office 365 admin account. Then someone left a nothing comment.

You can still just upload the custom app normally.

What did “normally” mean in this context? Well I went back to App Studio and this time around selected the option to download my app to a zip.

Then at the very bottom of the Apps screen inside Teams, I selected the option to “Upload a customised app” (Note, *not* “Submit to app catalogue”).

And by magic, after a long wait of the screen doing nothing, it worked!

So what’s going on here? At a guess, Free Teams Accounts have the option to sideload apps into the account, but they have other restrictions that cause “App Studio” to report that the IT Admin will need to enable settings. It’s essentially bombing out and blaming a setting that it shouldn’t!

But there you have it. If you need to sideload custom apps into Free Teams, you *can* do it, you just can’t do it via App Studio.

For a long time now, Azure QnA Maker has been a staple of any Microsoft Bot Framework integration. At its simplest, QnA Maker is an extremely easy to use key/value pair knowledgebase, where an incoming chat message is matched with the closest question inside QnA and that answer is returned. Unfortunately, it’s rather basic and for a while has been relegated to only answering questions in a one question to one answer fashion. Essentially, QnA Maker lacked the ability to ask “follow up” questions to better drill down to an answer.

As an example, imagine the following question and answer.

Question : Where can I park?

Answer : If you are in Seattle, then you have to park around the back of the building using code 1234. If you are on the San Francisco campus, then unfortunately you will have to park on the street. Usually there are parks available on Smith Street. 

While we have answered the user’s question, we had to combine two different answers, one for parking in Seattle, the other for San Francisco. If we add another campus, or want to elaborate further on a particular location, things can get confusing for the user fast. It would be much better if, when a user asks where they can park, the first response asked where they are located.

Thankfully, QnA Maker has recently released “Follow Up Prompts” which allow a bot to have a “Multi-Turn” conversation to better drill down to an answer. There are a couple of gotchas with the interface at the moment, but for the most part it’s rather simple. Let’s take our example from above and see how it works.

Adding Follow Up Prompts To QnA Maker

The first thing we need to do is head to our KB Editor at https://www.qnamaker.ai/. This interface is generally fine as-is, but this time around we actually want to add one additional column. Select View options and select “Show Context”. It won’t be immediately evident what this does, but it is super important as we add Follow Up Prompts.

Next, I’ll add the question “Where can I park?” like so :

Notice how our “answer” is actually the follow up question. Also notice the “Add follow-up prompt” option. Clicking it, we need to fill out the resulting popup like so :

The options are as follows :

Display Text is what our follow up button text will show. In our case, because our drill down question is asking the user which campus they are located at, we want to display a simple option of “Seattle”.

Link to QnA will actually be the initial answer. So we can fill this out as to how it will be answered if a user selects Seattle.

Importantly, we select “Context-Only” as this enforces that the only way someone can reach this, is by following the prompts from parking. Otherwise, a user can simply type “Seattle” even without first asking about parking.

After hitting save, because earlier we turned on the option to “Show context”, we will be shown a tree view of our conversation flow :

Let’s Save and Train, then Test.

Perfect! And if we ask “Seattle” out of the blue, we also see that it doesn’t return our parking answer!
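
To illustrate what “Context-Only” is doing for us, here is a toy Python model of the two QnA pairs we just created. The field names and the exact-match lookup are my own simplification for illustration; this is not the QnA Maker export format or its ranking logic.

```python
from typing import Optional

# Hypothetical in-memory model of the knowledge base built above.
KB = {
    1: {"questions": ["Where can I park?"],
        "answer": "Which campus are you located at?",
        "context_only": False,
        "prompts": {"Seattle": 2}},  # display text -> linked QnA id
    2: {"questions": ["Seattle"],
        "answer": "Park around the back of the building using code 1234.",
        "context_only": True,
        "prompts": {}},
}


def answer(text: str, previous_id: Optional[int] = None) -> tuple:
    """Return (qna_id, answer), honouring context-only pairs."""
    # A follow-up prompt on the previous answer takes priority.
    if previous_id is not None:
        linked = KB[previous_id]["prompts"].get(text)
        if linked is not None:
            return linked, KB[linked]["answer"]
    # Otherwise only pairs that are not context-only may match.
    for qna_id, pair in KB.items():
        if not pair["context_only"] and text in pair["questions"]:
            return qna_id, pair["answer"]
    return None, "No good match found."
```

Asking “Where can I park?” returns pair 1, answering “Seattle” with `previous_id=1` returns the parking answer, and “Seattle” with no previous question finds nothing.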

We can of course go back and add other options to the original question as often as we want.

Linking Existing QnA

One final thing I want to mention is that if you have QnA options that are somewhat close to each other, and you want to link between them, you can now also use Follow Up Prompts to do this. Most notably, I created a QnA answer to handle bad answers. I then can add it as a follow up question by typing the start of the question “Bad Answer”, and selecting the existing QnA question.

Obviously this is a great way to have a common method for handling bad answers, but you can also use this as a way to show “Related” QnA within the QnA Maker, and not have to handle conversation flow within your bot at all!

This post will be a continuation of our post introducing the AWS Fault Injection Simulator.

The idea was to run an experiment and remediate our findings but as it turned out, the post was already too long with a simple setup so I split it in two parts.

I’d recommend you check the first part to better understand the context of this entry, but the “tl;dr” is that we set up an experiment with FIS that would target for termination all EC2 instances of an application managed by Elastic Beanstalk. The beanstalk configuration has an autoscaling group with a minimum of 1 instance, which meant terminating it caused an outage.

The Remediation

On the application side, we need to make sure our environment runs on a minimum of 2 instances.
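
If you prefer to make this change from code rather than the console, the following Python sketch builds the relevant Elastic Beanstalk option setting. The environment name in the comment is a placeholder, and the boto3 call is left commented out so the snippet runs without AWS credentials.

```python
def min_instances_setting(minimum: int) -> list:
    """Build the option setting that raises the Auto Scaling group minimum."""
    return [
        {
            "Namespace": "aws:autoscaling:asg",
            "OptionName": "MinSize",
            "Value": str(minimum),  # option values are passed as strings
        }
    ]


settings = min_instances_setting(2)
print(settings)

# With boto3 installed and credentials configured, this could be applied with:
#   import boto3
#   boto3.client("elasticbeanstalk").update_environment(
#       EnvironmentName="my-env",  # placeholder environment name
#       OptionSettings=settings,
#   )
```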

       

This is a good reminder that even though you make use of managed services, you’re still in charge of their behavior. Managed services (regardless of being compute, databases, containers, etc) will do all the heavy lifting but they will only operate in whichever way you tell them to. Our first FIS experiment showed that the application setup wasn’t resilient enough to failure. Whilst beanstalk made sure to spin up a new instance to replace the terminated one, there was still a minute or two of downtime.

The New Experiment

Now that we’re running more instances, I’m also going to update the experiment template. In its current form, it would still target all instances because it was just based on tags which are shared by all EC2 resources managed by beanstalk.

The action can remain as is, that is a terminate-ec2 type. The change will be at the target level. Here, we need to update it in such a way that it targets a subset of the instances and FIS provides you with two options to do so.

  1. Count: Fixed number of resources that will be targeted by the matching criteria
  2. Percentage: Percentage of affected instances. NB: FIS will round down the resulting number of targets in cases where you have an odd number of resources.

I want to test how my application behaves if I lose half my fleet, so I’ll set it up with a Percent mode at 50%. In this particular case, this is the equivalent to choosing Count with a value of 1.
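
The rounding behaviour is easy to reason about with a couple of lines of Python. This is just arithmetic that mirrors the round-down rule described above, not a call to FIS; the `PERCENT(n)` string is how, to my knowledge, the selection mode is written in a template.

```python
def targeted_count(matching: int, percent: int) -> int:
    """Resources hit in percentage mode, rounding down."""
    return matching * percent // 100


def selection_mode(percent: int) -> str:
    """Selection mode string as it would appear in the experiment template."""
    return f"PERCENT({percent})"


for fleet_size in (2, 3, 5):
    print(fleet_size, "->", targeted_count(fleet_size, 50))
```

With two instances, 50% is the same as a count of 1. With three or five instances it would hit one and two instances respectively, which is worth knowing before you grow the fleet.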

After running this new experiment, we can test our application and see that there are no perceived changes to it. However, upon closer inspection of our resources, we’ll learn a few things:

  1. Our EC2 fleet downsized to 1 (which means our action ran as intended)
  2. Beanstalk is showing a Degraded state because 1 of the instances stopped sending data. If you remember, our application state was Unknown when the entire fleet disappeared.

   

We now have a new configuration to withstand certain types of failure and an experiment we can run on a regular basis to make sure our application configuration is up to it.

There are many more types of actions you can perform with FIS that we can explore in future entries.

Chaos Engineering has been around for a while, after being popularized by Netflix during their migration to the cloud. However, despite their best efforts to open source their tooling, a proper secure and reliable set up was too complicated for most people.

Fast forward to re:Invent 2020, where AWS announced a limited preview of a new managed chaos engineering service called AWS Fault Injection Simulator. After a couple of months of limited access, the service is now GA (us-east-1 only at the time of this post) and today’s post is about getting started with it.

The Setup

There are a number of actions the service can perform (stop/terminate instances, throttle APIs, etc) against a number of different targets (EC2, ECS, RDS with more to come). For this entry, we’ll keep it simple and just focus on terminating a production EC2 instance experiment. In this particular case, I’ll be using the sample NodeJS application managed by Elastic Beanstalk.

The Application

As mentioned before, I’m just using the sample NodeJS application that Elastic Beanstalk offers you to quickly get started. However, I wanted to highlight some of the configuration choices that I made to my environment.

The first bit of configuration (and one to pay attention to) is around the high availability for my environment. You’ll notice that while it is load balanced and can scale up to 4 instances, the minimum has actually been set at 1.

You can also see the resources the service created for us, which in this case is one EC2 instance to which I’ve applied a resource tag at the application level. This tag is of the form chaos:ready, which is descriptive enough for me to understand what instances I want FIS to target during its experiments. You could choose whatever key value pair you like for the tag or just not have one altogether.

Finally, here’s what the sample application looks like. It also serves as a way to see how our environment is running.

Experiment Time

From the FIS homepage, you’ll see your option is to create a new experiment template so go ahead and hit that button.

Disclaimer: FIS will execute whatever actions you define against your resources. The service doesn’t produce fake metrics or wizardry to simulate how a potential disruption affects your system. The service will indeed terminate your instances if that’s the action you have chosen. You will be provided with a number of warning signs along the way but it’s better to be safe than sorry.

Think of the template as the definition for your experiments, the place in which you can specify actions, targets and alarms on top of the usual name, role (the role requires a trust relationship on ‘fis.amazonaws.com’) and tags that we’re used to from other AWS services.  As previously mentioned, today’s experiment will only perform a terminate instance action.
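
For reference, the trust relationship mentioned above looks roughly like the following when written as a Python dictionary. This is a minimal sketch of the trust policy only; the role also needs a permissions policy covering whatever actions your experiment performs, which is not shown here.

```python
import json

# Trust policy that lets the FIS service assume the experiment role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "fis.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# The JSON document you would paste into the role's trust relationship.
print(json.dumps(trust_policy, indent=2))
```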

When creating our action, we’re asked to provide a name for it as well as an action type from a predefined list. Once you’ve selected your action type, the Target dropdown will appear with an already prepopulated value created for you. The last option is something called “Start after”. What this means is that in cases where a template has multiple actions, you might choose to run them in parallel or in sequence. Right now, it can be ignored given we’re going for the one action.

Now, let’s edit the target FIS created for us. I’ll start by updating the name for something a bit more descriptive, the Resource Type can stay as is because we’re indeed targeting EC2 instances. Now comes the fun part and arguably the area in which you need to focus the most which is how are we going to target these resources.

We see the selected method by default is using a resource ID. For our particular example, it might look like it’s enough and it indeed could be for a one off execution. It is true we’re only running one EC2 instance but we need to save the template with a fixed ID, so that means we’re not really in a position to reuse the template given that if we succeed and actually terminate the instance that particular ID will be lost.

So let’s use tags and filters and as soon as we select that method, a couple of “resource” options will appear. The first one is tags, and as you can imagine it will only run against resources with the specified tags. This will be the place in which I’ll use that chaos:ready tag from before.

The second option is called filters and I highly recommend you follow the documentation link as this is the area where targets become truly powerful. For the sake of simplicity (this post is already too long) but not to leave you hanging, I’ll create one that targets only EC2 instances that are in a running state.

The Stop Condition section will provide you with the necessary safeguard to stop the experiment if certain criteria are met. It is an optional value and I won’t be using it now but I’d suggest always having one for serious experiments.
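
Everything configured in the console so far can also be expressed as a template definition. The Python sketch below builds the arguments as I understand the FIS API to expect them; the target name, action name, description and role ARN are placeholders of mine, and the boto3 call is commented out so nothing is created by running it.

```python
def build_template(role_arn: str) -> dict:
    """Build keyword arguments for an experiment template like the one above."""
    return {
        "description": "Terminate tagged EC2 instances",
        "roleArn": role_arn,
        "stopConditions": [{"source": "none"}],  # no stop condition, as in this post
        "targets": {
            "chaos-ready-instances": {  # placeholder target name
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"chaos": "ready"},
                # Only instances that are currently running.
                "filters": [{"path": "State.Name", "values": ["running"]}],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "terminate": {  # placeholder action name
                "actionId": "aws:ec2:terminate-instances",
                "targets": {"Instances": "chaos-ready-instances"},
            }
        },
    }


template = build_template("arn:aws:iam::123456789012:role/fis-role")  # placeholder ARN

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("fis").create_experiment_template(**template)
```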

Go ahead and finish the creation of the template. The service will make sure you’re sure about it with a nice warning sign.

I’m now ready to start the template, which will in turn create an experiment instance. The start process comes with the same warning as the creation one and it should run successfully.

Now that the experiment has finished, let’s have a look at the chaos it caused.

My beanstalk URL now returns an error, which means the underlying EC2 instance has been successfully terminated.

We can confirm our suspicions by looking at the health of our environment, as well as the specific time at which it happened by looking at the metrics.

Beanstalk will automatically spin up a new instance and your environment will be back to healthy in a minute or two but it is a good reminder that even if you’re using a managed service, the service can only do what you tell it to do. In our case, because our minimum configuration was one instance, terminating it meant a complete disruption of our application.

In our follow up post, we look at a way of mitigating that while still being able to run chaos experiments on our environments. Check it out here : https://tutorialsforcloud.com/2021/03/25/programmatic-chaos-with-aws-fault-injection-simulator-continued/

For some time now, Azure Cognitive Services has offered a “Text Analytics” feature, which can be used for finding topics within a piece of text, or even sentiment analysis to see if the overall sentiment of the text was positive or negative.

In early 2020, Azure released an additional feature to this API called “Opinion Mining”. Opinion mining is almost a cross between topic discovery and sentiment analysis. Instead of finding the overall sentiment of a piece of text, it finds the sentiment of individual topics. For example, in a piece of text such as :

The food here was terrible!

We would expect it to understand that not only is this a negative sentence, but specifically, we are talking negatively about the food. Being able to understand not just whether something is overall positive or negative, but also what is being talked about in that light can be invaluable in machine learning scenarios.

So let’s jump right in!

Setting Up Azure Cognitive Services For Testing

For the purposes of this article, we’re not going to get into individual SDKs for Python, C#, Java, or any other language (Although these are available). Instead, we’re just going to use a simple Postman example of calling the API, with our key as a header, and retrieving results. This should be enough for us to see how the API works, and what sort of results we can get from it.

The first thing we need to do is head to our Cognitive Services account in the Azure Portal (Or go ahead and make one if you need to, the first 5000 requests are free so there is no immediate cost to creating the account!).

Under Keys and Endpoint, copy out your endpoint and one of your keys from this screen :

For our test, we are going to call a POST URL in the format of :

https://ABC.cognitiveservices.azure.com/text/analytics/v3.1-preview.3/sentiment?opinionmining=true

Where ABC is replaced with your cognitive endpoint taken from the above screenshot.

Additionally, we will be sending a header of “Ocp-Apim-Subscription-Key” which will be our key, again taken from the screenshot above. In Postman it will end up looking like so :

The body of our request will always look like the following :

{
  "documents": [
  {
    "language": "en",
    "id": "1",
    "text": "Horrible location as it's right next to a construction site. But the food was amazing! Really friendly waiter too!"
  }]
}

Documents is actually an array because you can send multiple documents at once to the API to have them all mined at once. You still pay per document, so it isn’t a cost saver, but sending multiple documents at once can save time over sending them one by one.
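
If you would rather script the call than use Postman, the same request can be put together with the Python standard library. This is a sketch that only builds the request; the endpoint and key are the placeholders from above, and the explicit Content-Type header is my addition (Postman normally sets it for you when you pick a raw JSON body).

```python
import json
import urllib.request


def build_request(endpoint: str, key: str, texts: list) -> urllib.request.Request:
    """Build (but do not send) the POST request described above."""
    url = endpoint.rstrip("/") + "/text/analytics/v3.1-preview.3/sentiment?opinionmining=true"
    # One document per text, with ids "1", "2", ...
    documents = [
        {"language": "en", "id": str(i), "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = json.dumps({"documents": documents}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_request(
    "https://ABC.cognitiveservices.azure.com",  # placeholder endpoint
    "YOUR-KEY",  # placeholder key
    ["The food here was terrible!"],
)
# To actually call the API: urllib.request.urlopen(request)
```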

Now we’re all set up, let’s get mining!

Testing Opinion Mining Out

First let’s try out a typical restaurant review :

Horrible location as it’s right next to a construction site. But the food was amazing! Really friendly waiter too!

So what we are looking for here is that it identifies that the location is negative, but that the food and waiter were positive. And what do you know (Note that the full API response is much more verbose, I’m just cutting it down to see what we need!)

{
  "sentiment": "negative",
  "confidenceScores": {
    "positive": 0.0,
    "negative": 1.0
  },
  "text": "location"
},
{
  "sentiment": "positive",
  "confidenceScores": {
    "positive": 1.0,
    "negative": 0.0
  },
  "text": "food"
}, 
{
  "sentiment": "positive",
  "confidenceScores": {
    "positive": 1.0,
    "negative": 0.0
  },
  "text": "waiter"
}

So as we can see it’s actually identified the noun that we are trying to describe, and whether our opinion was positive or negative.
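
Once you have pulled those objects out of the full response, turning them into something usable is only a few lines. A minimal Python sketch, assuming the cut-down shape shown above (the real response nests these objects much deeper, and that nesting is not handled here):

```python
def summarise(aspects: list) -> dict:
    """Map each mined noun to its sentiment and the matching confidence score."""
    summary = {}
    for aspect in aspects:
        sentiment = aspect["sentiment"]
        # Score for the reported sentiment, or None if no such score exists.
        score = aspect["confidenceScores"].get(sentiment)
        summary[aspect["text"]] = (sentiment, score)
    return summary


# The three cut-down objects from the response above.
aspects = [
    {"sentiment": "negative", "confidenceScores": {"positive": 0.0, "negative": 1.0}, "text": "location"},
    {"sentiment": "positive", "confidenceScores": {"positive": 1.0, "negative": 0.0}, "text": "food"},
    {"sentiment": "positive", "confidenceScores": {"positive": 1.0, "negative": 0.0}, "text": "waiter"},
]

print(summarise(aspects))
```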

Let’s try something slightly harder. What I noticed was that the opinion mining spotted the adjectives of “Horrible” and “Amazing” which should be fairly easy to spot. But how about this sentence :

I felt the food was bland. The music was also very loud so we couldn’t hear anything anyone said.

So again we are leaving a review, but specifically we are saying that the food is “bland” and the music was “loud”. These are very specific to the sentence and aren’t common adjectives you might use to describe something. But again :

{
  "sentiment": "negative",
  "confidenceScores": {
    "positive": 0.01,
    "negative": 0.99
  },
  "text": "food"
},
{
  "sentiment": "negative",
  "confidenceScores": {
    "positive": 0.04,
    "negative": 0.96
  },
  "text": "music"
}

And more importantly we see that it even picked up that the food being bland and the music being loud is why the opinion is negative.

"opinions": [
  {
    "sentiment": "negative",
    "confidenceScores": {
      "positive": 0.01,
      "negative": 0.99
    },
    "text": "bland"
  }
]

Really impressive stuff! Does that mean it always gets it right? Absolutely not. Using sentences with colloquial terms (For example, “The food here is the bees knees!”) just returns neutral scores, but for out of the box opinion mining with no training required at all (And very little developer legwork), opinion mining with Azure Cognitive Services is pretty impressive!

Not long ago, I wrote about “Creating MultiPart Uploads on S3” and the focus of the post was on the happy path without covering failed or aborted uploads. It was already long as it was so I decided to write a separate entry to discuss in detail how to clean up your buckets so you don’t incur unnecessary storage costs.

What’s this all about?

Let’s review the basics: S3 allows you to store objects in exchange for a storage fee. Simple enough. However, when we think of objects in the context of S3, most people think of the output of running a list-objects (or ls) operation or of what they see when looking at their buckets through the console (which performs the same API call). In those situations, parts of an object created through a multipart upload won’t show up, but the service is still storing them for you, which means you are paying for that storage.

If none of this surprises you, then this post might not be for you. However, whether you’ve been doing multipart uploads for a while or you’re new to them, I’d recommend you keep reading as you might find you could optimize your storage costs.

Let’s pick up where we left off

I’ll continue with the setup from our previous post, a bucket with a single 100MB file.

This is what list-objects has to say about it.

{
    "Contents": [
        {
            "Key": "large_file",
            "LastModified": "",
            "ETag": "",
            "Size": 104857600,
            "StorageClass": "STANDARD",
            "Owner": {
                "DisplayName": "",
                "ID": ""
            }
        }
    ]
}

So now, I’ll create a new multipart upload (I’ll be reusing the same file) but to simulate failure or an aborted operation, only the first part will be uploaded.

Let’s have a look at what list-objects has to say about it now.

{
    "Contents": [
        {
            "Key": "large_file",
            "LastModified": "",
            "ETag": "",
            "Size": 104857600,
            "StorageClass": "STANDARD",
            "Owner": {
                "DisplayName": "",
                "ID": ""
            }
        }
    ]
}

It is the same output as before, however if we list-parts for this particular upload we can see how we’re using an extra 25MB from our first part.

> aws s3api list-parts --bucket your-bucket-name --key your_large_file --upload-id UploadId

{
    "Parts": [
        {
            "PartNumber": 1,
            "LastModified": "",
            "ETag": "",
            "Size": 26214400
        }
    ],
    "Initiator": {
        "ID": "",
        "DisplayName": ""
    },
    "Owner": {
        "DisplayName": "",
        "ID": ""
    },
    "StorageClass": "STANDARD"
}

As far as I’m aware, the only native way (as in not wrangling scripts or 3rd party tools) to get the entire size of the bucket is through CloudWatch metrics. You can see how the total size of my bucket is correctly represented at 125MB.
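
The 125MB figure is simply the completed object plus the orphaned part. A quick Python check using the sizes from the two outputs above:

```python
completed_object_bytes = 104857600  # "Size" from list-objects
orphaned_part_bytes = [26214400]    # "Size" of each part from list-parts

# Everything S3 is storing (and billing) for this bucket.
total_bytes = completed_object_bytes + sum(orphaned_part_bytes)
total_mb = total_bytes / (1024 * 1024)

print(total_mb)
```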

So where do we go from here? Deleting unneeded parts sounds like the path forward.

S3 provides you with an API to abort multipart uploads and this is probably the go-to approach when you know an upload failed and have access to the required information to abort it.

The command to execute in this situation looks something like this:

> aws s3api abort-multipart-upload --bucket your-bucket-name --key your_large_file --upload-id UploadId

However, this is not a very scalable way of controlling orphan parts across multiple uploads and buckets. You could craft a couple of scripts (using the list-multipart-uploads command) that run on a schedule to check for those files, or you can set up a lifecycle policy on your buckets to clean up failed uploads.
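
If you did go down the scripting route, the core of it would be picking out uploads that have been sitting around for too long. Here is a Python sketch of that filtering step, assuming the upload entries have the `Key`, `UploadId` and `Initiated` fields that list-multipart-uploads returns. The seven day cut-off and bucket name are arbitrary, and the boto3 calls are commented out so the snippet runs without credentials.

```python
from datetime import datetime, timedelta, timezone


def stale_uploads(uploads: list, now: datetime, max_age: timedelta) -> list:
    """Return (key, upload_id) pairs for uploads initiated more than max_age ago."""
    return [
        (upload["Key"], upload["UploadId"])
        for upload in uploads
        if now - upload["Initiated"] > max_age
    ]


# With boto3 installed and credentials configured:
#   import boto3
#   s3 = boto3.client("s3")
#   uploads = s3.list_multipart_uploads(Bucket="your-bucket-name").get("Uploads", [])
#   now = datetime.now(timezone.utc)
#   for key, upload_id in stale_uploads(uploads, now, timedelta(days=7)):
#       s3.abort_multipart_upload(Bucket="your-bucket-name", Key=key, UploadId=upload_id)
```

A bucket with many in-progress uploads would also need pagination, which this sketch leaves out.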

Luckily for us, S3 makes this easy to set up. Head over to the management settings for your bucket and create a new Lifecycle Rule.

First of all, give it a name and then define what the scope of the policy will be. Your options are to apply it to the entire bucket or to a specific prefix (for example “/uploads”). In my case, I’ll set it up across the entire bucket, and the service will rightfully let me know about it.

Next up is defining what we want this rule to do. As you can see, there’s already a predefined option for incomplete multipart uploads.

 

And finally, configure the parameters for this action. Remember, S3 doesn’t know if your upload failed, which is why the wording (and behavior!) is around incomplete uploads. As such, it is entirely up to you how soon after they were created you want to delete parts.
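If you would rather not click through the console for every bucket, the same rule can be written as a lifecycle configuration document. The sketch below builds one in Python. The seven-day window, the rule ID and the function name are illustrative assumptions, not values taken from the console walkthrough above.

```python
import json


def abort_incomplete_uploads_rule(days, prefix="", rule_id="abort-incomplete-multipart-uploads"):
    """Build a lifecycle configuration that aborts incomplete multipart uploads.

    days   -- how long after initiation an incomplete upload is removed
    prefix -- key prefix the rule applies to; an empty prefix covers the whole bucket
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


# Assumed example: clean up anything still incomplete after seven days.
print(json.dumps(abort_incomplete_uploads_rule(7), indent=2))
```

Saving the printed JSON to a file (say lifecycle.json) lets you apply it with

> aws s3api put-bucket-lifecycle-configuration --bucket your-bucket-name --lifecycle-configuration file://lifecycle.json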

A very common question I get when storing files in Azure is “Why are we using Blob Storage instead of File Storage? After all, aren’t we storing files?” It’s actually a pretty good question and, luckily, it has a very simple answer.

When To Use Azure File Storage

Azure File Storage is specifically used when storing files to be used like a managed file share. For example, if you are currently using a network share within your company on an old PC sitting under someone’s desk, you can move these files to the cloud using Azure File Storage, and have it act exactly the same as your current networked file share. Importantly, it supports both the “Server Message Block (SMB)” and “Network File System (NFS)” protocols, so it can be used across Windows, Mac and Linux operating systems.

While a company-wide network share is obviously a good use case, another very common example is when you have an existing application (such as a Windows Service) that you simply lift and shift onto a VM in Azure. If this application requires the use of a network share, instead of having to create a tunnel back into your office network, you can lift and shift the network share into Azure File Storage. That means minimal code rewrites, and makes it a true lift and shift approach.

When To Use Azure Blob Storage

Azure Blob Storage is best used when storing unstructured or binary data in the cloud, and you don’t need access to it via Windows Explorer or over protocols such as SMB. Realistically, this means that if you are storing files for your application that are then read back by that same application, Azure Blob Storage will suffice.

It should be noted that there are Windows applications and addons that will make a blob storage account act like a file share, but it’s not recommended as some features that are available on Azure File Storage are not available on Blob and vice versa. If your main use case for moving files into Azure is to have them act as a network file share, you should use Azure File Storage instead of Blob.

File vs Blob Pricing

The other very important thing to note is that there are pricing differences between Azure File Storage and Azure Blob Storage. Sometimes the difference is only cents per GB, but often the transaction costs are vastly different on the File Storage side. For example, write operations will cost you 30% more on Azure File Storage.

While it does pay to check pricing, your use case should dictate which option you go for rather than any cost difference.

When you’re using S3, an object store that holds an unlimited volume of data and has a maximum object size of 5TB (the maximum for a single PUT request is 5GB), you might be tempted to start uploading some pretty big files.

So today’s focus is on making use of the multipart upload capabilities of S3 to cut down the amount of time that it takes for a large object to land in your buckets.

The “managed” way

The AWS CLI has a number of commands that will help you upload those large files by automatically making use of multipart, so chances are that if you have used the CLI to upload documents into your buckets, you have come across them. Those commands are cp, mv and sync, and they can be used as follows.

> aws s3 cp your_large_file s3://your-bucket/

> aws s3 mv your_large_file s3://your-bucket/

> aws s3 sync your_local_directory s3://your-bucket/

The differences between the three are out of scope for this post. However, I’ll finish by saying that you can still change their configuration in order to make better use of your bandwidth. You can set the new configuration values through the CLI or directly in your AWS profile. A list of all possible configuration values can be found here.

The “unmanaged” way

AWS recommends using those commands when possible (and with good reason!), but there are cases in which they don’t fit the bill and you have to do a bit of plumbing yourself. Luckily, you are not left alone: the AWS CLI still provides you with the necessary commands to achieve the same result.

So let’s go ahead and upload a large file in parts into our bucket. In my case, I’ll create a 100MB test file from the command line like this:

> truncate -s 100M large_file

Now, I’ll use the split command to get four 25MB parts. Split is available on both Linux and OSX (however, the OSX version might be out of date and you might need to install the GNU core utilities).

> split -b 25M large_file

If you list the files in your directory, it should look something like this:

large_file  xaa  xab  xac  xad

We are now ready to start interacting with S3!

The first step in the process is to actually create a multipart upload:

> aws s3api create-multipart-upload --bucket your-bucket-name --key your_file_name

The response from the API only contains three values, two of which have been provided by you. The last value is the UploadId and as you can imagine, this will be our reference to this multipart upload operation so go ahead and save it.

It is time to start uploading our parts. The following command uploads a single part, and you’ll have to repeat it N times depending on how many parts you’ve split your file into. In my case N=4, and the command shown is for the first part. The values for part-number and body need to be updated accordingly for every part you upload.

> aws s3api upload-part --bucket your-bucket-name --key your_file_name --part-number 1 --body xaa --upload-id UploadId

The ETag value that each upload-part returns will be used to complete the upload.
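Repeating that command by hand gets tedious once you have more than a handful of parts. Here is a rough sketch that prints one upload-part command per part file. It assumes the default names produced by split (xaa, xab and so on) and that sorting them alphabetically matches the order of the parts; the function name is my own.

```python
def upload_part_commands(bucket, key, upload_id, part_files):
    """Build one upload-part command per file; part numbers start at 1."""
    return [
        f"aws s3api upload-part --bucket {bucket} --key {key} "
        f"--part-number {number} --body {path} --upload-id {upload_id}"
        for number, path in enumerate(sorted(part_files), start=1)
    ]


# The four files created by the split command earlier in this post.
parts = ["xaa", "xab", "xac", "xad"]

for command in upload_part_commands("your-bucket-name", "your_file_name", "UploadId", parts):
    print(command)
```

You still need to hold on to the ETag that each of those commands returns.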

Once all parts are uploaded, you need to tell S3 that the upload is complete. Remember, S3 has no knowledge of how many parts there should be or what their references are, so passing that information back to it will complete the process. In order to do so, we need to compile a JSON array of all our parts and their respective ETag values.

You can use the ETag values that you have been collecting or retrieve them again by listing all parts in the upload:

> aws s3api list-parts --bucket your-bucket-name --key your_file_name --upload-id UploadId

Save the output of the “Parts” array into a new file (I’ll call mine parts.json) and make sure not to include the LastModified and Size keys in the final file. Once you’re done, the file should look something like this. Remember that in my case, I was only dealing with four parts.

{
  "Parts": [
    {
      "PartNumber": 1,
      "ETag": ""
    },
    {
      "PartNumber": 2,
      "ETag": ""
    },
    {
      "PartNumber": 3,
      "ETag": ""
    },
    {
      "PartNumber": 4,
      "ETag": ""
    }
  ]
}
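If you’d rather not edit the JSON by hand, the trimming can be scripted. Below is a minimal sketch that takes the parsed list-parts output and keeps only the PartNumber and ETag of each part. The function name and the sample ETag values are made up for illustration.

```python
import json


def to_parts_manifest(list_parts_output):
    """Keep only PartNumber and ETag for each part, ordered by part number."""
    parts = sorted(list_parts_output["Parts"], key=lambda part: part["PartNumber"])
    return {
        "Parts": [
            {"PartNumber": part["PartNumber"], "ETag": part["ETag"]}
            for part in parts
        ]
    }


# Hypothetical list-parts output with two parts; the ETags are placeholders.
sample_output = {
    "Parts": [
        {"PartNumber": 2, "LastModified": "", "ETag": "etag-2", "Size": 26214400},
        {"PartNumber": 1, "LastModified": "", "ETag": "etag-1", "Size": 26214400},
    ],
    "StorageClass": "STANDARD",
}

print(json.dumps(to_parts_manifest(sample_output), indent=2))
```

Writing the printed JSON to parts.json gives you the same file as above.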

Now let’s use that to complete the upload with one final API call.

> aws s3api complete-multipart-upload --multipart-upload file://parts.json --bucket your-bucket-name --key your_file_name --upload-id UploadId

And we’re done. The response will contain the location of your newly uploaded file. We can call the list objects API or check the console if we want to double check that our file is there.

While many things in Azure have straight forward “Spin this up, pay this per hour” type pricing models, Azure SQL is not one of them! While it does have the option of paying per hour, per database, per machine size, that’s only one of many ways to use Azure SQL. So I thought it would be worth talking through how pricing works with Azure SQL, and hopefully make it a little simpler to find the right option for you.

Before we get started, I just want to note that when I say Azure SQL, I am referring specifically to Microsoft SQL Server in the cloud. Things like Postgres on Azure go by their own names (for example “Azure Database for PostgreSQL”), so if you see Azure SQL on its lonesome, it means SQL Server. Easy!

With that out of the way, let’s get started!

Single Database vs Elastic Pool

The first decision you are going to have to make is whether you are going to use a Single Database (Or many Single Databases), or use an Elastic Pool.

Single Database is exactly what it sounds like: a price per single database you spin up. It’s important to note this is *not* a single server, but a single database. So if your application uses two databases, for example one for transactional data and another just used for logging, then you will pay for two different Azure SQL databases. The benefit, however, is that each database has its own resources dedicated to it, and therefore they are isolated from one another. A downside is that if your application uses multiple databases (for example a single tenant SaaS application that uses a database per customer), then your costs are going to skyrocket.

Elastic Pools are a collection of SQL databases that share computing power, where you pay for a “pool” of resources. Elastic Pools do start with higher pricing than Single Databases (i.e. the minimum spend is much larger than that of a single database), but if you have a data model that requires spinning up multiple databases (and possibly spinning them down), then Elastic Pools are for you. I would note that Elastic Pools also have other factors to consider (e.g. max DTU sizes), and the shared resources can sometimes be more of a hindrance than a help. For that reason, I only recommend using Elastic Pools when you truly do have a “pool” of databases, like that of a single tenant SaaS application, and not as a way to save a few dollars on hosting costs for your 3 databases in production.

DTU Pricing Model

DTU stands for “Database Transaction Unit”. It takes measures of CPU, memory and IO and combines them into a single metric. That makes it hard to talk about, because the first question I usually get fired back when talking about DTUs is “So how many CPUs is that? How much memory?”. And the answer is… we don’t know. Or rather, because it’s a blended metric, 1 DTU could be made up of almost all memory and very little CPU, or it could be completely the other way around!

That’s actually one of the benefits of DTU. It’s a single “processing power” metric without having to juggle exact memory or CPU sizes. If you’ve ever had to grab a VM that has a huge amount of memory, but very little CPU, and it’s left you saying “Well.. I just want to increase the CPU, but not the memory, but the next VM class up doubles the memory!”, then that’s why DTUs are in some ways very powerful.

However, clearly a blended metric hides exactly what you are purchasing and for some people that’s a deal breaker. It makes it hard to understand initial provisioning sizes because at first, you will have nothing to compare it to. However, vertical scaling is absolutely no issue with Azure SQL, and so starting low and working your way up is always an option.

vCore Pricing Model

As an alternative to DTU pricing, you can still purchase Azure SQL using the vCore Pricing Model. vCore is your standard Azure SQL on hardware pricing, where you know exactly how many CPU cores and how much memory you are being given. It’s great if you know exactly the computing power you need, or prefer the transparency of resourcing over the DTU pricing model.

Under vCore, there are actually two additional options. There is a price per core model that is great for unpredictable workloads that may need to scale multiple times per day. Under this model, you simply pay per CPU core, per hour. And that’s it!

As an alternative, there is a “standard” set of machines available that are essentially built into your standard “tier” sizes, e.g. 2 Core 10GB or 4 Core 20GB vCore machines. These are great if you know the computing power you need and it won’t need to scale vertically that often.

DTU vs vCore

Unfortunately, after reading all of this you may come to the conclusion that you want to use vCore for its transparency, so that you know exactly what you’re getting. And Microsoft knows it. That’s why they’ve put the minimum provisioned vCore Azure SQL prices at around $400 USD per month (depending on region)! There is no lightweight entry into using the vCore pricing model; it’s almost an all or nothing approach.

On the DTU side of things, pricing can start for as little as $15 USD per month (depending on region), and the price step ups are much more granular, making it a much more viable solution for small start-ups and small businesses that just need a single database in the cloud.

Other options include using a DTU pricing model for Dev/Test workloads, and using a vCore model for Production. Again, this works great but only if you are happy with the minimum spend per month for (possibly) far more computing power than you need.

In the end, DTU vs vCore is less about pricing models and how resources are allocated, and more about the minimum level of pricing. In the majority of cases, DTU pricing is the way to go simply so you can start smaller, and ramp up over time.