How new automatic scaling works on Azure App Service

Or: choosing between Automatic scaling and Metric scaling, and understanding how the new one behaves.

Stas (Stanislav) Lebedenko
Microsoft Azure

--

TL;DR: This article is about the new Automatic scaling option for Azure App Service. It is quite different from the old, hard-metrics-based autoscale: under the hood it is essentially the Azure Functions HTTP scaler. The complexity of your application directly impacts scaler performance and your service bill when scaling is in use. Read on for lessons learned and answers to common questions. The article has a supporting video.
Visit Festive Tech Calendar 2023 for more content.

Video version of this article

The encounter

I had an assignment a few months ago related to the strange behavior of memory-intensive applications on Azure App Service, delays in the scale-in process, and deployment slot rotation problems.

While the documentation is usually good, behavior in the wild can differ, so I started an investigation and created load tests, enabling additional configuration for application traces, because at the time it was impossible to tell how many instances were working. Application Insights was showing the wrong number of active instances, which was scary for the customer because of possibly high bills.

But let’s look into the basic details of metrics and automatic scalers.

The old Autoscale: Metric scaling (the name in the Azure portal)

  • Can scale out and scale up, with no maximum instance limit
  • Works on top of Microsoft web servers (IIS/Kestrel), using Windows/Linux metrics
  • Based on Virtual Machine Scale Sets (VMSS)
  • Uses metrics like CPU, memory, and IO data bytes
  • The rules are up to you, but there are general recommendations, like scaling out at 80% CPU
  • Your decisions depend on load and smoke tests, which you should run against a production-like environment. There is no magic, so the exact combination of VM tier and instance count should be tested
  • Does not require the same number of VM instances to be deployed for blue-green deployment and slot rotation

So, if you know your application's CPU and memory footprint, this is the best option out there. Or is it? Scaling out on metrics is quite predictable, but it can also be slow at times.
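For the metric-based option, the rules live in an Azure Monitor autoscale setting attached to the App Service plan. As a rough sketch (resource names, counts, and thresholds below are placeholders, not values from this article), the classic "scale out at 80% CPU" rule could be created with the Azure CLI like this:

```shell
# Attach an autoscale setting to the App Service plan (2-10 instances)
az monitor autoscale create \
  --resource-group my-rg \
  --resource my-plan \
  --resource-type Microsoft.Web/serverfarms \
  --name my-plan-autoscale \
  --min-count 2 --max-count 10 --count 2

# Scale out by 1 instance when average CPU exceeds 80% over 10 minutes
az monitor autoscale rule create \
  --resource-group my-rg \
  --autoscale-name my-plan-autoscale \
  --condition "CpuPercentage > 80 avg 10m" \
  --scale out 1

# Scale back in by 1 instance when average CPU drops below 30%
az monitor autoscale rule create \
  --resource-group my-rg \
  --autoscale-name my-plan-autoscale \
  --condition "CpuPercentage < 30 avg 10m" \
  --scale in 1
```

Note the matching scale-in rule: without it, the plan scales out under load but never comes back down, which is exactly the kind of surprise that shows up on the bill.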

The new Automatic scaler

  • Can only scale out, with a maximum of 30 instances
  • Based only on HTTP requests (!)
  • Works only with the expensive Premium V2 and Premium V3 machines
  • It is a version of the Azure Functions serverless HTTP scaler
  • Configurable pre-warmed instances

So you have an application that needs powerful machines to run, but you do not know the exact metrics to scale on; this sounds like a strange combination. On the other hand, you want faster scale-out and scale-in to save some money.

The Functions (easy) way

So the scaler is essentially the Azure Functions scaler adapted for App Service, and I would guess it will become the default during 2024 (hopefully for Standard tiers too), because the metric scaler requires an understanding of how your application impacts memory, CPU, and IOPS, plus some observation and fine-tuning.

As you can see, the UI has the same visual configuration as Azure Functions Premium plans, but with much more powerful instances behind it.

So you can configure it on two levels.

At the App Service plan level

App service plan configuration

And at the application level

The maximum burst is a shared setting for the App Service plan and can be configured from any application's blade, but the scale limit is a per-application setting. So you can fine-tune several applications that share the same App Service plan compute; one can have a 1–5 scale range and another 1–24, for example.
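The same configuration can be sketched with the Azure CLI (resource and app names below are placeholders): the shared maximum burst is set on the plan, while minimum and pre-warmed instances are set per app.

```shell
# Plan level: enable automatic scaling and set the shared maximum burst
az appservice plan update \
  --resource-group my-rg --name my-plan \
  --elastic-scale true --max-elastic-worker-count 24

# App level: always-ready (minimum) and pre-warmed instances, per application
az webapp update \
  --resource-group my-rg --name app-one \
  --minimum-elastic-instance-count 1 --prewarmed-instance-count 1

az webapp update \
  --resource-group my-rg --name app-two \
  --minimum-elastic-instance-count 2 --prewarmed-instance-count 1
```

The per-app maximum scale limit shown in the portal can also be capped individually, which is what makes the 1–5 vs. 1–24 split from the example above possible on one plan.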

These limits can help a lot if you are doing blue-green deployments, because you will need the exact number of instances up and running for the staging slot in order not to lose client sessions during rotation.

Monitoring

As I mentioned before, you will need to drill down into Log Analytics with Kusto queries to get a proper understanding. Live telemetry with Application Insights usually works very well, but you might need more detail.

Another thing: look into the Azure Functions telemetry documentation if you feel the Automatic scaling documentation is not enough :).

Detailed information about virtual machine starts is available from traces, which you must enable in your Azure Web App with the following key/value pair.

SCALE_CONTROLLER_LOGGING_ENABLED = AppInsights:Verbose
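If you prefer the command line over the portal, the same app setting can be applied with the Azure CLI (resource group and app names below are placeholders):

```shell
# Enable verbose scale controller logging to Application Insights
az webapp config appsettings set \
  --resource-group my-rg --name my-app \
  --settings SCALE_CONTROLLER_LOGGING_ENABLED=AppInsights:Verbose
```

Keep in mind this setting is verbose by design; consider switching it off (or down) once the investigation is done, as it adds telemetry volume and therefore cost.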

To see the machines that are working right now (emitting metrics):

performanceCounters
| where timestamp > ago(30m)
| summarize count() by cloud_RoleInstance

The sample output with cloud role instance names

To see the machines that were spun up during the day:

traces 
| where timestamp > ago(1d)
| summarize count() by cloud_RoleInstance

To chart per-minute request counts by instance during the day:

requests
| where timestamp > ago(1d)
| summarize count() by cloud_RoleInstance, bin(timestamp, 1m)
| render timechart

To filter the traces of a particular instance:

traces 
| where timestamp > ago(1d) and cloud_RoleInstance == 'pd1mdwk000C4P'

To show the performance of a particular instance:

performanceCounters 
| where timestamp > ago(1d) and cloud_RoleInstance == 'pd1mdwk000C4P'

Testing

Things are getting more interesting. As you saw in a previous screenshot, I'm using P2V2 instances. So, how does this HTTP scaler behave under pressure?

The starting point: 2 default instances are configured to run.

This is a minimal .NET Razor Pages app

Now I’m starting an Azure Load Testing run: 165k requests over 1 minute :)
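The run here uses Azure Load Testing; if you just want a rough local approximation of the same pressure, a generator like `hey` (assuming it is installed; the URL is a placeholder) can sustain a similar rate:

```shell
# ~165k requests over 1 minute: 275 workers x 10 req/s each = 2,750 req/s
hey -z 1m -c 275 -q 10 https://my-app.azurewebsites.net/
```

A local generator is limited by your own machine and network, so treat this as a smoke test; for numbers you can trust, use a managed service like Azure Load Testing as in the article.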

Things are getting serious

Now things are ramping up fast, and CPU metrics are getting out of hand :).

Metrics going red

And now, new instances have appeared in Application Insights.

I am also getting some Azure internal exceptions during scale-out; they look related to the start of specific instances and do not impact my application’s health or usability.

So, let’s have a look at the test results. I did at least 10 of them at different times, and the results were more or less the same.

As you can see, automatic scaling can reliably scale out your application. The spike in this case, 165k requests over 1 minute, is a bit extreme for any app, and it was done deliberately to demonstrate an extreme scenario.

Things will get trickier with memory-intensive applications, because some sessions can experience performance degradation as a result of the delay between the metrics and the scaler’s response.

The conclusion

We should see general availability of Automatic scaling in 2024, and it will find its niche as a serverless-like experience for resource-hungry applications that can run without Kubernetes, or for apps with huge load spikes that need to be handled in a matter of seconds.

But I guess this type of scaler would be an even better fit for a Standard performance tier of App Service plans.

As an engineer, you should always strive to understand how your application performs in production in terms of CPU/memory/IO, so you can set up proper metric scaling with a safety margin, e.g., scaling on the CPU metric at 70%, not at 90%.

--

Stas (Stanislav) Lebedenko
Microsoft Azure

Azure MVP | MCT | Software/Cloud Architect | Dev | https://github.com/staslebedenko | Odesa MS .NET/Azure group | Serverless fan 🙃| IT2School/AtomSpace