Scaling: Where to?
Pushed by some friends – and at a fortunately discounted price – I finally gave in and took an AWS exam; so call me certified now. Just so.
However, this has nothing to do with the matter in discussion here: while waiting for the exam, I talked to some fellow cloud geeks and was surprised, yet again, to find them confusing up with out and down with in – as experienced so many times before. So: this is about “Scaling”!
And it’s really simple – here’s the boring piece:
The Principle
Both scaling patterns (up/down and out/in) in essence serve the same purpose and act on the same principle: upon explicit demand or implicit recognition, the amount of compute/memory resources is increased or decreased. Whether this is done
- proactively or reactively (see guidelines above)
- automatically
- or manually through a portal by authorized personnel
is subject to the framework implementation possibilities and respective architecture decisions.
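Whatever the implementation, the reactive variant of the principle boils down to a simple decision rule. Here’s a minimal sketch in Python; the function name and the threshold values are my own illustrative assumptions, not taken from any particular framework:

```python
# Hypothetical reactive scaling trigger: map a utilization sample to an
# action. Thresholds (80% / 20%) are illustrative assumptions only.
def scaling_decision(cpu_utilization: float,
                     high: float = 0.80,
                     low: float = 0.20) -> str:
    """Return 'increase', 'decrease' or 'hold' for a resource pool."""
    if cpu_utilization > high:
        return "increase"   # demand recognized -> add resources/instances
    if cpu_utilization < low:
        return "decrease"   # sustained idle -> release resources/instances
    return "hold"
```

Note that the same rule serves both patterns – only the *action* differs: scale-up/-down grows a single instance, scale-out/-in changes the number of instances.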
What’s up and down?
For the scale-up/-down pattern, the hypervisor (or IaaS service layer) managing cloud compute resources has to provide the ability to dynamically increase the compute and memory resources of a single machine instance – ideally without an outage, though that is not guaranteed.
The trigger for scaling execution can either be implemented within a dedicated cloud automation engine, within the Scalability building block or as part of a self-service portal command, depending on the intended flexibility.
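To make the pattern concrete: in scale-up, the *same* instance gets more resources – no second instance appears. A tiny in-memory sketch (all class and function names are hypothetical, standing in for whatever the hypervisor or IaaS API actually exposes):

```python
# In-memory sketch of vertical scaling: one instance, resized in place.
class Instance:
    def __init__(self, cpus: int, memory_gb: int):
        self.cpus = cpus
        self.memory_gb = memory_gb

def scale_up(instance: Instance, extra_cpus: int, extra_memory_gb: int) -> Instance:
    # A real hypervisor may require a reboot at this point -- dynamic
    # (hot) resize is not guaranteed, as noted above.
    instance.cpus += extra_cpus
    instance.memory_gb += extra_memory_gb
    return instance
```

Scale-down is simply the inverse operation, subject to the same outage caveat.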
What’s in and out?
The same principles apply as for the scale-up/-down pattern; however, on scaling out, an additional instance is created for the same service. This may involve one of the following alternatives:
- Create an instance and re-configure the service to additionally route requests to the new instance
- Create an instance, re-configure the service to route requests to the newly created instance only, and de-provision an existing instance with lower capacity accordingly
Both cases may demand (automated) load balancer reconfiguration and require the application to cope with changing server resources.
Conversely, scale-in means de-provisioning instances once load parameters have sufficiently decreased. An application on top has to be able to deal with dynamically de-provisioned server instances. In a scenario where the application’s data layer is involved in the scaling process (i.e. a DBMS is part of the server to be de-provisioned), measures have to be taken by the application to reliably persist data before the respective resource is shut down.
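The horizontal pattern can be sketched the same way: here the instance *count* changes, and the load balancer’s target list has to follow. Again a purely in-memory illustration with hypothetical names:

```python
# Sketch of scale-out/-in: instances join or leave a pool, and the
# (hypothetical) load-balancer target list is reconfigured to match.
class ServicePool:
    def __init__(self):
        self.instances: list[str] = []

    def scale_out(self, instance_id: str) -> None:
        # The new instance receives traffic *in addition* to the others.
        self.instances.append(instance_id)

    def scale_in(self, instance_id: str) -> None:
        # Before de-provisioning, the application must persist any local
        # state (e.g. flush a co-located DBMS) -- omitted in this sketch.
        self.instances.remove(instance_id)

    def lb_targets(self) -> list[str]:
        # The load balancer routes to all currently provisioned instances.
        return list(self.instances)
```

Usage is symmetric: `scale_out` widens the target list, `scale_in` shrinks it, and the application must tolerate both transitions.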
And now – for the fun part
It occurred to me that two silly graphics could make the difference easier to remember, hence I invite you all to think of a memory IC as a little bug climbing up a ladder, and in turn of computers bursting out of a data center. Does that make it easier to distinguish the two patterns?
You think you don’t need to care?
Well – as a SaaS consumer you’re right: as long as your tenant scales and responds accurately to any performance demand – no worries. But as soon as things deviate from this, you’re in trouble finding the right application architecture if it is unclear whether you’re to scale resources or instances. So – remember the bug and the fleeing servers 🙂
(feature image by Alexander Zvir from pexels)