Efficiency, Utilization and Latency Engineering

Compute resources are being used ‘well’ when there is a proper balance between efficiency, utilization and latency. At first glance you might say that efficiency and utilization are the same thing and that latency is a by-product. Let’s clarify the difference between efficiency and utilization.

Ask yourself whether it is possible to have high efficiency and low utilization, or low efficiency and high utilization. It becomes clear that there are use cases where a compute system can be using all of its resources (high utilization) but producing little or no value (low efficiency). It is also true that a compute system may be running at a fraction of its capacity (low utilization) while producing large amounts of work for the capacity consumed (high efficiency). In both cases compute capacity, power, carbon emissions and financials are negatively impacted. Efficiency and utilization are not the same, but more on that later.

Latency is a by-product of both efficiency and utilization. A well-designed, highly efficient piece of software running as a service across multiple machines can provide very low latency. If it runs at a constant workload, it is possible to optimize the compute assets required and maintain very high utilization. However, that same service, if bombarded with requests, may not have sufficient compute resources or properly designed scalable software to maintain the low latency required. The service will go into saturation, and instabilities will cause system efficiency to drop while utilization remains saturated. Latency will suffer, and the degraded service will severely impact our customers.

It’s clear that efficiency can change independently of utilization. Let’s take a moment to identify the quantitative difference between efficiency and utilization:

  • efficiency is the ratio of work performed to resources consumed.  E=W/Rc
  • utilization is the ratio of resources consumed to resources held.  U=Rc/Rh

From these equations we see that utilization has nothing to do with how much work is being performed. In the previous example, when a service goes into saturation, efficiency drops while utilization stays high. The opposite case is also possible: infinite available resources would yield near-zero utilization while efficiency would not be impacted (the same amount of work is performed while consuming the same amount of resources).
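As a rough illustration of the two ratios, here is a minimal Python sketch. The scenario names and all of the numbers are hypothetical, chosen only to show efficiency and utilization moving independently of each other; the units of work and resources are left abstract.

```python
def efficiency(work_done: float, resources_consumed: float) -> float:
    """E = W / Rc: work performed per unit of resource consumed."""
    return work_done / resources_consumed


def utilization(resources_consumed: float, resources_held: float) -> float:
    """U = Rc / Rh: fraction of the held resources actually consumed."""
    return resources_consumed / resources_held


# Hypothetical scenarios (illustrative numbers, not measurements).
scenarios = {
    # Steady workload on well-sized capacity.
    "healthy": {"work": 900.0, "consumed": 90.0, "held": 100.0},
    # Saturated service: everything is consumed, but less work gets done.
    "saturated": {"work": 300.0, "consumed": 100.0, "held": 100.0},
    # Same work and consumption as "healthy", but far more capacity held.
    "overprovisioned": {"work": 900.0, "consumed": 90.0, "held": 10_000.0},
}


def summarize(s: dict) -> tuple[float, float]:
    """Return (efficiency, utilization) for one scenario."""
    return (
        efficiency(s["work"], s["consumed"]),
        utilization(s["consumed"], s["held"]),
    )


for name, s in scenarios.items():
    e, u = summarize(s)
    print(f"{name}: E={e:.2f} U={u:.1%}")
```

With these made-up figures, the saturated case shows efficiency falling while utilization sits at 100%, and the overprovisioned case shows utilization collapsing toward zero while efficiency matches the healthy case.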

We can define a new metric, effectiveness, which represents the combination of efficiency, utilization and desired latency, where an effectiveness of 1 is the target: EF = f(e, u, l). Here e is efficiency expressed as some normalized value (usually a monetary value), u is utilization expressed as a percentage, and l is response time in msec. Both u and l are denominators that inflate the efficiency value. EF then becomes a ratio of efficiency to latency with a utilization inflation number. Latency is normalized so that 1 is the desired value; this causes effectiveness to increase as latency decreases, while utilization will inflate if resources are not released.
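The text leaves the exact form of f open, so the following Python sketch is only one possible literal reading, not a definitive formula: normalized efficiency divided by both the utilization fraction and the normalized latency. The function name, parameter names and the choice of a simple product in the denominator are all assumptions made for illustration.

```python
def effectiveness(
    efficiency_norm: float,
    utilization: float,
    latency_ms: float,
    target_latency_ms: float,
) -> float:
    """One assumed form of EF = f(e, u, l); not taken from the source.

    efficiency_norm   -- efficiency normalized so that 1.0 is the target
    utilization       -- fraction of held resources consumed, in (0, 1]
    latency_ms        -- observed response time in milliseconds
    target_latency_ms -- desired response time in milliseconds
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if latency_ms <= 0.0 or target_latency_ms <= 0.0:
        raise ValueError("latencies must be positive")

    # Normalize latency so that 1.0 means "exactly the desired latency".
    latency_norm = latency_ms / target_latency_ms

    # Assumption: utilization and normalized latency are both denominators.
    return efficiency_norm / (utilization * latency_norm)


# On-target efficiency, full utilization, on-target latency.
print(effectiveness(1.0, 1.0, 100.0, 100.0))
# Same service, but responses take twice as long as desired.
print(effectiveness(1.0, 1.0, 200.0, 100.0))
```

Under this assumed form, a service that is on target for all three inputs scores 1, and doubling the latency halves the score. Whether dividing by utilization captures the intended "inflation" behaviour is a judgment call; a different f may fit a given service better.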

Consider a use case where there is high efficiency and high utilization but poor (high) latency. In this case we have a positive impact on compute capacity, power and carbon emissions; however, we will have a negative impact on financials, as our customers will be waiting for our software to respond.

Excellent engineering requires a well-understood balance between efficiency, utilization and latency.
