Formulus Black Blog

Compute Performance – Distance of Data as a Measure of Latency

How is your “compute” performance? Have you ever found yourself saying, “Wow, finally, my computer is as fast as I will ever want it to be”? Not likely. The simple truth is, computer performance tends to elicit a subjective response to a mathematical action. “As fast as I will ever want it to be” could only be true if there were zero wait time for a transaction to process, an impossibility when the speed of light is a hard limit. Short of that, the faster the performance, the happier we are, but we will never be done wishing for more.

Over the years, technology has vastly improved the performance of individual components. Processor performance has increased by leaps and bounds, with faster cycle speeds, caching, multiple cores, and software that enables processing in parallel across thousands of separate units.

Memory technology has also improved in cycle and bus speed, and growing capacity has allowed main system memory (random access memory, RAM) to become ever larger. Storage technology continues to improve as well: hard disk drives (HDDs) have yielded much ground to solid state drives (SSDs) connected via Serial Advanced Technology Attachment (SATA) and, more recently, Non-Volatile Memory Express (NVMe), increasing drive I/O and vastly decreasing latency between storage and RAM.

Still, despite all these improvements, the hunt for acceptable performance continues to occupy much of the time of those in IT (you can verify that with your friendly and frazzled local database administrator). Why, despite all of these enhancements, are we still chasing ever better performance?

The answer is more complicated than it first appears, but there is a fundamental (and literal) bus-bridge we all must cross that consistently throttles every effort to enhance speed. To better understand this, it helps to level-set by reviewing, at a high level, the path data travels, because we all tend to lose sight of the basics in the bright light of new technologies.

The central processing unit (CPU) of your typical computer orchestrates the math that produces the results we are seeking. Components in the processor receive data from relatively small holding areas called registers. CPU registers are fed data from multi-tier caches (L1, L2, and L3) that reside on the processor. Data stored in cache memory has lower latency to the processor (partly because it is physically closer) than data held in RAM or peripheral storage. Which tier of cache a piece of data resides in depends on how frequently the processor needs it.
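To make the hierarchy concrete, here is a minimal back-of-the-envelope sketch in Python. The latency figures are assumed, order-of-magnitude values rather than measurements from any particular system; the point is simply how many clock cycles of a 3 GHz processor each tier costs.

    # Back-of-the-envelope: how many 3 GHz clock cycles each memory tier costs.
    # The latencies below are assumed, order-of-magnitude figures, not measurements.
    CYCLE_NS = 1 / 3.0  # one clock cycle at 3 GHz is roughly 0.33 ns

    approx_latency_ns = {
        "L1 cache": 1,
        "L2 cache": 4,
        "L3 cache": 13,
        "RAM": 100,
    }

    for tier, ns in approx_latency_ns.items():
        print(f"{tier:8s} ~{ns:3d} ns  =  about {ns / CYCLE_NS:5.0f} cycles of waiting")

Even with assumed numbers, the pattern is clear: every step away from the registers costs the processor more idle cycles.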

If the needed data is not found in any of the cache levels, the processor fetches it from RAM. Data sitting on peripheral storage devices must first be loaded into RAM to be used by the processor. If RAM is too small to hold all the needed data, data is swapped between RAM and slower, but larger, peripheral storage devices, such as solid state drives (SSDs) and hard disk drives (HDDs). Modern peripheral SSD storage connects via SATA or, more recently, NVMe, a transport protocol that rides over PCIe (Peripheral Component Interconnect Express).
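A rough way to feel the gap between RAM and peripheral storage is to time the same file read twice: the first read may have to come from the drive, while the second is usually served from the operating system's page cache in RAM. The file name below is hypothetical, and results will vary with hardware and OS caching behavior, so treat this as an illustrative sketch rather than a benchmark.

    # Illustrative sketch: the first read may hit the drive; the repeat read is
    # usually served from the OS page cache in RAM and returns much faster.
    import time

    def timed_read(path):
        start = time.perf_counter()
        with open(path, "rb") as f:
            data = f.read()
        return time.perf_counter() - start, len(data)

    cold, size = timed_read("testfile.bin")  # hypothetical test file
    warm, _ = timed_read("testfile.bin")     # likely cached in RAM by now
    print(f"read {size} bytes: cold {cold * 1000:.2f} ms, warm {warm * 1000:.2f} ms")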

What does all of this mean? Ultimately, a computer can reach its highest possible performance only when the data it needs is already sitting at the CPU. When the CPU has to go downstream to get needed data, latency causes performance to fall off from that maximum. The further the required information is from the processor, the longer the wait, because the speed at which electrical signals travel is limited.

The latency of data travel time forces us to face two issues:

  1. At a minimum, data must be in RAM before any work can be done on it.
  2. Data stored on external storage, no matter how fast that storage, is limited by the distance and maximum speed of travel. Compared to RAM, external storage is “cold” storage.

Distance of travel? Really? On a computer? How “far” can the data be?

From a human perspective, computers do mathematical calculations very quickly, but it is notable that computers “perceive” time differently than humans do. What might be “quick” to a human is often an eternity to a computer. To illustrate, in the movie “Star Trek: First Contact,” there is a conversation between Commander Data and Captain Picard after the Borg Queen tried to change Data from an android into a human.

Data: “She brought me closer to humanity than I ever thought possible, and for a time, I was tempted by her offer.”
Picard: “How long a time?”
Data: “0.68 seconds, sir. For an android, that is nearly an eternity…”

To put the computer’s perspective into a more human-consumable form, consider what happens if a single 3 GHz CPU clock cycle is “normalized” from 0.3 nanoseconds (ns) to “one second,” and notice the effect on data-retrieval delays.

Table 1: Average latency versus normalized human time
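The arithmetic behind that normalization is straightforward: stretch 0.3 ns to one second (a scale factor of roughly 3.3 billion) and apply the same factor to every other latency. The sketch below does exactly that with assumed, order-of-magnitude latencies for RAM, SSD, and HDD, so read the output as an illustration of scale rather than as measured data.

    # Stretch a 0.3 ns clock cycle to one "human" second, then rescale other latencies.
    # Input latencies are assumed, order-of-magnitude values, not benchmarks.
    SCALE = 1.0 / 0.3e-9  # roughly 3.3 billion

    approx_latency_s = {
        "RAM access": 100e-9,    # ~100 ns (assumed)
        "NVMe SSD I/O": 100e-6,  # ~100 microseconds (assumed)
        "HDD seek": 10e-3,       # ~10 milliseconds (assumed)
    }

    for name, seconds in approx_latency_s.items():
        scaled = seconds * SCALE
        print(f"{name:13s} -> about {scaled:,.0f} normalized seconds")

At that scale, a roughly 100 ns RAM access stretches to about five and a half minutes, while a roughly 10 ms disk seek balloons to about a year of the processor’s “time.”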

Another way of looking at latency, and understanding why it makes such a difference where your data resides, is to express time as distance. A nanosecond is roughly how long it takes light to travel one foot in a vacuum. Assuming that electricity could achieve this speed over copper, notice how far the data would have to travel to reach the processor, based on its latency.

Table 2: Average latency expressed as effective distance from the processor at the speed of light
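The same ballpark latencies can be translated into distances with the one-foot-per-nanosecond rule of thumb described above. Again, the input latencies are assumptions chosen for illustration.

    # Convert latency into distance at roughly one foot per nanosecond
    # (the distance light travels in a vacuum). Latencies are assumed values.
    FEET_PER_NS = 1.0
    FEET_PER_MILE = 5280

    approx_latency_ns = {
        "L1 cache": 1,
        "RAM": 100,
        "NVMe SSD": 100_000,
        "HDD": 10_000_000,
    }

    for name, ns in approx_latency_ns.items():
        feet = ns * FEET_PER_NS
        print(f"{name:8s} {ns:>12,d} ns  ->  {feet:>12,.0f} feet (~{feet / FEET_PER_MILE:,.1f} miles)")

By that measure, even a fast SSD sits the equivalent of roughly nineteen miles from the processor, and a hard disk sits the better part of two thousand miles away.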

As these two tables demonstrate, you can speed up your processors, peripherals, and buses, but ultimately it is the travel distance of data from the processor, as indicated by its latency, that has, and will continue to have, an immutable effect on performance. In other words, even if the response latency of a storage device were zero, physics still limits how quickly a signal can cover the distance. What, then, is the answer? For optimal performance on x86 architectures, data must persist in RAM and never be further from the processor than RAM. Unless and until that happens, computer performance will remain relatively static, with ever more powerful and expensive processors spending more and more time idle, waiting for data to load from “cold” storage into RAM. The technology and know-how to accomplish this are out there; end users should demand that their vendor of choice provide it, or use a vendor that will.

Formulus Black’s revolutionary operating system, ForsaOS, allows any workload to run in memory. To see how it works, schedule a demo with us.

Inspired by an original idea in Systems Performance: Enterprise and the Cloud, 1st Edition, by Brendan Gregg, and Jim Gray’s “How Far Away Is Your Data?”

John Poli

Director of Quality Assurance

John Poli is Director of Quality Assurance at Formulus Black, where he manages, leads, trains, and motivates a team of engineers to deliver quality products in a fast-paced work environment.