Latency

Bandwidth limitations are only one consideration for the efficiency of a network. The next most common limitation for which engineers must design, and with which most users are at least intuitively familiar, is latency. Put simply, latency is the time between the initial moment a signal is sent and the first moment a response to that signal can be initiated. It's the delay of a network.

There are two ways to think about latency: one-way and round-trip. One-way latency describes the delay from the moment a signal is sent from one device to the moment it is received by the target device. Round-trip latency, by contrast, describes the delay between the moment a signal is sent from a device and the moment a response from the target is received by that same device.

One thing to note, however, is that round-trip latency excludes the amount of time the recipient spends processing the initial signal before sending a response. For example, if I send a request from my software to an external API to perform some calculations on a piece of input data, I should reasonably expect that service to take some non-trivial amount of time to process my request. So, imagine first that the request spends 0.005 seconds in transit. Then, once received, the request is processed by the API in 0.1 seconds. Finally, the response itself spends another 0.01 seconds in transit back to my software. The total amount of time between my software sending the request and getting a response is 0.005 + 0.1 + 0.01 = 0.115 seconds. However, since 0.1 of those seconds was spent processing, we ignore it when measuring round-trip latency, so the round-trip latency comes to 0.115 - 0.1 = 0.015 seconds.
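The arithmetic above can be sketched directly. The timing values here are the ones from the example, not real measurements:

```python
# Worked example of the round-trip calculation above.
# All times are in seconds, taken from the example in the text.

request_transit = 0.005   # request in transit to the API
processing_time = 0.1     # time the API spends handling the request
response_transit = 0.01   # response in transit back to the caller

# Total wall-clock time the caller observes between send and receive.
total_elapsed = request_transit + processing_time + response_transit

# Round-trip latency, as defined here, excludes the recipient's
# processing time.
round_trip_latency = total_elapsed - processing_time

print(f"total elapsed:      {total_elapsed:.3f} s")       # 0.115 s
print(f"round-trip latency: {round_trip_latency:.3f} s")  # 0.015 s
```

Note that from the caller's side you can only observe the total elapsed time; separating out the processing time requires either knowing it or measuring against an endpoint that does no processing at all, which is exactly what a ping service provides.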

It's not uncommon for a software platform to provide a service that simply echoes back any request it receives, with no processing applied. This is typically called a ping service, and it provides a useful measurement of the current round-trip latency for network requests between two devices. For this reason, latency is commonly called ping. There are a number of factors that can confound the reliability of any given ping request, so the response times for such requests are not generally treated as precise. However, the measurements a ping service provides are reasonable approximations of a given network round-trip, and they can be used to help isolate other latency issues in a request pipeline.
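To make the echo-and-time idea concrete, here is a minimal sketch of a ping-style measurement against a local TCP echo server. A real ping typically uses ICMP rather than TCP, and the server here is just a stand-in for a remote ping service; all names are illustrative:

```python
# Measure one round trip against a throwaway local echo server.
import socket
import threading
import time

def echo_server(listener):
    """Accept one connection and echo back whatever it receives."""
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

# Bind to an ephemeral port on localhost.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

server_thread = threading.Thread(target=echo_server, args=(listener,))
server_thread.start()

# Time a single request/response round trip.
with socket.create_connection(("127.0.0.1", port)) as client:
    start = time.perf_counter()
    client.sendall(b"ping")
    reply = client.recv(1024)
    rtt = time.perf_counter() - start

server_thread.join()
listener.close()

print(f"echoed: {reply!r}, round trip: {rtt * 1000:.3f} ms")
```

Because the echo server does no work beyond copying bytes back, nearly all of the measured time is transit and stack overhead, which is what makes an echo service a usable proxy for network round-trip latency.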

As I'm sure you can imagine, a constraint as generically defined as a network delay can have any number of contributing factors. The delay could be introduced at just about any point in the network transaction, by any piece of software or hardware between the originating and target devices. On a given packet-switched network, there may be dozens of intermediary routers and gateways receiving and forwarding your packets for any single request. Each of these devices could introduce some delay that will be nearly impossible to isolate when performance monitoring or testing. And if a given gateway is processing hundreds of simultaneous requests, you could experience delays just by virtue of being queued up behind a number of requests that you had nothing to do with and of which you might have no direct knowledge.
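To see how queuing alone can add delay at a single hop, here is a back-of-the-envelope sketch; the per-packet service time and queue depth are assumed, illustrative values, not measurements of any real gateway:

```python
# Rough sketch of queuing delay at one busy gateway.
# Both values below are assumptions for illustration only.

per_packet_service = 0.0002   # 0.2 ms for the gateway to forward one packet
packets_ahead = 150           # packets already queued when ours arrives

# Our packet waits for everything ahead of it, then is serviced itself.
queuing_delay = packets_ahead * per_packet_service
total_delay_at_gateway = queuing_delay + per_packet_service

print(f"delay at this hop: {total_delay_at_gateway * 1000:.1f} ms")
```

Even with a fast per-packet forwarding time, a deep enough queue at a single intermediary can contribute tens of milliseconds, and that contribution is invisible to you as the sender.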